Introduction
Welcome to the Obfussor documentation, your comprehensive guide to LLVM-based code obfuscation techniques and the Obfussor framework.
What is Obfussor?
Obfussor is a high-performance binary obfuscation framework that leverages LLVM's compiler infrastructure to transform source code into hardened, reverse-engineering-resistant binaries. Built for scenarios where intellectual property protection is non-negotiable, Obfussor provides a modern, user-friendly interface powered by Angular and Tauri, with a robust Rust backend for LLVM integration.
What is Code Obfuscation?
Code obfuscation is a technique used to make software difficult to understand and reverse engineer while preserving its original functionality. Unlike encryption, which makes code unreadable until decrypted, obfuscation transforms code into a functionally equivalent but significantly more complex form that hinders analysis and comprehension.
Why Code Obfuscation Matters
In today's software landscape, protecting intellectual property is crucial. Developers and organizations face numerous threats:
- Reverse Engineering: Competitors or malicious actors analyzing your code to steal algorithms, business logic, or proprietary techniques
- Code Theft: Direct copying of your codebase or critical components
- License Violations: Unauthorized use or distribution of licensed software
- Security Vulnerabilities: Easier exploitation when attackers understand your code structure
- Intellectual Property Loss: Loss of competitive advantage when proprietary methods are exposed
Code obfuscation provides a critical defense layer against these threats, making your software significantly harder to analyze, understand, and exploit.
LLVM-Based Obfuscation Advantages
LLVM (originally short for Low Level Virtual Machine, although the project no longer treats the name as an acronym) offers several advantages for code obfuscation:
Compiler-Level Transformation
Unlike binary obfuscation tools that work on compiled executables, LLVM obfuscation operates at the Intermediate Representation (IR) level during compilation. This provides:
- Better Integration: Seamless integration with the compilation process
- Platform Independence: Apply obfuscation once, compile for multiple targets
- Optimization Compatibility: Works alongside compiler optimizations
- Granular Control: Fine-grained control over which parts of code to obfuscate
Architecture-Agnostic Approach
LLVM IR serves as a universal intermediate language between source code and machine code:
- Cross-Platform Support: The same obfuscation techniques work across x86, ARM, MIPS, and other architectures
- Consistent Results: Predictable obfuscation behavior regardless of target platform
- Maintainability: Single codebase for obfuscation logic
Advanced Transformation Capabilities
LLVM's rich IR and pass infrastructure enable sophisticated obfuscation techniques:
- Control Flow Analysis: Deep understanding of program structure enables complex control flow transformations
- Data Flow Tracking: Precise data flow information allows for effective instruction substitution
- Type System: Strong type system in LLVM IR ensures transformations preserve program semantics
Project Features and Capabilities
Obfussor provides a comprehensive suite of obfuscation techniques:
Core Obfuscation Techniques
- Control Flow Flattening
  - Transforms natural program flow into opaque, non-linear execution paths
  - Implements switch-based dispatch mechanisms
  - Creates state machine-like control structures
- String Encryption
  - Automatic encryption of string literals
  - Runtime decryption mechanisms
  - Multiple encryption algorithm support
- Bogus Code Injection
  - Insertion of dead code paths designed to be hard to distinguish from real logic
  - Opaque predicate construction
  - Code bloating with semantic preservation
- Instruction Substitution
  - Replaces simple instructions with semantically equivalent but more complex alternatives
  - Arithmetic transformation patterns
  - Mixed boolean-arithmetic operations
- Function Inlining/Outlining
  - Strategic manipulation of function boundaries
  - Call graph obfuscation
  - Program structure obscuration
Advanced Features
- Configurable Intensity Levels: Fine-tune the security/performance tradeoff for your specific needs
- Selective Obfuscation: Choose which functions, modules, or code sections to obfuscate
- Comprehensive Reporting: Detailed metrics on obfuscation coverage, complexity increase, and performance impact
- Custom Pass Integration: Extend Obfussor with your own LLVM obfuscation passes
Performance Characteristics
Zero-Overhead Abstractions
Obfussor's design philosophy prioritizes minimal runtime overhead:
- Compile-Time Transformation: All obfuscation happens during compilation
- No Runtime Dependencies: No additional libraries or runtime components required
- Optimized Output: Obfuscated code can still be optimized by standard compiler optimizations
Resource Efficiency
- Memory Efficient: Optimized for constrained environments
- Fast Compilation: Parallel pass execution when possible
- Scalable: Handles large codebases efficiently
Performance Metrics
Obfussor generates comprehensive reports including:
- Control Flow Complexity: Cyclomatic complexity increase factor
- String Protection Coverage: Percentage of strings encrypted
- Code Inflation Ratio: Size increase due to obfuscation
- Bogus Code Distribution: Statistical analysis of injected code
- Entropy Analysis: Information-theoretic metrics of output randomness
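As a rough illustration of two of these metrics, the sketch below computes a code inflation ratio and the Shannon entropy of a byte string. The function names are hypothetical, and the formulas are the standard textbook definitions, which may differ from what Obfussor's reports actually compute.

```python
import math
from collections import Counter


def inflation_ratio(original_size, obfuscated_size):
    """Obfuscated size relative to the original (1.0 means no growth)."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return obfuscated_size / original_size


def shannon_entropy(data):
    """Shannon entropy in bits per byte: 0.0 for constant data, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that occurs in the data.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    # Sizes taken from the sample report later in this guide.
    print("inflation ratio:", inflation_ratio(8432, 12227))
    print("entropy of uniform bytes:", shannon_entropy(bytes(range(256))))
```

A higher entropy value only indicates that the bytes look more uniformly distributed; it is not by itself a measure of how hard the binary is to reverse engineer.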
Cross-Platform Support
Obfussor is built with modern technologies ensuring broad platform support:
Desktop Platforms
- Windows: Full support for Windows 10/11 (x64, ARM64)
- macOS: Support for macOS 10.15+ (Intel and Apple Silicon)
- Linux: Debian, Ubuntu, Fedora, Arch, and other major distributions
Architecture Support
Through LLVM's architecture-agnostic approach:
- x86/x86_64: Full support for Intel and AMD processors
- ARM/ARM64: Support for ARM-based systems including Apple M1/M2
- RISC-V: Experimental support for RISC-V architectures
- WebAssembly: Can obfuscate code compiled to WebAssembly
Technology Stack
- Frontend: Angular 20.x - Modern, responsive web-based UI
- Backend: Rust - Safe, fast, and reliable LLVM integration
- Desktop Framework: Tauri - Lightweight, secure desktop application framework
- Build System: Integration with standard compilation toolchains
Target Audience
Obfussor is designed for:
Software Developers
- Developers wanting to protect their intellectual property
- Teams building commercial software requiring reverse engineering protection
- Open source developers protecting sensitive algorithms
Security Professionals
- Security researchers studying obfuscation and deobfuscation techniques
- Penetration testers understanding obfuscated code analysis
- Security engineers implementing defense-in-depth strategies
Organizations
- Companies protecting proprietary software and algorithms
- Financial institutions securing trading algorithms and business logic
- Gaming companies preventing cheating and piracy
- Mobile app developers protecting against app cloning
Researchers and Academics
- Computer science researchers studying program transformation
- Students learning about compiler design and code protection
- Academic institutions teaching software security
License Information
Obfussor is released under the MIT License, which means:
- Free to Use: Use Obfussor for personal, academic, or commercial projects
- Modification Rights: Modify the source code to suit your needs
- Distribution: Distribute original or modified versions
- No Warranty: Software is provided "as is" without warranty
- Attribution: Keep the original copyright notice in distributions
For complete license details, see the LICENSE file in the repository.
Getting Started
Ready to protect your code? Here's what's next:
- Installation: Set up Obfussor on your system
- Quick Start: Obfuscate your first program
- Configuration: Learn about configuration options
- LLVM Fundamentals: Understand the underlying technology
Documentation Structure
This documentation is organized into several sections:
- Getting Started: Installation, quick start guide, and basic configuration
- LLVM Fundamentals: Understanding LLVM architecture, IR, and passes
- Obfuscation Techniques: Detailed explanation of each obfuscation method
- Implementation Details: Architecture and implementation of Obfussor
- Advanced Topics: Custom passes, optimization, and security analysis
- Use Cases: Real-world applications and scenarios
- API Reference: Complete API documentation for CLI and programmatic use
- Troubleshooting: Common issues and solutions
- Contributing: How to contribute to Obfussor development
Community and Support
- GitHub Repository: https://github.com/matrixbytes/Obfussor
- Issue Tracker: Report bugs and request features on GitHub Issues
- Discussions: Join community discussions on GitHub Discussions
- Contributing: See Contributing Guidelines to get involved
Let's begin your journey into LLVM-based code obfuscation!
Installation
This guide will walk you through installing Obfussor and all its prerequisites on your system.
Prerequisites
Before installing Obfussor, ensure you have the following tools installed:
Required Tools
1. Node.js (v18.0.0 or later)
Node.js is required for the Angular frontend.
Windows:
# Download and install from nodejs.org
# Or use Chocolatey
choco install nodejs
# Verify installation
node --version
npm --version
macOS:
# Using Homebrew
brew install node
# Verify installation
node --version
npm --version
Linux:
# Ubuntu/Debian
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs
# Fedora
sudo dnf install nodejs
# Verify installation
node --version
npm --version
2. Bun (Latest Version)
Bun is a fast JavaScript runtime and package manager used in this project.
Windows:
# Using PowerShell
powershell -c "irm bun.sh/install.ps1 | iex"
# Verify installation
bun --version
macOS/Linux:
# Using curl
curl -fsSL https://bun.sh/install | bash
# Verify installation
bun --version
3. Rust (Latest Stable)
Rust is required for the Tauri backend and LLVM integration.
All Platforms:
# Install rustup (Rust installer)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# On Windows, download and run rustup-init.exe from rustup.rs
# Follow the prompts and choose default installation
# Restart your terminal, then verify
rustc --version
cargo --version
Post-Installation:
# Update Rust to latest version
rustup update
# Add common components
rustup component add rustfmt clippy
4. Tauri CLI
Tauri CLI is required to build and run the desktop application.
# Install Tauri CLI via Cargo
cargo install tauri-cli --version "^2.0"
# Verify installation
cargo tauri --version
5. LLVM (Version 14.0 or later)
LLVM is the core dependency for obfuscation functionality.
Windows:
# Download pre-built binaries from llvm.org
# Or use Chocolatey
choco install llvm
# Add to PATH: C:\Program Files\LLVM\bin
macOS:
# Using Homebrew
brew install llvm
# Add to PATH (add to ~/.zshrc or ~/.bash_profile)
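# Homebrew installs to /usr/local on Intel Macs and /opt/homebrew on Apple Silicon, so resolve the prefix with brew
echo 'export PATH="$(brew --prefix llvm)/bin:$PATH"' >> ~/.zshrc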
source ~/.zshrc
# Verify installation
llvm-config --version
Linux:
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install llvm-14 llvm-14-dev clang-14
# Fedora
sudo dnf install llvm llvm-devel clang
# Verify installation
# (Debian/Ubuntu versioned packages install the tool as llvm-config-14)
llvm-config --version
Optional Tools
Git
Version control for cloning the repository:
# Windows (Chocolatey)
choco install git
# macOS
brew install git
# Linux (Ubuntu/Debian)
sudo apt-get install git
# Verify
git --version
Visual Studio Code
Recommended IDE with excellent Rust, Angular, and TypeScript support:
# Download from code.visualstudio.com
# Recommended Extensions:
# - rust-analyzer
# - Angular Language Service
# - Tauri
# - ESLint
# - Prettier
Platform-Specific Requirements
Windows
Visual Studio Build Tools (required for Rust compilation):
- Download Visual Studio Build Tools
- Install with "Desktop development with C++" workload
- Ensure the following components are selected:
  - MSVC v143 - VS 2022 C++ x64/x86 build tools
  - Windows 10/11 SDK
  - C++ CMake tools for Windows
WebView2 (required for Tauri):
- Windows 11: Pre-installed
- Windows 10: Download WebView2 Runtime
macOS
Xcode Command Line Tools:
xcode-select --install
Linux
Build Dependencies:
Debian/Ubuntu:
sudo apt-get update
sudo apt-get install -y \
libwebkit2gtk-4.1-dev \
build-essential \
curl \
wget \
file \
libssl-dev \
libgtk-3-dev \
libayatana-appindicator3-dev \
librsvg2-dev
Fedora:
sudo dnf install \
webkit2gtk4.1-devel \
openssl-devel \
curl \
wget \
file \
gtk3-devel \
libappindicator-gtk3-devel \
librsvg2-devel
Arch Linux:
sudo pacman -S \
webkit2gtk-4.1 \
base-devel \
curl \
wget \
file \
openssl \
gtk3 \
libappindicator-gtk3 \
librsvg
Installing Obfussor
Method 1: Clone from GitHub
# Clone the repository
git clone https://github.com/matrixbytes/Obfussor.git
# Navigate to the directory
cd Obfussor
# Install dependencies
bun install
# Verify installation
bun ng version
cargo tauri info
Method 2: Download Release Binary
- Visit GitHub Releases
- Download the latest release for your platform:
  - Windows: Obfussor-{version}-x64-setup.exe
  - macOS: Obfussor-{version}-x64.dmg or Obfussor-{version}-aarch64.dmg
  - Linux: Obfussor-{version}-amd64.AppImage, .deb, or .rpm
- Install following platform-specific instructions
Post-Installation Verification
Verify all components are correctly installed:
# Check Node.js
node --version # Should be >= 18.0.0
# Check Bun
bun --version
# Check Rust
rustc --version
cargo --version
# Check Tauri
cargo tauri --version
# Check LLVM
llvm-config --version # Should be >= 14.0
# Check Clang
clang --version
Building from Source
If you cloned from GitHub, build Obfussor:
# Development build
cargo tauri dev
# Production build
cargo tauri build
The production build will create installers in:
- Windows: src-tauri/target/release/bundle/msi/ or src-tauri/target/release/bundle/nsis/
- macOS: src-tauri/target/release/bundle/dmg/ or src-tauri/target/release/bundle/macos/
- Linux: src-tauri/target/release/bundle/appimage/ or src-tauri/target/release/bundle/deb/
Environment Configuration
Setting up LLVM Environment Variables
Windows:
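# Add to your user environment variables (open a new terminal afterwards)
setx LLVM_SYS_140_PREFIX "C:\Program Files\LLVM"
# Caution: setx truncates values longer than 1024 characters; for a long PATH, edit it in the Environment Variables dialog instead
setx PATH "%PATH%;C:\Program Files\LLVM\bin"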
macOS/Linux:
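# Add to ~/.zshrc or ~/.bashrc
# The paths below match Homebrew on Intel macOS. On Apple Silicon use /opt/homebrew/opt/llvm;
# on Linux use the directory printed by: llvm-config --prefix
export LLVM_SYS_140_PREFIX="/usr/local/opt/llvm"
export PATH="/usr/local/opt/llvm/bin:$PATH"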
# Apply changes
source ~/.zshrc # or source ~/.bashrc
Rust Environment
Ensure Rust environment is properly configured:
# Verify cargo is in PATH
which cargo
# If not found, add to PATH
echo 'export PATH="$HOME/.cargo/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
Troubleshooting Installation Issues
Common Problems
1. Bun Installation Fails on Windows
Error: PowerShell execution policy prevents installation
Solution:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
2. Rust/Cargo Not Found
Error: cargo: command not found
Solution:
- Restart terminal
- Manually add to PATH: $HOME/.cargo/bin (Unix) or %USERPROFILE%\.cargo\bin (Windows)
- Re-run Rust installer
3. LLVM Not Found
Error: could not find native static library 'LLVM'
Solution:
# Verify LLVM is installed
llvm-config --version
# Set LLVM_SYS_140_PREFIX environment variable
export LLVM_SYS_140_PREFIX=$(llvm-config --prefix)
# Try installation again
4. WebKit2GTK Missing (Linux)
Error: Package webkit2gtk-4.1 was not found
Solution:
# Ubuntu/Debian
sudo apt-get install libwebkit2gtk-4.1-dev
# Older systems may need webkit2gtk-4.0:
sudo apt-get install libwebkit2gtk-4.0-dev
5. Tauri CLI Installation Fails
Error: Compilation errors during cargo install tauri-cli
Solution:
# Update Rust
rustup update
# Install with specific version
cargo install tauri-cli --version "^2.0"
# Windows: Ensure Visual Studio Build Tools are installed
Getting Help
If you encounter issues not covered here:
- Check Troubleshooting Guide
- Review GitHub Issues
- Consult Tauri Prerequisites
- Open a new issue with detailed error messages
Next Steps
Once installation is complete:
- Quick Start Guide: Learn to obfuscate your first program
- Configuration: Understand configuration options
- LLVM Overview: Learn about LLVM fundamentals
Congratulations! You now have Obfussor installed and ready to use.
Quick Start
This guide will help you obfuscate your first program using Obfussor. By the end of this tutorial, you'll understand the basic workflow and be able to apply obfuscation to your own projects.
Prerequisites
Before you begin, ensure you have:
- Completed the Installation guide
- Basic knowledge of C/C++ programming
- A simple C/C++ program to obfuscate
- LLVM toolchain properly configured
Your First Obfuscation
Step 1: Create a Sample Program
Let's start with a simple C program:
// hello.c
#include <stdio.h>

int add(int a, int b) {
    return a + b;
}

int main(void) {
    int x = 5;
    int y = 10;
    int result = add(x, y);
    printf("The sum of %d and %d is %d\n", x, y, result);
    printf("Hello from Obfussor!\n");
    return 0;
}
Save this as hello.c in your working directory.
Step 2: Launch Obfussor
Start the Obfussor application:
# If built from source
cargo tauri dev
# Or run the installed application
./Obfussor # Linux/macOS
# Or launch from Applications menu/Start menu
The Obfussor GUI will open, presenting you with the main interface.
Step 3: Configure Obfuscation Settings
In the Obfussor interface:
- Select Input File: Click "Browse" and select your hello.c file
- Choose Output Directory: Specify where the obfuscated output should be saved
- Select Obfuscation Techniques:
  - ✓ Control Flow Flattening
  - ✓ String Encryption
  - ✓ Instruction Substitution
- Set Intensity Level: Choose "Medium" for this example
- Configure Compiler Options:
  - Compiler: clang
  - Optimization Level: -O2
  - Target Architecture: Auto-detect
Step 4: Run Obfuscation
Click the "Obfuscate" button to start the process.
Obfussor will:
- Compile your source code to LLVM IR
- Apply selected obfuscation passes
- Generate obfuscated LLVM IR
- Compile to native binary
- Generate a detailed report
Step 5: Review Results
After completion, you'll see:
Obfuscation Report:
Obfuscation Summary
===================
Input File: hello.c
Output File: hello_obfuscated
Techniques Applied:
- Control Flow Flattening: ✓
- String Encryption: ✓
- Instruction Substitution: ✓
Metrics:
- Functions Obfuscated: 2/2
- Strings Encrypted: 2/2
- Instructions Substituted: 15
- Code Size Increase: 45%
- Cyclomatic Complexity Increase: 3.2x
Status: Success ✓
Step 6: Test the Obfuscated Program
Run the obfuscated binary to verify it works correctly:
# Navigate to output directory
cd output/
# Run the obfuscated program
./hello_obfuscated
# Expected output:
# The sum of 5 and 10 is 15
# Hello from Obfussor!
The program should function identically to the original!
Step 7: Compare Original and Obfuscated
Original LLVM IR (simplified):
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}

define i32 @main() {
entry:
  %x = alloca i32
  %y = alloca i32
  store i32 5, i32* %x
  store i32 10, i32* %y
  %call = call i32 @add(i32 5, i32 10)
  ; ... printf calls ...
  ret i32 0
}
Obfuscated LLVM IR (simplified):
define i32 @add(i32 %a, i32 %b) {
entry:
  %switch.var = alloca i32
  %result = alloca i32
  store i32 0, i32* %switch.var
  br label %switch.dispatch

switch.dispatch:
  %state = load i32, i32* %switch.var
  switch i32 %state, label %unreachable [
    i32 0, label %block.0
    i32 1, label %block.1
    i32 2, label %block.2
  ]

block.0:
  ; Bogus code
  %bogus1 = add i32 %a, 42
  store i32 1, i32* %switch.var
  br label %switch.dispatch

block.1:
  ; Actual computation (obfuscated): a - (0 - b) == a + b
  %t1 = sub i32 0, %b
  %t2 = sub i32 %a, %t1
  ; Pass the value through memory, because block.1 does not dominate block.2
  store i32 %t2, i32* %result
  store i32 2, i32* %switch.var
  br label %switch.dispatch

block.2:
  %ret = load i32, i32* %result
  ret i32 %ret

unreachable:
  unreachable
}
Notice how the control flow is flattened into a dispatch loop driven by a state variable, and how the simple addition is replaced with an equivalent but less obvious pair of subtractions, a - (0 - b).
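To check that the substitution used in block.1 really preserves the result, the following sketch models 32-bit wraparound arithmetic in Python and compares both forms on a few edge cases. The helper names are made up for this illustration and are not part of Obfussor.

```python
MASK32 = 0xFFFFFFFF  # i32 arithmetic wraps modulo 2**32


def add_i32(a, b):
    """Plain 32-bit addition, as in the original `add i32 %a, %b`."""
    return (a + b) & MASK32


def add_i32_substituted(a, b):
    """The substituted form: t1 = 0 - b, then t2 = a - t1."""
    t1 = (0 - b) & MASK32
    t2 = (a - t1) & MASK32
    return t2


def to_signed(x):
    """Reinterpret a 32-bit pattern as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x


if __name__ == "__main__":
    samples = [0, 1, 5, 10, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
    for a in samples:
        for b in samples:
            assert add_i32(a, b) == add_i32_substituted(a, b)
    print("sum of 5 and 10:", to_signed(add_i32_substituted(5, 10)))
```

The two forms agree for every pair of inputs because subtraction and addition are both computed modulo 2**32, so the identity also holds when the result overflows.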
Command Line Interface (CLI)
For automation and scripting, use the CLI:
Basic Usage
obfussor-cli obfuscate \
--input hello.c \
--output hello_obfuscated \
--techniques cff,str,sub \
--intensity medium
CLI Options
obfussor-cli obfuscate [OPTIONS]
OPTIONS:
-i, --input <FILE> Input source file
-o, --output <FILE> Output file name
-t, --techniques <LIST> Comma-separated list of techniques
(cff, str, bog, sub, inl)
--intensity <LEVEL> Obfuscation intensity (low, medium, high)
--compiler <COMPILER> Compiler to use (clang, gcc)
-O <LEVEL> Optimization level (0, 1, 2, 3, s)
--target <ARCH> Target architecture
--config <FILE> Configuration file
--report <FILE> Output report file
--ir-only Generate LLVM IR only (no compilation)
-v, --verbose Verbose output
-h, --help Show help message
Example: Maximum Obfuscation
obfussor-cli obfuscate \
--input myprogram.c \
--output myprogram_protected \
--techniques cff,str,bog,sub,inl \
--intensity high \
-O2 \
--report obfuscation-report.json \
--verbose
Example: Configuration File
Create obfuscation-config.json:
{
"techniques": {
"control_flow_flattening": {
"enabled": true,
"intensity": "medium",
"preserve_functions": ["main"]
},
"string_encryption": {
"enabled": true,
"algorithm": "aes128",
"exclude_patterns": ["debug_*"]
},
"instruction_substitution": {
"enabled": true,
"complexity": 3
}
},
"compiler": {
"name": "clang",
"optimization": "O2",
"flags": ["-fno-inline"]
},
"output": {
"ir_file": "output.ll",
"report_file": "report.json",
"preserve_symbols": false
}
}
Use the configuration:
obfussor-cli obfuscate \
--input myprogram.c \
--config obfuscation-config.json
Working with Projects
Single File Projects
obfussor-cli obfuscate \
--input main.c \
--output main_obf \
--techniques cff,str
Multiple Files
Obfuscate each file separately and link:
# Obfuscate each source file to LLVM IR
obfussor-cli obfuscate --input file1.c --output file1_obf.ll --ir-only
obfussor-cli obfuscate --input file2.c --output file2_obf.ll --ir-only
# Compile IR files to object files
clang -c file1_obf.ll -o file1_obf.o
clang -c file2_obf.ll -o file2_obf.o
# Link obfuscated object files
clang file1_obf.o file2_obf.o -o program_obfuscated
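For more than a handful of files, the same three stages can be generated rather than typed by hand. The sketch below only builds the command lines shown above, using the flags already documented in this guide; `build_commands` is a hypothetical helper name, and nothing is executed.

```python
import shlex
from pathlib import Path


def build_commands(sources, output="program_obfuscated"):
    """Return argv lists for the obfuscate, compile, and link stages."""
    obfuscate, compile_ir, objects = [], [], []
    for src in sources:
        stem = Path(src).stem
        ir_file = stem + "_obf.ll"
        obj_file = stem + "_obf.o"
        # Stage 1: obfuscate each source file to LLVM IR.
        obfuscate.append(["obfussor-cli", "obfuscate", "--input", src,
                          "--output", ir_file, "--ir-only"])
        # Stage 2: compile each IR file to an object file.
        compile_ir.append(["clang", "-c", ir_file, "-o", obj_file])
        objects.append(obj_file)
    # Stage 3: link all obfuscated object files.
    link = [["clang"] + objects + ["-o", output]]
    return obfuscate + compile_ir + link


if __name__ == "__main__":
    for argv in build_commands(["file1.c", "file2.c"]):
        print(shlex.join(argv))
```

Passing each list to `subprocess.run(argv, check=True)` would execute the stages in order and stop at the first failure.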
Integration with Build Systems
Makefile Example
CC = clang
OBFUSSOR = obfussor-cli
SOURCES = main.c utils.c
OBJECTS = $(SOURCES:.c=.o)
OBFUSCATED = $(SOURCES:.c=_obf.o)

# Recipe lines must start with a tab character
.PHONY: all clean

all: program_obfuscated

%.o: %.c
	$(CC) -c $< -o $@

%_obf.o: %.c
	$(OBFUSSOR) obfuscate --input $< --output $@ --techniques cff,str

program_obfuscated: $(OBFUSCATED)
	$(CC) $(OBFUSCATED) -o $@

clean:
	rm -f $(OBJECTS) $(OBFUSCATED) program_obfuscated
CMake Example
# Add custom command for obfuscation
function(add_obfuscated_executable target)
  set(SOURCES ${ARGN})
  set(OBFUSCATED_SOURCES "")
  foreach(source ${SOURCES})
    get_filename_component(source_name ${source} NAME_WE)
    set(obf_source "${CMAKE_BINARY_DIR}/${source_name}_obf.c")
    add_custom_command(
      OUTPUT ${obf_source}
      COMMAND obfussor-cli obfuscate
        --input ${CMAKE_CURRENT_SOURCE_DIR}/${source}
        --output ${obf_source}
        --techniques cff,str
      DEPENDS ${source}
      COMMENT "Obfuscating ${source}"
    )
    list(APPEND OBFUSCATED_SOURCES ${obf_source})
  endforeach()
  add_executable(${target} ${OBFUSCATED_SOURCES})
endfunction()

# Usage
add_obfuscated_executable(my_program main.c utils.c)
Verifying Obfuscation
Visual Inspection
Compare the disassembly of original and obfuscated binaries:
# Build an unobfuscated baseline, then disassemble it
clang -O2 hello.c -o hello
objdump -d hello > hello_original.asm
# Disassemble obfuscated
objdump -d hello_obfuscated > hello_obfuscated.asm
# Compare
diff hello_original.asm hello_obfuscated.asm
Using Analysis Tools
Analyze with tools like Ghidra or IDA Pro:
- Load the original binary
- Note the control flow graph structure
- Load the obfuscated binary
- Compare the complexity and readability
Automated Testing
Ensure functionality is preserved:
# Create test script
cat > test.sh << 'EOF'
#!/bin/bash
# Test original
./hello > original_output.txt
# Test obfuscated
./hello_obfuscated > obfuscated_output.txt
# Compare outputs
if diff original_output.txt obfuscated_output.txt; then
    echo "✓ Functionality preserved"
else
    echo "✗ Output differs - obfuscation error!"
    exit 1
fi
EOF
chmod +x test.sh
./test.sh
Understanding the Report
Obfussor generates detailed JSON reports:
{
"timestamp": "2024-01-15T10:30:00Z",
"input_file": "hello.c",
"output_file": "hello_obfuscated",
"techniques": [
{
"name": "control_flow_flattening",
"status": "applied",
"functions_affected": 2,
"metrics": {
"blocks_added": 15,
"complexity_increase": 3.2
}
},
{
"name": "string_encryption",
"status": "applied",
"strings_encrypted": 2,
"encryption_algorithm": "xor"
}
],
"overall_metrics": {
"original_size": 8432,
"obfuscated_size": 12227,
"size_increase_percent": 45,
"original_complexity": 5,
"obfuscated_complexity": 16
}
}
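The derived fields in a report can be cross-checked against the raw numbers. The sketch below assumes the field names from the example report above and recomputes the size increase and the complexity ratio; `check_report` is a hypothetical helper, not part of Obfussor.

```python
import json

# Trimmed copy of the example report above.
REPORT_JSON = """
{
  "input_file": "hello.c",
  "output_file": "hello_obfuscated",
  "overall_metrics": {
    "original_size": 8432,
    "obfuscated_size": 12227,
    "size_increase_percent": 45,
    "original_complexity": 5,
    "obfuscated_complexity": 16
  }
}
"""


def size_increase_percent(metrics):
    """Growth of the output relative to the input, rounded to a whole percent."""
    growth = metrics["obfuscated_size"] - metrics["original_size"]
    return round(100 * growth / metrics["original_size"])


def complexity_ratio(metrics):
    """How many times more complex the obfuscated code is than the original."""
    return metrics["obfuscated_complexity"] / metrics["original_complexity"]


def check_report(report):
    """Return a list of inconsistencies between stored and recomputed values."""
    metrics = report["overall_metrics"]
    problems = []
    if size_increase_percent(metrics) != metrics["size_increase_percent"]:
        problems.append("size_increase_percent does not match the file sizes")
    return problems


if __name__ == "__main__":
    report = json.loads(REPORT_JSON)
    print("problems:", check_report(report))
    print("complexity ratio:", complexity_ratio(report["overall_metrics"]))
```

For the example report, the recomputed complexity ratio of 16 / 5 matches the 3.2x figure shown in the summary from Step 5.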
Next Steps
Now that you've obfuscated your first program:
- Configuration Guide: Learn about advanced configuration options
- Obfuscation Techniques: Understand each technique in detail
- LLVM Fundamentals: Learn how LLVM powers obfuscation
- Advanced Topics: Create custom obfuscation passes
Common Pitfalls
1. Over-Obfuscation
Problem: Applying all techniques at maximum intensity
Solution: Start with medium intensity and specific techniques based on threat model
2. Breaking Debug Symbols
Problem: Obfuscation removes debug information
Solution: Keep separate debug builds; use --preserve-symbols for development
3. Performance Degradation
Problem: High intensity obfuscation significantly slows execution
Solution: Profile your application; selectively obfuscate critical functions only
4. Compilation Errors
Problem: Obfuscated IR fails to compile
Solution: Check LLVM version compatibility; verify input code compiles without obfuscation first
Tips for Success
- Start Simple: Begin with one technique, verify it works, then add more
- Test Thoroughly: Always test obfuscated binaries match original behavior
- Version Control: Keep original source separate from obfuscated versions
- Document Configuration: Save your obfuscation configs for reproducibility
- Benchmark Performance: Measure performance impact before deploying
Congratulations! You've successfully obfuscated your first program with Obfussor.
Configuration
Obfussor provides flexible configuration options to customize obfuscation behavior for your specific needs. This guide covers all configuration methods and available options.
Configuration Methods
Obfussor supports three configuration methods:
- GUI Configuration: Interactive configuration through the desktop application
- Configuration Files: JSON-based configuration files for reproducible builds
- Command-Line Arguments: Direct configuration via CLI flags
Priority Order
When multiple configuration methods are used:
CLI Arguments > Configuration File > GUI Settings > Default Values
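One way to picture this priority order is as a series of layered merges, where each higher-priority source overrides only the keys it sets. The sketch below is an illustration of that idea under the assumption that settings are nested dictionaries; `deep_merge` and `resolve` are hypothetical names, and Obfussor's actual merge logic may differ.

```python
def deep_merge(base, override):
    """Return a copy of `base` with values from `override` applied on top."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(defaults, gui, config_file, cli):
    """Apply layers from lowest to highest priority."""
    result = {}
    for layer in (defaults, gui, config_file, cli):
        result = deep_merge(result, layer)
    return result


if __name__ == "__main__":
    settings = resolve(
        defaults={"intensity": "low", "compiler": {"name": "clang", "optimization_level": "O0"}},
        gui={"intensity": "medium"},
        config_file={"compiler": {"optimization_level": "O2"}},
        cli={"intensity": "high"},
    )
    print(settings)
```

In this example the CLI value for intensity wins over both the GUI and the defaults, while the compiler name falls through from the defaults because no higher layer sets it.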
Configuration File Format
Basic Structure
Create a JSON configuration file (e.g., obfussor.json):
{
"version": "1.0",
"input": {
"files": ["src/main.c", "src/utils.c"],
"include_dirs": ["include/"],
"defines": ["RELEASE_BUILD"]
},
"output": {
"directory": "build/obfuscated",
"basename": "program",
"generate_ir": true,
"generate_report": true,
"report_format": "json"
},
"techniques": {
"control_flow_flattening": {
"enabled": true,
"intensity": "medium",
"options": {}
},
"string_encryption": {
"enabled": true,
"algorithm": "aes128",
"options": {}
},
"bogus_control_flow": {
"enabled": false
},
"instruction_substitution": {
"enabled": true,
"complexity": 3
},
"function_inlining": {
"enabled": false
}
},
"compiler": {
"name": "clang",
"optimization_level": "O2",
"target_architecture": "x86_64",
"additional_flags": ["-fno-inline", "-fno-unroll-loops"]
},
"advanced": {
"preserve_symbols": false,
"strip_debug_info": true,
"seed": null
}
}
Using Configuration Files
# CLI
obfussor-cli obfuscate --config obfussor.json
# Or specify additional overrides
obfussor-cli obfuscate --config obfussor.json --intensity high
Configuration Sections
Input Configuration
Controls what source files to obfuscate and how to process them.
{
"input": {
"files": [
"src/main.c",
"src/module1.c",
"src/module2.c"
],
"include_dirs": [
"include/",
"third_party/include/"
],
"defines": [
"RELEASE_BUILD",
"ENABLE_OBFUSCATION",
"VERSION=1.0"
],
"exclude_patterns": [
"*_test.c",
"debug_*.c"
]
}
}
Options:
- files: Array of source files to obfuscate
- include_dirs: Include directories for compilation
- defines: Preprocessor definitions
- exclude_patterns: Glob patterns for files to exclude
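The effect of exclude_patterns can be sketched with Python's standard glob matcher. This example assumes that patterns are matched against the file name only, not the full path; that matching rule is an assumption made for illustration, and `select_files` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def select_files(files, exclude_patterns):
    """Keep the files whose base name matches none of the glob patterns."""
    kept = []
    for path in files:
        name = PurePosixPath(path).name
        if any(fnmatchcase(name, pattern) for pattern in exclude_patterns):
            continue  # excluded by at least one pattern
        kept.append(path)
    return kept


if __name__ == "__main__":
    files = ["src/main.c", "src/parser_test.c", "src/debug_log.c", "src/utils.c"]
    print(select_files(files, ["*_test.c", "debug_*.c"]))
```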
Output Configuration
Controls output generation and reporting.
{
"output": {
"directory": "build/obfuscated",
"basename": "myapp",
"generate_ir": true,
"generate_report": true,
"report_format": "json",
"report_file": "obfuscation-report.json",
"ir_directory": "build/ir/",
"preserve_structure": false
}
}
Options:
- directory: Output directory for obfuscated files
- basename: Base name for output files
- generate_ir: Generate intermediate LLVM IR files
- generate_report: Create obfuscation report
- report_format: Report format (json, html, text)
- report_file: Custom report file name
- ir_directory: Directory for IR files
- preserve_structure: Maintain input directory structure
Technique Configuration
Each obfuscation technique can be configured individually.
Control Flow Flattening
{
"control_flow_flattening": {
"enabled": true,
"intensity": "medium",
"options": {
"split_basic_blocks": true,
"dispatch_type": "switch",
"state_variable_type": "i32",
"bogus_states": 5,
"preserve_functions": ["main", "init_*"],
"min_block_size": 3
}
}
}
Options:
- enabled: Enable/disable the technique
- intensity: Obfuscation intensity (low, medium, high)
- split_basic_blocks: Split basic blocks before flattening
- dispatch_type: Dispatch mechanism (switch, indirect)
- state_variable_type: LLVM type for state variable
- bogus_states: Number of unreachable bogus states
- preserve_functions: Functions to exclude (glob patterns supported)
- min_block_size: Minimum instructions per block to flatten
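To make the switch dispatch type more concrete, here is a source-level sketch of what flattening does to a small loop. The real pass operates on LLVM IR basic blocks, so this Python version only illustrates the shape of the result: every block becomes a numbered state, and a single dispatcher decides which state runs next. The function names and state numbering are made up for the example.

```python
def gcd(a, b):
    """Original, structured control flow."""
    while b != 0:
        a, b = b, a % b
    return a


def gcd_flattened(a, b):
    """The same algorithm with its control flow flattened into a state machine."""
    state = 0
    while True:  # dispatcher loop
        if state == 0:
            # Loop header: choose between the body and the exit block.
            state = 1 if b != 0 else 2
        elif state == 1:
            # Loop body, then jump back to the header.
            a, b = b, a % b
            state = 0
        elif state == 2:
            # Exit block.
            return a
        else:
            # A bogus state would live here: present in the code, never selected.
            raise AssertionError("unreachable state")


if __name__ == "__main__":
    print(gcd(48, 18), gcd_flattened(48, 18))
```

Both functions compute the same result, but in the flattened version the original loop structure is no longer visible from the shape of the control flow graph; it can only be recovered by tracking the values assigned to the state variable.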
String Encryption
{
"string_encryption": {
"enabled": true,
"algorithm": "aes128",
"options": {
"key_generation": "random",
"encryption_key": null,
"decrypt_function": "inline",
"exclude_patterns": [
"debug_*",
"test_*"
],
"min_length": 4,
"encrypt_wide_strings": true
}
}
}
Options:
- algorithm: Encryption algorithm (xor, aes128, aes256, custom)
- key_generation: Key generation method (random, derived, fixed)
- encryption_key: Fixed encryption key (hex string, null for random)
- decrypt_function: Decryption function placement (inline, function, constructor)
- exclude_patterns: String patterns to exclude
- min_length: Minimum string length to encrypt
- encrypt_wide_strings: Also encrypt wide character strings
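The simplest of the listed algorithms, xor, can be sketched in a few lines. The example below is an illustration of the general idea (store the encrypted bytes, decrypt on use), not Obfussor's implementation; the key value, the helper names, and the way min_length is applied are all assumptions. A repeating-key XOR hides strings from tools such as `strings`, but it is not cryptographically strong.

```python
def xor_bytes(data, key):
    """XOR each byte with a repeating key; applying it twice restores the input."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def should_encrypt(text, min_length=4):
    """Skip very short literals, mirroring the min_length option."""
    return len(text) >= min_length


def encrypt_literal(text, key):
    """What would be stored in the binary in place of the plaintext."""
    return xor_bytes(text.encode("utf-8"), key)


def decrypt_literal(blob, key):
    """What the runtime decryption step would do just before the string is used."""
    return xor_bytes(blob, key).decode("utf-8")


if __name__ == "__main__":
    key = bytes.fromhex("5a17c3")  # hypothetical fixed key, as a hex string
    blob = encrypt_literal("Hello from Obfussor!", key)
    print(blob.hex())
    print(decrypt_literal(blob, key))
```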
Bogus Control Flow
{
"bogus_control_flow": {
"enabled": true,
"intensity": "medium",
"options": {
"injection_probability": 0.3,
"max_bogus_blocks": 5,
"opaque_predicate_complexity": 3,
"use_external_functions": false,
"preserve_semantics": true
}
}
}
Options:
- injection_probability: Probability of injecting bogus code (0.0-1.0)
- max_bogus_blocks: Maximum bogus blocks per function
- opaque_predicate_complexity: Complexity of opaque predicates (1-5)
- use_external_functions: Call external functions in bogus code
- preserve_semantics: Ensure bogus code doesn't affect semantics
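An opaque predicate is a condition whose outcome is fixed but not obvious from reading the code. The sketch below uses a classic example, the fact that the product of two consecutive integers is always even, to guard a bogus branch. It also shows one plausible reading of injection_probability and max_bogus_blocks as a seeded random selection; the helper names and the selection policy are assumptions, not Obfussor's documented behavior.

```python
import random


def opaque_true(x):
    """x * (x + 1) is a product of consecutive integers, so it is always even."""
    return (x * (x + 1)) % 2 == 0


def guarded_add(a, b, x):
    """Real logic behind an opaque predicate, with a bogus alternative path."""
    if opaque_true(x):
        return a + b        # real path, always taken
    return (a * 31) ^ b     # bogus path, never executed


def choose_injection_sites(block_count, injection_probability, max_bogus_blocks, seed):
    """Pick block indices to receive bogus code, capped at max_bogus_blocks."""
    rng = random.Random(seed)
    chosen = [i for i in range(block_count) if rng.random() < injection_probability]
    return chosen[:max_bogus_blocks]


if __name__ == "__main__":
    print(guarded_add(5, 10, x=1234))
    print(choose_injection_sites(20, 0.3, 5, seed=12345))
```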
Instruction Substitution
{
"instruction_substitution": {
"enabled": true,
"complexity": 3,
"options": {
"substitute_arithmetic": true,
"substitute_boolean": true,
"mixed_boolean_arithmetic": true,
"max_substitution_depth": 3,
"preserve_performance": false
}
}
}
Options:
- complexity: Substitution complexity level (1-5)
- substitute_arithmetic: Replace arithmetic operations
- substitute_boolean: Replace boolean operations
- mixed_boolean_arithmetic: Use MBA (Mixed Boolean-Arithmetic) expressions
- max_substitution_depth: Maximum recursive substitution depth
- preserve_performance: Limit substitutions affecting performance
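Mixed boolean-arithmetic substitution relies on identities that hold for fixed-width integers. The sketch below checks three well-known identities under 32-bit wraparound and shows what a substitution depth of 2 could look like, where the addition inside a rewritten expression is rewritten again. These are textbook identities chosen for illustration; which ones Obfussor actually applies is not specified in this guide.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit wraparound


def add_mba(a, b):
    """a + b == (a ^ b) + 2 * (a & b)"""
    return ((a ^ b) + 2 * (a & b)) & MASK32


def sub_mba(a, b):
    """a - b == (a ^ b) - 2 * (~a & b)"""
    return ((a ^ b) - 2 * (~a & b)) & MASK32


def xor_mba(a, b):
    """a ^ b == (a | b) - (a & b)"""
    return ((a | b) - (a & b)) & MASK32


def add_mba_nested(a, b):
    """Depth 2: the '+' inside add_mba is itself rewritten with add_mba."""
    x = a ^ b
    y = (2 * (a & b)) & MASK32
    return add_mba(x, y)


if __name__ == "__main__":
    samples = [0, 1, 5, 10, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
    for a in samples:
        for b in samples:
            assert add_mba(a, b) == (a + b) & MASK32
            assert sub_mba(a, b) == (a - b) & MASK32
            assert xor_mba(a, b) == a ^ b
            assert add_mba_nested(a, b) == (a + b) & MASK32
    print("all identities hold on the sample inputs")
```

Each extra level of nesting multiplies the number of operations, which is why a depth limit and a performance-preserving mode are useful controls.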
Function Inlining/Outlining
{
"function_inlining": {
"enabled": true,
"strategy": "mixed",
"options": {
"inline_threshold": 100,
"outline_threshold": 50,
"inline_functions": ["small_*"],
"outline_functions": ["compute_*"],
"preserve_abi": true
}
}
}
Options:
- strategy: Strategy (inline, outline, mixed, random)
- inline_threshold: Maximum size for inlining (IR instructions)
- outline_threshold: Minimum size for outlining
- inline_functions: Function patterns to inline
- outline_functions: Function patterns to outline
- preserve_abi: Preserve ABI for external calls
Compiler Configuration
Configure the compilation process:
{
"compiler": {
"name": "clang",
"version": "14.0",
"optimization_level": "O2",
"target_architecture": "x86_64",
"target_os": "linux",
"additional_flags": [
"-fno-inline",
"-fno-unroll-loops",
"-fno-vectorize"
],
"link_flags": [
"-static",
"-s"
],
"emit_llvm": false
}
}
Options:
- name: Compiler executable (clang, gcc, clang++)
- version: Required compiler version (optional)
- optimization_level: Optimization level (O0, O1, O2, O3, Os, Oz)
- target_architecture: Target architecture (x86_64, arm64, i386)
- target_os: Target operating system (linux, windows, macos)
- additional_flags: Extra compiler flags
- link_flags: Linker flags
- emit_llvm: Emit LLVM bitcode instead of native binary
Advanced Configuration
Advanced options for fine-tuning:
{
"advanced": {
"preserve_symbols": false,
"strip_debug_info": true,
"seed": 12345,
"parallelism": 4,
"cache_enabled": true,
"cache_directory": ".obfussor-cache/",
"verify_output": true,
"log_level": "info",
"dry_run": false
}
}
Options:
- preserve_symbols: Keep symbol names in output
- strip_debug_info: Remove debug information
- seed: Random seed for reproducible obfuscation (null for random)
- parallelism: Number of parallel threads (0 for auto)
- cache_enabled: Enable compilation cache
- cache_directory: Cache directory location
- verify_output: Verify obfuscated IR validity
- log_level: Logging level (debug, info, warn, error)
- dry_run: Perform dry run without generating output
Preset Configurations
Minimal Obfuscation
For development and debugging:
{
"version": "1.0",
"techniques": {
"control_flow_flattening": {
"enabled": true,
"intensity": "low"
},
"string_encryption": {
"enabled": false
}
},
"compiler": {
"optimization_level": "O0"
},
"advanced": {
"preserve_symbols": true,
"strip_debug_info": false
}
}
Balanced Obfuscation
For most production use cases:
{
"version": "1.0",
"techniques": {
"control_flow_flattening": {
"enabled": true,
"intensity": "medium"
},
"string_encryption": {
"enabled": true,
"algorithm": "aes128"
},
"instruction_substitution": {
"enabled": true,
"complexity": 3
}
},
"compiler": {
"optimization_level": "O2"
},
"advanced": {
"preserve_symbols": false,
"strip_debug_info": true
}
}
Maximum Obfuscation
For maximum protection (expect a noticeable performance impact):
{
"version": "1.0",
"techniques": {
"control_flow_flattening": {
"enabled": true,
"intensity": "high",
"options": {
"bogus_states": 10
}
},
"string_encryption": {
"enabled": true,
"algorithm": "aes256"
},
"bogus_control_flow": {
"enabled": true,
"intensity": "high",
"options": {
"injection_probability": 0.5
}
},
"instruction_substitution": {
"enabled": true,
"complexity": 5
},
"function_inlining": {
"enabled": true,
"strategy": "mixed"
}
},
"compiler": {
"optimization_level": "O3"
},
"advanced": {
"preserve_symbols": false,
"strip_debug_info": true
}
}
Configuration Validation
Validate your configuration file:
obfussor-cli validate-config obfussor.json
Output:
✓ Configuration file is valid
✓ All techniques are properly configured
✓ Compiler settings are compatible
⚠ Warning: High intensity may significantly impact performance
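As a rough idea of the kind of range checks a configuration validator could perform, the sketch below tests a few options against the ranges documented earlier. The TechniqueOptions struct and validate function are hypothetical; this is not the implementation behind validate-config.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of a few documented options and their ranges.
struct TechniqueOptions {
    double injection_probability = 0.3;   // documented range: 0.0-1.0
    int opaque_predicate_complexity = 3;  // documented range: 1-5
    int substitution_complexity = 3;      // documented range: 1-5
};

// True when lo <= value <= hi.
template <typename T>
bool in_range(T value, T lo, T hi) {
    assert(lo <= hi && "range bounds must be ordered");
    return value >= lo && value <= hi;
}

// Collect one human-readable message per out-of-range option.
std::vector<std::string> validate(const TechniqueOptions &o) {
    std::vector<std::string> errors;
    if (!in_range(o.injection_probability, 0.0, 1.0))
        errors.push_back("injection_probability must be within 0.0-1.0");
    if (!in_range(o.opaque_predicate_complexity, 1, 5))
        errors.push_back("opaque_predicate_complexity must be within 1-5");
    if (!in_range(o.substitution_complexity, 1, 5))
        errors.push_back("instruction_substitution complexity must be within 1-5");
    return errors;
}
```

Collecting all errors instead of stopping at the first one lets a user fix a configuration file in a single pass.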
Environment Variables
Override configuration with environment variables:
# Set default obfuscation intensity
export OBFUSSOR_INTENSITY=high
# Set compiler
export OBFUSSOR_COMPILER=clang-14
# Set parallelism
export OBFUSSOR_PARALLELISM=8
# Use configuration
obfussor-cli obfuscate --input main.c
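The override rule can be pictured as "environment value if present, otherwise the configuration file value". The sketch below assumes exactly that precedence and treats an empty variable as unset; both points are assumptions, and resolve_setting is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Return the environment value when the variable is set and non-empty;
// otherwise fall back to the value read from the configuration file.
std::string resolve_setting(const char *env_name,
                            const std::string &config_value) {
    assert(env_name != nullptr && "environment variable name is required");
    const char *env = std::getenv(env_name);
    if (env != nullptr && *env != '\0') {
        return std::string(env);
    }
    return config_value;
}
```

For example, resolve_setting("OBFUSSOR_INTENSITY", "medium") would yield "high" after the export shown above and "medium" when the variable is not set.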
GUI Configuration
Interactive Configuration
- Launch Obfussor application
- Navigate to Settings tab
- Configure techniques:
- Toggle each technique on/off
- Adjust intensity sliders
- Configure technique-specific options
- Save configuration:
- Click Save Configuration
- Choose location for config file
- Load configuration:
- Click Load Configuration
- Select saved config file
Configuration Profiles
The GUI supports multiple named profiles:
- Create Profile: Settings → New Profile
- Switch Profile: Select from dropdown
- Export Profile: Settings → Export → JSON/YAML
- Import Profile: Settings → Import
Best Practices
1. Version Control Configuration
Store configuration files in version control:
project/
├── src/
├── obfussor-dev.json # Development config
├── obfussor-release.json # Release config
└── obfussor-max.json # Maximum protection config
2. Incremental Configuration
Start minimal and add techniques incrementally:
# Start with basic
obfussor-cli obfuscate --config obfussor-basic.json --input main.c
# Test, then increase
obfussor-cli obfuscate --config obfussor-medium.json --input main.c
# Finally, apply maximum if needed
obfussor-cli obfuscate --config obfussor-max.json --input main.c
3. Performance Testing
Always measure performance impact:
# Benchmark original
time ./program_original
# Benchmark obfuscated
time ./program_obfuscated
# Compare and adjust configuration
4. Selective Obfuscation
Obfuscate only critical code:
{
"techniques": {
"control_flow_flattening": {
"enabled": true,
"options": {
"preserve_functions": [
"*",
"!critical_*",
"!secret_*"
]
}
}
}
}
A pattern prefixed with ! means "do NOT preserve" (i.e., do obfuscate).
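One plausible way to evaluate such a pattern list is "last matching pattern wins, and a leading ! flips the result". The sketch below implements that reading with a minimal glob matcher that supports only the * wildcard. The evaluation order and both function names are assumptions made for illustration; check the CLI reference for the actual semantics.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Minimal glob: '*' matches any run of characters, everything else is literal.
bool glob_match(const std::string &pattern, const std::string &name) {
    std::size_t p = 0, n = 0, star = std::string::npos, mark = 0;
    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;          // remember the star and try matching nothing
            mark = n;
        } else if (p < pattern.size() && pattern[p] == name[n]) {
            ++p;
            ++n;
        } else if (star != std::string::npos) {
            p = star + 1;        // backtrack: let the star absorb one more char
            n = ++mark;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}

// Assumed semantics: later patterns override earlier ones; '!' negates.
bool is_preserved(const std::vector<std::string> &patterns,
                  const std::string &function_name) {
    bool preserved = false;
    for (const std::string &raw : patterns) {
        assert(!raw.empty() && "patterns must not be empty");
        const bool negated = raw[0] == '!';
        const std::string pattern = negated ? raw.substr(1) : raw;
        if (glob_match(pattern, function_name)) preserved = !negated;
    }
    return preserved;
}
```

With the list from the configuration above, main would be preserved while critical_check and secret_key would be obfuscated.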
5. Reproducible Builds
Use fixed seeds for reproducible obfuscation:
{
"advanced": {
"seed": 42
}
}
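Reproducibility follows from seeding a deterministic generator: the same seed yields the same sequence of random decisions. The sketch below shows the principle with std::mt19937; choose_transforms is a hypothetical stand-in for "which variant to apply at each site", not Obfussor's actual generator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Produce `count` pseudo-random choices in [0, options) from a fixed seed.
std::vector<std::uint32_t> choose_transforms(std::uint32_t seed,
                                             std::size_t count,
                                             std::uint32_t options) {
    assert(options > 0 && "need at least one option to choose from");
    std::mt19937 rng(seed);
    std::vector<std::uint32_t> picks;
    picks.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        // Use the raw engine output: it is fixed by the C++ standard,
        // whereas <random> distributions may differ between libraries.
        picks.push_back(static_cast<std::uint32_t>(rng() % options));
    }
    return picks;
}
```

Two builds that call choose_transforms(42, n, k) make identical choices, while a different seed gives a different obfuscation of the same input.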
Configuration Examples
Example 1: Mobile Application
{
"techniques": {
"control_flow_flattening": {
"enabled": true,
"intensity": "medium"
},
"string_encryption": {
"enabled": true,
"algorithm": "xor"
},
"instruction_substitution": {
"enabled": true,
"complexity": 2
}
},
"compiler": {
"optimization_level": "Os",
"target_architecture": "arm64"
}
}
Example 2: Server Application
{
"techniques": {
"control_flow_flattening": {
"enabled": true,
"intensity": "high"
},
"string_encryption": {
"enabled": true,
"algorithm": "aes256"
},
"bogus_control_flow": {
"enabled": true
}
},
"compiler": {
"optimization_level": "O3"
},
"advanced": {
"parallelism": 16
}
}
Example 3: Embedded System
{
"techniques": {
"control_flow_flattening": {
"enabled": true,
"intensity": "low"
},
"string_encryption": {
"enabled": true,
"algorithm": "xor"
}
},
"compiler": {
"optimization_level": "Os",
"target_architecture": "arm",
"additional_flags": ["-mthumb"]
},
"advanced": {
"verify_output": true
}
}
Troubleshooting Configuration
Configuration Not Applied
Problem: Configuration seems ignored
Solution:
# Verify configuration is loaded
obfussor-cli obfuscate --config config.json --verbose
# Check for CLI argument overrides
# Ensure no conflicting environment variables
Invalid Configuration
Problem: Configuration validation fails
Solution:
# Validate JSON syntax
jq . config.json
# Use schema validation
obfussor-cli validate-config config.json --schema
Unexpected Results
Problem: Obfuscation doesn't match expectations
Solution:
# Enable detailed logging
obfussor-cli obfuscate --config config.json --log-level debug
# Generate detailed report
obfussor-cli obfuscate --config config.json --report-format html
Next Steps
- Obfuscation Techniques: Learn about each technique
- CLI Reference: Complete CLI documentation
- Advanced Topics: Optimize your configuration
- Troubleshooting: Solve common problems
With proper configuration, you can balance security, performance, and maintainability for your specific use case.
LLVM Overview
LLVM (Low Level Virtual Machine) is a powerful compiler infrastructure that provides a modern, modular approach to compiler design. Understanding LLVM is essential for grasping how Obfussor performs code obfuscation at the compiler level.
What is LLVM?
LLVM is not just a compiler, but a comprehensive collection of modular and reusable compiler and toolchain technologies. Despite its name containing "Virtual Machine," LLVM is not a traditional virtual machine - it's a compiler infrastructure designed around a language-independent intermediate representation (IR).
Key Characteristics
- Modular Design: LLVM's architecture separates concerns into distinct, reusable components
- Language Independence: Frontend-agnostic approach supports multiple source languages
- Target Independence: Backend supports multiple target architectures
- Optimization Framework: Sophisticated optimization infrastructure built on SSA form
- Active Development: Continuously evolving with strong industry and academic support
LLVM Architecture
LLVM follows a three-phase design that separates compilation into distinct stages:
Source Code → Frontend → LLVM IR → Optimizer → LLVM IR → Backend → Machine Code
Three-Phase Architecture
1. Frontend
The frontend translates source code into LLVM IR:
- Lexical Analysis: Tokenization of source code
- Syntax Analysis: Parse tree construction
- Semantic Analysis: Type checking and validation
- IR Generation: Translation to LLVM IR
Popular frontends include:
- Clang: C, C++, Objective-C
- Swift: Swift language
- Rust: Rust language (via rustc)
- Julia: Julia language
2. Optimizer (Middle-End)
The optimizer transforms LLVM IR to improve performance:
- Analysis Passes: Gather information about the code
- Transformation Passes: Modify the IR to optimize it
- Utility Passes: Provide helper functionality
Key optimizations:
- Dead code elimination
- Constant folding and propagation
- Loop optimizations
- Inlining
- Scalar optimizations
- Vectorization
3. Backend
The backend translates optimized IR to machine code:
- Instruction Selection: Map IR to target instructions
- Register Allocation: Assign virtual registers to physical registers
- Instruction Scheduling: Optimize instruction order
- Code Emission: Generate final machine code
Supported architectures:
- x86/x86_64
- ARM/ARM64 (AArch64)
- RISC-V
- PowerPC
- MIPS
- WebAssembly
- And many more
Core Components
LLVM Intermediate Representation (IR)
The IR is the heart of LLVM - a low-level, typed, assembly-like language:
Example:
define i32 @add(i32 %a, i32 %b) {
%result = add i32 %a, %b
ret i32 %result
}
Characteristics:
- Static Single Assignment (SSA) form
- Strongly typed
- Platform independent
- Suitable for optimization
- Readable and writable
PassManager
The PassManager orchestrates optimization and transformation passes:
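// C++ API example (new pass manager; M is the llvm::Module to transform)
LoopAnalysisManager LAM; FunctionAnalysisManager FAM;
CGSCCAnalysisManager CGAM; ModuleAnalysisManager MAM;
PassBuilder PB;
PB.registerModuleAnalyses(MAM); PB.registerCGSCCAnalyses(CGAM);
PB.registerFunctionAnalyses(FAM); PB.registerLoopAnalyses(LAM);
PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);
ModulePassManager MPM;
MPM.addPass(createModuleToFunctionPassAdaptor(SimplifyCFGPass()));
MPM.addPass(createModuleToFunctionPassAdaptor(InstCombinePass()));
MPM.run(M, MAM);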
Types of Passes:
- Module Passes: Operate on entire module
- Function Passes: Operate on individual functions
- BasicBlock Passes: Operate on basic blocks (legacy pass manager only; removed in LLVM 10)
- Loop Passes: Operate on loop structures
Analysis Infrastructure
LLVM provides rich analysis capabilities:
- Dominator Trees: Control flow dominance
- Loop Information: Loop structure analysis
- Alias Analysis: Memory dependency analysis
- Call Graph: Function call relationships
- Data Flow: Value flow analysis
LLVM Toolchain
Essential Tools
1. clang
C/C++/Objective-C compiler frontend:
clang -O2 -S -emit-llvm source.c -o source.ll
2. llc
LLVM IR to native assembly compiler:
llc -O2 source.ll -o source.s
3. opt
LLVM IR optimizer:
opt -O3 source.ll -S -o source_opt.ll
4. llvm-link
LLVM IR linker:
llvm-link module1.ll module2.ll -S -o combined.ll
5. llvm-dis
LLVM bitcode disassembler:
llvm-dis source.bc -o source.ll
6. llvm-as
LLVM IR assembler:
llvm-as source.ll -o source.bc
7. lli
LLVM IR interpreter and JIT compiler:
lli source.ll
Analysis and Debug Tools
llvm-objdump
Object file dumper:
llvm-objdump -d binary
llvm-nm
Symbol table viewer:
llvm-nm library.a
llvm-readobj
Object file reader:
llvm-readobj -h binary
llvm-config
LLVM configuration tool:
llvm-config --cxxflags --ldflags --libs core
LLVM in Compilation Pipeline
Typical Compilation Flow
1. Preprocessing:
clang -E source.c -o source.i
2. Compilation to IR:
clang -S -emit-llvm source.i -o source.ll
3. Optimization:
opt -O3 source.ll -S -o source_opt.ll
4. Backend Compilation:
llc source_opt.ll -o source.s
5. Assembly:
as source.s -o source.o
6. Linking:
ld source.o -o executable
Obfuscation Integration Point
Obfussor integrates into this pipeline at the IR level:
Source Code
↓
Clang Frontend
↓
LLVM IR ← ← ← Obfuscation Happens Here
↓
Optimizer (opt)
↓
Backend (llc)
↓
Machine Code
Advantages:
- Platform-independent obfuscation
- Works with optimizations
- Access to full program analysis
- Language-agnostic
LLVM Design Principles
1. Static Single Assignment (SSA) Form
Every variable is assigned exactly once:
; SSA Form
define i32 @example(i32 %x) {
%1 = add i32 %x, 1
%2 = mul i32 %1, 2
%3 = add i32 %2, 3
ret i32 %3
}
Benefits:
- Simplified optimization algorithms
- Easier data flow analysis
- Clearer def-use relationships
2. Type System
Strong, static typing throughout the IR:
; Type examples
i32 ; 32-bit integer
i8* ; Pointer to 8-bit integer
[10 x i32] ; Array of 10 32-bit integers
{i32, i8*, double} ; Structure type
<4 x float> ; Vector of 4 floats
3. Explicit Memory Model
Memory operations are explicit:
%ptr = alloca i32 ; Allocate stack memory
store i32 42, i32* %ptr ; Store value
%val = load i32, i32* %ptr ; Load value
4. Control Flow Representation
Control flow is an explicit graph of basic blocks, each ending in a terminator instruction:
define i32 @max(i32 %a, i32 %b) {
entry:
%cmp = icmp sgt i32 %a, %b
br i1 %cmp, label %if.then, label %if.else
if.then:
ret i32 %a
if.else:
ret i32 %b
}
LLVM and Obfuscation
Why LLVM is Ideal for Obfuscation
- IR-Level Transformations
  - Platform-independent obfuscation
  - Rich semantic information available
  - Can leverage existing analyses
- Modular Pass System
  - Easy to add custom obfuscation passes
  - Compose multiple techniques
  - Integrate with standard optimizations
- Strong Analysis Infrastructure
  - Control flow analysis
  - Data flow analysis
  - Type information
  - Aliasing information
- Preservation of Semantics
  - Type system ensures correctness
  - SSA form simplifies transformations
  - Built-in verification passes
Common Obfuscation Strategies
LLVM enables various obfuscation approaches:
- Control Flow Obfuscation
  - Manipulate basic block structure
  - Insert opaque predicates
  - Flatten control flow
- Data Obfuscation
  - Encrypt constant values
  - Transform data types
  - Obscure memory access patterns
- Instruction-Level Obfuscation
  - Substitute instructions
  - Insert dead code
  - Use complex instruction patterns
- Function-Level Obfuscation
  - Inline/outline strategically
  - Split or merge functions
  - Obscure call graphs
Integration with Other Tools
Clang Integration
Obfussor works seamlessly with Clang:
# Compile with Clang to IR
clang -S -emit-llvm source.c -o source.ll
# Apply obfuscation
obfussor-cli obfuscate --input source.ll --output obfuscated.ll
# Continue compilation
llc obfuscated.ll -o obfuscated.s
clang obfuscated.s -o program
Build System Integration
Makefile:
%.obf.ll: %.ll
obfussor-cli obfuscate --input $< --output $@
%.s: %.obf.ll
llc $< -o $@
CMake:
add_custom_command(
OUTPUT obfuscated.ll
COMMAND obfussor-cli obfuscate --input source.ll --output obfuscated.ll
DEPENDS source.ll
)
LLVM Version Compatibility
Obfussor supports LLVM versions:
| LLVM Version | Support Status | Notes |
|---|---|---|
| 14.x | Full Support | Recommended |
| 15.x | Full Support | Current |
| 16.x | Full Support | Latest |
| 13.x | Limited | Some features unavailable |
| < 13.x | Not Supported | Too old |
Learning Resources
Official Documentation
Books
- "Getting Started with LLVM Core Libraries" by Bruno Cardoso Lopes
- "LLVM Essentials" by Mayur Pandey and Suyog Sarda
- "LLVM Cookbook" by Mayur Pandey and Suyog Sarda
Online Resources
Summary
LLVM provides the foundation for Obfussor's obfuscation capabilities:
- Modular Architecture: Clean separation of concerns
- IR-Level Transformations: Platform-independent obfuscation
- Rich Analysis: Deep understanding of code structure
- Extensible Pass System: Easy integration of custom transformations
- Strong Type System: Ensures semantic preservation
- Industry Standard: Wide adoption and active development
Understanding LLVM is crucial for:
- Configuring obfuscation effectively
- Writing custom obfuscation passes
- Debugging obfuscation issues
- Optimizing obfuscation performance
Next Steps
- LLVM IR Basics: Deep dive into LLVM IR structure
- LLVM Pass System: Understanding the pass infrastructure
- Compilation Pipeline: Complete compilation workflow
- Obfuscation Techniques: How obfuscation leverages LLVM
With this foundation, you're ready to explore how Obfussor leverages LLVM for code protection.
LLVM IR Basics
LLVM Intermediate Representation (IR) is the core language that LLVM uses for program analysis and transformation. Understanding LLVM IR is essential for working with obfuscation techniques, as all transformations operate on this representation.
What is LLVM IR?
LLVM IR is a low-level, typed, assembly-like language that serves as a universal intermediate format between high-level source code and machine code. It combines:
- Low-level operations: Close to machine instructions but platform-independent
- Type safety: Strong static typing prevents invalid operations
- SSA form: Static Single Assignment for optimization
- Readability: Human-readable text format
Three Representations
LLVM IR exists in three equivalent forms:
1. Human-Readable Assembly (.ll files)
define i32 @add(i32 %a, i32 %b) {
%result = add i32 %a, %b
ret i32 %result
}
2. Bitcode (binary .bc files)
Compact binary format for storage and transmission:
llvm-as source.ll -o source.bc
llvm-dis source.bc -o source.ll
3. In-Memory Representation
C++ objects used by the compiler:
Function *F = ...;
BasicBlock *BB = ...;
Instruction *I = ...;
Basic Structure
Module
The top-level container representing a compilation unit:
; ModuleID = 'example.c'
source_filename = "example.c"
target datalayout = "..."
target triple = "x86_64-unknown-linux-gnu"
; Global variables
@global_var = global i32 42
; Function declarations
declare i32 @external_func(i32)
; Function definitions
define i32 @my_function(i32 %param) {
; ... function body ...
}
Functions
Functions are the primary unit of code:
define <return_type> @function_name(<parameters>) {
; function body
}
Example:
define i32 @multiply(i32 %x, i32 %y) {
entry:
%result = mul i32 %x, %y
ret i32 %result
}
Basic Blocks
Basic blocks are straight-line sequences of instructions with a single entry point and a single exit point:
define i32 @example(i32 %n) {
entry: ; First basic block
%cmp = icmp sgt i32 %n, 0
br i1 %cmp, label %positive, label %negative
positive: ; Second basic block
%pos_result = add i32 %n, 1
ret i32 %pos_result
negative: ; Third basic block
%neg_result = sub i32 0, %n
ret i32 %neg_result
}
Rules:
- Must have exactly one entry (label)
- Must have exactly one terminator (ret, br, switch, etc.)
- No branches except at the end
Instructions
Instructions are operations within basic blocks:
%result = add i32 %x, %y ; Arithmetic
%ptr = getelementptr i32, i32* %base, i32 %offset ; Memory
store i32 %value, i32* %ptr ; Memory write
%loaded = load i32, i32* %ptr ; Memory read
br label %next ; Control flow
Type System
LLVM IR has a rich, strongly-typed type system:
Primitive Types
Integer Types
i1 ; Boolean (1 bit)
i8 ; Byte (8 bits)
i16 ; Short (16 bits)
i32 ; Int (32 bits)
i64 ; Long (64 bits)
i128 ; 128-bit integer
Floating Point Types
half ; 16-bit floating point
float ; 32-bit floating point (IEEE 754)
double ; 64-bit floating point (IEEE 754)
x86_fp80 ; 80-bit floating point (x87)
fp128 ; 128-bit floating point
Special Types
void ; No value (for functions)
label ; Basic block labels
metadata ; Metadata for debug info
Derived Types
Pointers
i32* ; Pointer to 32-bit integer
i8** ; Pointer to pointer to 8-bit integer
void (i32)* ; Pointer to function taking i32, returning void
Arrays
[10 x i32] ; Array of 10 32-bit integers
[5 x [3 x double]] ; 2D array of doubles
Structures
{i32, i8*, double} ; Literal structure (not packed; a packed one is written <{ ... }>)
{i32, [10 x i8], i32*} ; With array member
%struct.Point = type {float, float} ; Named structure
Vectors
<4 x i32> ; Vector of 4 32-bit integers (SIMD)
<8 x float> ; Vector of 8 floats
Static Single Assignment (SSA)
Every value in LLVM IR is assigned exactly once:
Non-SSA (C-like):
int x = 5;
x = x + 1;
x = x * 2;
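SSA (LLVM IR as emitted without optimization; the mutable variable lives in stack memory, and every % value is still defined exactly once):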
%x1 = alloca i32
store i32 5, i32* %x1
%x2 = load i32, i32* %x1
%x3 = add i32 %x2, 1
store i32 %x3, i32* %x1
%x4 = load i32, i32* %x1
%x5 = mul i32 %x4, 2
Phi Nodes
Phi nodes merge values from different control flow paths:
define i32 @select_max(i32 %a, i32 %b) {
entry:
%cmp = icmp sgt i32 %a, %b
br i1 %cmp, label %if.then, label %if.else
if.then:
br label %if.end
if.else:
br label %if.end
if.end:
%result = phi i32 [ %a, %if.then ], [ %b, %if.else ]
ret i32 %result
}
The phi node selects:
- %a if coming from %if.then
- %b if coming from %if.else
Instruction Categories
Arithmetic Instructions
; Integer arithmetic
%sum = add i32 %a, %b
%diff = sub i32 %a, %b
%prod = mul i32 %a, %b
%quot = sdiv i32 %a, %b ; Signed division
%rem = srem i32 %a, %b ; Signed remainder
; Floating point arithmetic
%fsum = fadd float %x, %y
%fdiff = fsub float %x, %y
%fprod = fmul float %x, %y
%fquot = fdiv float %x, %y
Bitwise Instructions
%and_result = and i32 %a, %b
%or_result = or i32 %a, %b
%xor_result = xor i32 %a, %b
%shl_result = shl i32 %a, 2 ; Shift left
%lshr_result = lshr i32 %a, 2 ; Logical shift right
%ashr_result = ashr i32 %a, 2 ; Arithmetic shift right
Comparison Instructions
; Integer comparisons
%eq = icmp eq i32 %a, %b ; Equal
%ne = icmp ne i32 %a, %b ; Not equal
%sgt = icmp sgt i32 %a, %b ; Signed greater than
%slt = icmp slt i32 %a, %b ; Signed less than
%ugt = icmp ugt i32 %a, %b ; Unsigned greater than
; Float comparisons
%feq = fcmp oeq float %x, %y ; Ordered equal
%fgt = fcmp ogt float %x, %y ; Ordered greater than
Memory Instructions
; Stack allocation
%ptr = alloca i32
%arr = alloca [10 x i32]
; Store
store i32 42, i32* %ptr
store i32 %value, i32* %ptr, align 4
; Load
%value = load i32, i32* %ptr
%aligned = load i32, i32* %ptr, align 4
; Pointer arithmetic
%elem_ptr = getelementptr [10 x i32], [10 x i32]* %arr, i32 0, i32 5
Control Flow Instructions
; Unconditional branch
br label %target
; Conditional branch
br i1 %condition, label %true_bb, label %false_bb
; Switch
switch i32 %value, label %default [
i32 0, label %case0
i32 1, label %case1
i32 2, label %case2
]
; Return
ret i32 %result
ret void
Call Instructions
; Direct call
%result = call i32 @function(i32 %arg1, i32 %arg2)
; Indirect call through function pointer
%fn_ptr = load i32 (i32, i32)*, i32 (i32, i32)** %fptr_var
%result = call i32 %fn_ptr(i32 %arg1, i32 %arg2)
; Tail call (optimization)
%result = tail call i32 @function(i32 %arg)
Conversion Instructions
; Integer truncation/extension
%trunc = trunc i32 %value to i8
%zext = zext i8 %byte to i32 ; Zero extend
%sext = sext i8 %byte to i32 ; Sign extend
; Float conversions
%to_float = sitofp i32 %int to float
%to_int = fptosi float %f to i32
; Pointer/integer conversions
%int = ptrtoint i8* %ptr to i64
%ptr = inttoptr i64 %int to i8*
; Bitcast (reinterpret bits)
%float_bits = bitcast i32 %int to float
Constants
Integer Constants
i32 42
i32 -17
i1 true
i1 false
Floating Point Constants
float 3.14
double 2.718281828
Null and Undefined
i32* null ; Null pointer
i32 undef ; Undefined value
i32 poison ; Poison value (LLVM 12+)
Aggregate Constants
[3 x i32] [i32 1, i32 2, i32 3]
{i32, float} {i32 42, float 3.14}
<4 x i32> <i32 1, i32 2, i32 3, i32 4>
Constant Expressions
@global = global i32* getelementptr (i32, i32* @array, i32 5)
@ptr = global i8* bitcast (i32* @value to i8*)
Attributes
Attributes provide additional information:
Function Attributes
define i32 @example() nounwind readnone {
ret i32 42
}
; Common attributes:
; - nounwind: doesn't throw exceptions
; - readnone: doesn't read/write memory
; - readonly: doesn't write memory
; - alwaysinline: force inline
; - noinline: prevent inlining
Parameter Attributes
define void @example(i32* noalias %ptr, i32 signext %value) {
; ...
}
; Common attributes:
; - noalias: pointer doesn't alias
; - readonly: parameter not modified
; - nocapture: pointer not captured
; - signext/zeroext: sign/zero extended
Calling Conventions
define fastcc i32 @fast_function(i32 %arg) {
; ...
}
; Conventions:
; - ccc: C calling convention (default)
; - fastcc: Fast calling convention
; - coldcc: Cold calling convention
Metadata
Metadata provides debugging and optimization hints:
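define i32 @example(i32 %n) !dbg !3 {
%result = add i32 %n, 1, !dbg !4
ret i32 %result, !dbg !4
}
!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2}
!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, emissionKind: FullDebug)
!1 = !DIFile(filename: "example.c", directory: "/path")
!2 = !{i32 2, !"Debug Info Version", i32 3}
; The function's !dbg attachment must be a DISubprogram, and locations are scoped to it
!3 = distinct !DISubprogram(name: "example", scope: !1, file: !1, line: 4, unit: !0, spFlags: DISPFlagDefinition)
!4 = !DILocation(line: 5, column: 12, scope: !3)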
Example: Complete Function
Here's a complete example showing various IR features:
; Function: Compute factorial
define i64 @factorial(i64 %n) {
entry:
; Check if n <= 1
%cmp = icmp sle i64 %n, 1
br i1 %cmp, label %base_case, label %recursive_case
base_case:
; Base case: return 1
ret i64 1
recursive_case:
; Recursive case: n * factorial(n-1)
%n_minus_1 = sub i64 %n, 1
%rec_result = call i64 @factorial(i64 %n_minus_1)
%result = mul i64 %n, %rec_result
ret i64 %result
}
Working with LLVM IR
Generating IR from C
# Generate human-readable IR
clang -S -emit-llvm example.c -o example.ll
# Generate optimized IR
clang -S -emit-llvm -O2 example.c -o example_opt.ll
# Generate bitcode
clang -c -emit-llvm example.c -o example.bc
Inspecting IR
# View IR
cat example.ll
less example.ll
# Disassemble bitcode
llvm-dis example.bc -o example.ll
# View with syntax highlighting
vim example.ll # or your preferred editor
Validating IR
# Check IR is well-formed
opt -passes=verify example.ll -disable-output
# Run the verifier after every pass in a pipeline
opt -passes='default<O2>' -verify-each example.ll -disable-output
IR in Obfuscation
Understanding IR is crucial for obfuscation:
Why IR Level?
- Platform Independence: Transform once, compile anywhere
- Rich Information: Type and structure information available
- Analysis Power: Leverage LLVM's analysis passes
- Composability: Combine with standard optimizations
Transformation Examples
Original:
define i32 @simple(i32 %x) {
%result = add i32 %x, 10
ret i32 %result
}
After Control Flow Flattening:
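define i32 @simple(i32 %x) {
entry:
%state = alloca i32
; Values that live across blocks are kept in memory, because after
; flattening block0 no longer dominates block1
%result.slot = alloca i32
store i32 0, i32* %state
br label %dispatcher
dispatcher:
%s = load i32, i32* %state
switch i32 %s, label %exit [
i32 0, label %block0
i32 1, label %block1
]
block0:
%sum = add i32 %x, 10
store i32 %sum, i32* %result.slot
store i32 1, i32* %state
br label %dispatcher
block1:
%result = load i32, i32* %result.slot
ret i32 %result
exit:
unreachable
}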
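For readers more comfortable with source code than IR, the flattened function above corresponds roughly to the following C++ shape: a state variable, a dispatcher loop, and one switch case per original block. This is a hand-written illustration of the pattern, not output produced by Obfussor.

```cpp
#include <cassert>

// Original function.
int simple(int x) {
    return x + 10;
}

// Flattened equivalent: every original block becomes a case, and the
// state variable decides which block the dispatcher runs next.
int simple_flattened(int x) {
    int state = 0;
    int result = 0;
    for (;;) {                       // dispatcher
        switch (state) {
        case 0:                      // block0
            result = x + 10;
            state = 1;
            break;
        case 1:                      // block1
            return result;
        default:                     // exit: never reached
            assert(false && "unreachable state");
            return 0;
        }
    }
}
```

Both functions return the same value for every input; only the control flow that an analyst sees has changed.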
Common Patterns
Allocating and Using Local Variables
define void @local_vars() {
%x = alloca i32
store i32 42, i32* %x
%val = load i32, i32* %x
; use %val...
ret void
}
Array Access
define i32 @array_access() {
%arr = alloca [10 x i32]
%elem_ptr = getelementptr [10 x i32], [10 x i32]* %arr, i32 0, i32 5
store i32 42, i32* %elem_ptr
%val = load i32, i32* %elem_ptr
ret i32 %val
}
Structure Access
%struct.Point = type { float, float }
define float @get_x(%struct.Point* %p) {
%x_ptr = getelementptr %struct.Point, %struct.Point* %p, i32 0, i32 0
%x = load float, float* %x_ptr
ret float %x
}
Summary
LLVM IR is:
- Low-level but platform-independent
- Strongly typed ensuring correctness
- In SSA form simplifying analysis
- Human-readable for debugging
- The foundation for LLVM transformations
Key concepts:
- Modules contain functions
- Functions contain basic blocks
- Basic blocks contain instructions
- All values are typed
- SSA form with phi nodes
- Rich instruction set for operations
Next Steps
- LLVM Pass System: Learn about transformation passes
- Compilation Pipeline: See IR in the full compilation flow
- Obfuscation Techniques: How techniques transform IR
Mastering LLVM IR is essential for understanding and customizing obfuscation techniques.
LLVM Pass System
The LLVM Pass framework is the infrastructure that enables code analysis and transformation. Understanding the pass system is essential for implementing and using obfuscation techniques in Obfussor.
What is an LLVM Pass?
An LLVM Pass is a unit of compilation work that performs analysis or transformation on LLVM IR. Passes are:
- Modular: Self-contained units of functionality
- Composable: Can be combined in sequences
- Reusable: Can be applied to different modules
- Analyzable: Can depend on other passes
Pass Types
1. Module Pass
Operates on entire modules (all functions and globals):
struct MyModulePass : public ModulePass {
static char ID;
bool runOnModule(Module &M) override {
// Process all functions in module
for (Function &F : M) {
// Process function
}
return true; // Module was modified
}
};
Use Cases:
- Inter-procedural analysis
- Global transformations
- Call graph construction
2. Function Pass
Operates on individual functions:
struct MyFunctionPass : public FunctionPass {
static char ID;
bool runOnFunction(Function &F) override {
// Process all basic blocks
for (BasicBlock &BB : F) {
// Process basic block
}
return true; // Function was modified
}
};
Use Cases:
- Intra-procedural optimizations
- Function-level obfuscation
- Local analysis
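3. BasicBlock Pass
Operates on individual basic blocks (legacy pass manager only; BasicBlockPass was removed in LLVM 10, so on the supported LLVM versions use a function pass that iterates over blocks instead):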
struct MyBasicBlockPass : public BasicBlockPass {
static char ID;
bool runOnBasicBlock(BasicBlock &BB) override {
for (Instruction &I : BB) {
// Process instruction
}
return true; // Basic block was modified
}
};
Use Cases:
- Local optimizations
- Instruction-level transformations
4. Loop Pass
Operates on loop structures:
struct MyLoopPass : public LoopPass {
static char ID;
bool runOnLoop(Loop *L, LPPassManager &LPM) override {
// Process loop
for (BasicBlock *BB : L->blocks()) {
// Process blocks in loop
}
return true;
}
};
Use Cases:
- Loop optimizations
- Loop obfuscation
- Loop vectorization
Pass Manager
The Pass Manager orchestrates pass execution:
Legacy Pass Manager (default before LLVM 13)
legacy::PassManager PM;
PM.add(createPromoteMemoryToRegisterPass());
PM.add(new MyCustomPass());
PM.run(Module);
New Pass Manager (default since LLVM 13)
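// Analysis managers must be created and cross-registered before running passes
LoopAnalysisManager LAM;
FunctionAnalysisManager FAM;
CGSCCAnalysisManager CGAM;
ModuleAnalysisManager MAM;
PassBuilder PB;
PB.registerModuleAnalyses(MAM);
PB.registerCGSCCAnalyses(CGAM);
PB.registerFunctionAnalyses(FAM);
PB.registerLoopAnalyses(LAM);
PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);
ModulePassManager MPM;
FunctionPassManager FPM;
// Add function passes
FPM.addPass(SimplifyCFGPass());
FPM.addPass(InstCombinePass());
// Add function pass manager to module pass manager
MPM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
// Run passes on the llvm::Module M
MPM.run(M, MAM);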
Pass Dependencies
Passes can declare dependencies on other passes:
void MyPass::getAnalysisUsage(AnalysisUsage &AU) const {
// This pass requires dominator tree
AU.addRequired<DominatorTreeWrapperPass>();
// Declare what the pass preserves. Pick the one that applies:
// the pass changes instructions but keeps the CFG intact...
AU.setPreservesCFG();
// ...or the pass doesn't modify anything (analysis-only passes)
// AU.setPreservesAll();
}
// Using the analysis
bool MyPass::runOnFunction(Function &F) {
DominatorTree &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
// Use dominator tree...
return false; // Return true only if the function was modified
}
Common Analysis Passes
Dominator Tree
Computes dominance relationships:
DominatorTree &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
if (DT.dominates(BB1, BB2)) {
// BB1 dominates BB2
}
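// Immediate dominator; getIDom() returns nullptr for the entry block, so check before dereferencing
BasicBlock *IDom = DT.getNode(BB)->getIDom()->getBlock();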
Loop Information
Analyzes loop structure:
LoopInfo &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
for (Loop *L : LI) {
BasicBlock *Header = L->getHeader();
unsigned Depth = L->getLoopDepth();
// Process loop
}
Alias Analysis
Determines memory aliasing:
AliasAnalysis &AA = getAnalysis<AAResultsWrapperPass>().getAAResults();
if (AA.alias(Ptr1, Ptr2) == AliasResult::NoAlias) {
// Pointers don't alias
}
Call Graph
Represents function call relationships:
CallGraph &CG = getAnalysis<CallGraphWrapperPass>().getCallGraph();
for (auto &Node : CG) {
const Function *F = Node.first; // nullptr for the external calling node
for (auto &CallRecord : *Node.second) {
Function *Callee = CallRecord.second->getFunction(); // nullptr for indirect or external calls
}
}
Writing a Custom Pass
Step 1: Define Pass Class
#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"
namespace {
struct CountInstructionsPass : public FunctionPass {
static char ID;
CountInstructionsPass() : FunctionPass(ID) {}
bool runOnFunction(Function &F) override {
unsigned Count = 0;
for (BasicBlock &BB : F) {
Count += BB.size();
}
errs() << "Function " << F.getName()
<< " has " << Count << " instructions\n";
return false; // Didn't modify the function
}
};
}
char CountInstructionsPass::ID = 0;
Step 2: Register the Pass
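static RegisterPass<CountInstructionsPass> X(
  "count-instructions",
  "Count instructions in functions",
  false, // CFGOnly: false, because the pass looks at instructions, not just the CFG
  false  // is_analysis: false, because other passes do not query its result
);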
Step 3: Build and Load
# Build pass as shared library
clang++ -shared -fPIC MyPass.cpp -o MyPass.so \
`llvm-config --cxxflags --ldflags`
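# Load and run pass (legacy pass manager syntax; recent LLVM releases default to
# the new pass manager, where plugins are loaded with -load-pass-plugin and
# selected with -passes=<name>)
opt -load MyPass.so -count-instructions < input.bc > output.bc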
Pass Scheduling
The pass manager groups consecutive function passes so that each function is processed by all of them before the next function is visited:
Module Pass 1
Function Pass A (on each function)
Function Pass B (on each function)
Module Pass 2
Function Pass C (on each function)
This grouping helps reduce:
- Redundant analysis
- Cache misses
- Compilation time
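The grouping described above can be sketched as two nested loops. This is a toy model using only the standard library, not LLVM's pass manager, and `scheduleFunctionPasses` is a hypothetical name. It records the order in which (pass, function) pairs are visited when function passes are grouped.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the visit order for a group of function passes: the outer loop
// walks functions and the inner loop walks passes, so each function is
// handled by every pass in the group before the next function is touched.
std::vector<std::string>
scheduleFunctionPasses(const std::vector<std::string> &Passes,
                       const std::vector<std::string> &Functions) {
  std::vector<std::string> Trace;
  for (const std::string &Fn : Functions)
    for (const std::string &Pass : Passes)
      Trace.push_back(Pass + "(" + Fn + ")");
  return Trace;
}
```

For passes A and B over functions f and g, the trace is A(f), B(f), A(g), B(g). Each function's IR stays hot while both passes work on it, which is where the cache benefit comes from.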
Obfuscation Passes
Control Flow Flattening Pass
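struct FlatteningPass : public FunctionPass {
  static char ID;
  FlatteningPass() : FunctionPass(ID) {}
  bool runOnFunction(Function &F) override {
    // Skip declarations and functions that are already flat
    // (isAlreadyFlat is a helper that is not shown here)
    if (F.isDeclaration() || isAlreadyFlat(&F)) return false;
    // Collect the original basic blocks before adding new ones
    std::vector<BasicBlock*> Blocks;
    for (BasicBlock &BB : F) {
      Blocks.push_back(&BB);
    }
    // Create the switch (state) variable in the entry block
    LLVMContext &Ctx = F.getContext();
    Type *Int32Ty = Type::getInt32Ty(Ctx);
    BasicBlock &Entry = F.getEntryBlock();
    IRBuilder<> EntryBuilder(&Entry, Entry.getFirstInsertionPt());
    AllocaInst *SwitchVar = EntryBuilder.CreateAlloca(Int32Ty, nullptr, "state");
    // Create the dispatcher block and a default block for unknown states
    BasicBlock *Dispatcher = BasicBlock::Create(Ctx, "dispatcher", &F);
    BasicBlock *DefaultBlock = BasicBlock::Create(Ctx, "default", &F);
    // Build the switch instruction on the loaded state value
    IRBuilder<> Builder(Dispatcher);
    Value *State = Builder.CreateLoad(Int32Ty, SwitchVar, "state.val");
    SwitchInst *Switch =
        Builder.CreateSwitch(State, DefaultBlock, Blocks.size());
    // Update blocks to branch to dispatcher
    for (unsigned i = 0; i < Blocks.size(); ++i) {
      // Add a switch case for the block, then modify its terminator to
      // update the state and branch to the dispatcher
      // ... implementation details ...
    }
    return true;
  }
};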
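The effect of flattening is easier to see at source level. The sketch below flattens a small loop by hand, using only the standard library. `sumTo` and `sumToFlattened` are hypothetical examples and do not show what the pass emits; they only illustrate the dispatcher-and-state-variable shape the pass aims for.

```cpp
#include <cassert>
#include <cstdint>

// Original: structured control flow (a loop with a condition and a body).
std::uint32_t sumTo(std::uint32_t N) {
  std::uint32_t Sum = 0;
  for (std::uint32_t I = 1; I <= N; ++I)
    Sum += I;
  return Sum;
}

// Flattened by hand: every original block becomes a switch case, and a
// state variable decides which case the dispatcher runs next.
std::uint32_t sumToFlattened(std::uint32_t N) {
  std::uint32_t Sum = 0, I = 0;
  std::uint32_t State = 0; // plays the role of the switch variable
  for (;;) {               // dispatcher loop
    switch (State) {
    case 0: // entry block: initialise the loop
      Sum = 0;
      I = 1;
      State = 1;
      break;
    case 1: // loop condition
      State = (I <= N) ? 2 : 3;
      break;
    case 2: // loop body
      Sum += I;
      ++I;
      State = 1;
      break;
    case 3: // exit block
      return Sum;
    default: // no other state is ever assigned
      assert(false && "unreachable state");
      return 0;
    }
  }
}
```

Both functions return the same value for every input, but the flattened version no longer shows the loop structure directly: all blocks sit at the same nesting level and the ordering lives in the values assigned to `State`.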
String Encryption Pass
struct StringEncryptionPass : public ModulePass {
  static char ID;
  StringEncryptionPass() : ModulePass(ID) {}
  bool runOnModule(Module &M) override {
    bool Changed = false;
    for (GlobalVariable &GV : M.globals()) {
      if (!GV.hasInitializer()) continue;
      Constant *Init = GV.getInitializer();
      if (ConstantDataArray *CDA = dyn_cast<ConstantDataArray>(Init)) {
        if (CDA->isString()) {
          // Encrypt the string
          std::string Original = CDA->getAsString().str();
          std::vector<uint8_t> Encrypted = encryptString(Original);
          // Replace with encrypted version
          Constant *NewInit = ConstantDataArray::get(
              M.getContext(), Encrypted);
          GV.setInitializer(NewInit);
          // Insert decryption code at usage sites
          insertDecryptionCode(&GV, M);
          Changed = true;
        }
      }
    }
    // Report a change only if at least one string was encrypted
    return Changed;
  }
};
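The pass above leaves `encryptString` and the inserted decryption code unspecified. As a minimal sketch of the round trip, assuming a single-byte XOR cipher chosen only because it is easy to follow, the pair below shows what is stored in the binary and what a use site would recover. `xorEncrypt` and `xorDecrypt` are hypothetical names, and a real implementation would likely use a stronger scheme and per-string keys.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for encryptString: XOR every byte with a fixed key.
std::vector<std::uint8_t> xorEncrypt(const std::string &Plain,
                                     std::uint8_t Key) {
  std::vector<std::uint8_t> Out;
  Out.reserve(Plain.size());
  for (unsigned char C : Plain)
    Out.push_back(static_cast<std::uint8_t>(C ^ Key));
  return Out;
}

// What decryption code inserted at a use site would do: XOR with the same
// key restores the original bytes, because (c ^ k) ^ k == c.
std::string xorDecrypt(const std::vector<std::uint8_t> &Cipher,
                       std::uint8_t Key) {
  std::string Out;
  Out.reserve(Cipher.size());
  for (std::uint8_t B : Cipher)
    Out.push_back(static_cast<char>(B ^ Key));
  return Out;
}
```

With this scheme the plaintext never appears in the binary's data section, so a plain strings scan finds only the XORed bytes. The key and the decryption routine are still present in the binary.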
Pass Options and Configuration
Passes can accept options:
static cl::opt<unsigned> ObfuscationLevel(
"obf-level",
cl::desc("Obfuscation intensity level (1-5)"),
cl::init(3)
);
struct ConfigurablePass : public FunctionPass {
bool runOnFunction(Function &F) override {
unsigned Level = ObfuscationLevel;
// Apply obfuscation based on level
return true;
}
};
Use from command line:
opt -load ObfPass.so -my-pass -obf-level=5 < input.bc > output.bc
Pass Debugging
Print IR Before/After
# Print IR after each pass
opt -print-after-all -O2 input.ll -S -o output.ll
# Print only specific pass
opt -print-after=my-pass input.ll -S -o output.ll
Verify IR
# Run verifier after each pass
opt -verify-each -O2 input.ll -S -o output.ll
Debug Pass Execution
#include "llvm/Support/Debug.h"
#define DEBUG_TYPE "my-pass"
LLVM_DEBUG(dbgs() << "Processing function: " << F.getName() << "\n");
LLVM_DEBUG(dbgs() << "Found " << Count << " instructions\n");
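Enable debug output (this requires an LLVM build with assertions enabled, because LLVM_DEBUG compiles to nothing in release builds):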
opt -debug -debug-only=my-pass -my-pass < input.bc > output.bc
Best Practices
1. Preserve Analysis When Possible
void MyPass::getAnalysisUsage(AnalysisUsage &AU) const {
AU.setPreservesCFG(); // If CFG unchanged
AU.addPreserved<LoopInfoWrapperPass>(); // If loops unchanged
}
2. Update Analysis After Modification
DominatorTree &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
// Modify IR
BasicBlock *NewBB = SplitBlock(BB, I, &DT);
// SplitBlock updates DT because the tree was passed in
3. Use LLVM IR Builder
IRBuilder<> Builder(Context);
Builder.SetInsertPoint(InsertBefore);
Value *Sum = Builder.CreateAdd(A, B, "sum");
Value *Product = Builder.CreateMul(Sum, C, "product");
4. Handle Edge Cases
bool runOnFunction(Function &F) override {
// Skip declarations
if (F.isDeclaration()) return false;
// Skip functions with specific attributes
if (F.hasFnAttribute("no-obfuscate")) return false;
// Process function
return true;
}
Integration with Obfussor
Obfussor uses custom passes for each obfuscation technique:
Source Code
↓
LLVM IR
↓
Control Flow Flattening Pass
↓
String Encryption Pass
↓
Bogus Control Flow Pass
↓
Instruction Substitution Pass
↓
Optimization Passes
↓
Obfuscated Binary
Each pass:
- Operates on LLVM IR
- Preserves semantics
- Can be enabled/disabled
- Has configurable intensity
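"Preserves semantics" is the property that matters most, and instruction substitution gives a compact illustration. The sketch below shows a few arithmetic identities of the kind such a pass can use, written as plain C++ on unsigned 32-bit integers so that wraparound is well defined. The function names are hypothetical, and the rewrites Obfussor actually applies are not listed in this chapter.

```cpp
#include <cassert>
#include <cstdint>

// a + b rewritten as a - (-b). Unsigned arithmetic wraps modulo 2^32,
// so the identity holds for every pair of inputs.
std::uint32_t addViaSub(std::uint32_t A, std::uint32_t B) {
  return A - (0u - B);
}

// a + b rewritten as (a ^ b) + 2 * (a & b): XOR adds without carries,
// AND finds the carry bits, and doubling shifts them into place.
std::uint32_t addViaXorAnd(std::uint32_t A, std::uint32_t B) {
  return (A ^ B) + 2u * (A & B);
}

// a ^ b rewritten as (a | b) - (a & b).
std::uint32_t xorViaOrAnd(std::uint32_t A, std::uint32_t B) {
  return (A | B) - (A & B);
}
```

Each replacement computes exactly the same value as the original operator, which is what allows a pass to substitute one for the other without changing program behavior.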
Summary
The LLVM Pass system:
- Provides modular transformation framework
- Enables analysis and optimization
- Supports custom passes for obfuscation
- Manages dependencies automatically
- Schedules passes efficiently
Key concepts:
- Different pass types (Module, Function, BasicBlock, Loop)
- Pass Manager orchestrates execution
- Analysis passes provide information
- Transformation passes modify IR
- Dependencies ensure correct ordering
Next Steps
- Compilation Pipeline: See passes in action
- Obfuscation Techniques: Obfuscation passes
- Custom Passes: Write your own passes
The pass system is the engine that powers LLVM obfuscation.
Compilation Pipeline
Understanding the complete LLVM compilation pipeline is essential for knowing where and how obfuscation fits into the build process. This chapter explains the end-to-end compilation flow and shows where Obfussor plugs into it.
Standard LLVM Compilation Pipeline
Complete Flow
Source Code (.c, .cpp)
↓
Preprocessor
↓
Preprocessed Source (.i)
↓
Frontend (Clang)
↓
LLVM IR (.ll or .bc)
↓
Optimizer (opt)
↓
Optimized LLVM IR
↓
Backend (llc)
↓
Assembly Code (.s)
↓
Assembler (as)
↓
Object File (.o)
↓
Linker (ld)
↓
Executable/Library
Phase-by-Phase Breakdown
1. Preprocessing
clang -E source.c -o source.i
What Happens:
- Includes header files (#include)
- Expands macros (#define)
- Processes conditionals (#ifdef)
- Removes comments
Output: Preprocessed source code
2. Compilation to IR
clang -S -emit-llvm source.i -o source.ll
What Happens:
- Lexical analysis (tokenization)
- Syntax analysis (parsing)
- Semantic analysis (type checking)
- IR generation
Output: LLVM IR (human-readable .ll or bitcode .bc)
3. Optimization
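# Note: clang marks functions optnone when it emits IR at -O0, and opt skips
# such functions; generate the IR with -Xclang -disable-O0-optnone (or at -O1
# or higher) if opt is expected to transform it
opt -O3 source.ll -S -o source_opt.ll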
What Happens:
- Analysis passes gather information
- Transformation passes modify IR
- Dead code elimination
- Function inlining
- Loop optimizations
- Constant propagation
Output: Optimized LLVM IR
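As a source-level illustration of two of the transformations listed above (constant propagation and dead code elimination), the sketch below shows a function before and after the rewrite, done by hand. The functions are hypothetical examples, and the optimizer works on IR rather than on C++ source.

```cpp
#include <cassert>
#include <cstdint>

// Before: roughly what unoptimized code computes, step by step.
std::uint32_t areaBefore(std::uint32_t Width) {
  const std::uint32_t Scale = 4;       // known constant
  std::uint32_t Unused = Width * 100u; // dead: does not affect the result
  (void)Unused;
  std::uint32_t Height = Scale * 2u;   // foldable to 8
  if (Scale > 10u)                     // condition is always false
    return 0u;
  return Width * Height;
}

// After: constant propagation folds Scale and Height, and dead code
// elimination drops the unused multiply and the unreachable branch.
std::uint32_t areaAfter(std::uint32_t Width) { return Width * 8u; }
```

The same reasoning explains why optimization order matters for obfuscation: an optimizer that can prove a branch is never taken will remove it, whether the branch came from the programmer or from an obfuscation pass.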
4. Backend Compilation
llc -O2 source_opt.ll -o source.s
What Happens:
- Instruction selection
- Register allocation
- Instruction scheduling
- Code emission
Output: Assembly code for target architecture
5. Assembly
as source.s -o source.o
What Happens:
- Convert assembly to machine code
- Generate object file format (ELF, Mach-O, COFF)
Output: Object file
6. Linking
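# Direct linker invocation (a real C program also needs the C runtime
# startup files and libc passed explicitly)
ld source.o -o executable
# Or using clang, which adds them automatically:
clang source.o -o executable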
What Happens:
- Resolve symbols
- Combine object files
- Link libraries
- Generate executable
Output: Final executable or library
Obfuscation-Enhanced Pipeline
Where Obfuscation Fits
Source Code
↓
Frontend
↓
LLVM IR
↓
┌───────────────────┐
│ OBFUSCATION LAYER │ ← Obfussor operates here
│ │
│ • Control Flow │
│ • String Encrypt │
│ • Bogus Code │
│ • Inst. Subst. │
└───────────────────┘
↓
Obfuscated LLVM IR
↓
Optimizer
↓
Backend
↓
Obfuscated Binary
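Of the techniques named in the obfuscation layer, bogus control flow relies on opaque predicates: conditions whose outcome is fixed but not obvious to a reader. The sketch below shows one well-known predicate at source level, using only the standard library. The function names are hypothetical, and a production obfuscator would likely prefer predicates that are harder for an optimizer to fold away.

```cpp
#include <cassert>
#include <cstdint>

// Opaquely true predicate: x * (x + 1) is a product of two consecutive
// integers, so it is always even. Wraparound modulo 2^32 keeps the parity.
bool opaqueTrue(std::uint32_t X) {
  return ((X * (X + 1u)) & 1u) == 0u;
}

// Original computation.
std::uint32_t scale(std::uint32_t V) { return V * 3u + 1u; }

// Same computation guarded by the opaque predicate. The fall-through path
// is bogus code: it never executes, but a static reader has to prove that.
std::uint32_t scaleWithBogusBranch(std::uint32_t V, std::uint32_t X) {
  if (opaqueTrue(X))
    return V * 3u + 1u;
  return (V ^ 0xDEADBEEFu) - X; // decoy path, never reached
}
```

The guarded version returns the same result as `scale` for any value of `X`, while its control-flow graph contains an extra branch and an extra block.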
Obfuscation Pipeline
# 1. Compile to IR
clang -S -emit-llvm source.c -o source.ll
# 2. Apply obfuscation passes
opt -load ObfuscatorPass.so \
-control-flow-flattening \
-string-encryption \
-bogus-control-flow \
source.ll -S -o obfuscated.ll
# 3. Optimize obfuscated IR
opt -O2 obfuscated.ll -S -o obfuscated_opt.ll
# 4. Compile to binary
llc obfuscated_opt.ll -o obfuscated.s
clang obfuscated.s -o program
Integration Methods
Method 1: Standalone Pass
Apply obfuscation as separate compilation step:
# Standard pipeline with obfuscation inserted
clang -S -emit-llvm source.c -o source.ll
obfussor-cli obfuscate --input source.ll --output obf.ll
opt -O2 obf.ll -S -o obf_opt.ll
llc obf_opt.ll -o obf.s
clang obf.s -o program
Method 2: Integrated with opt
Load obfuscation passes into opt:
opt -load /path/to/ObfuscatorPass.so \
-control-flow-flattening \
-string-encryption \
-O2 \
source.ll -o obfuscated.bc
Method 3: Compiler Plugin
Use Clang plugin interface:
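# -fplugin loads the shared object into clang; pass plugins written for the
# new pass manager are loaded with -fpass-plugin= instead
clang -fplugin=/path/to/ObfuscatorPlugin.so \
-mllvm -obfuscate \
source.c -o program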
Method 4: LTO (Link-Time Optimization)
Apply obfuscation during link time:
# Compile with LTO
clang -flto -c source1.c -o source1.o
clang -flto -c source2.c -o source2.o
# Link with obfuscation
clang -flto source1.o source2.o \
-Wl,-mllvm=-obfuscate \
-o program
Obfussor CLI Integration
Basic Usage
obfussor-cli obfuscate \
--input source.c \
--output obfuscated \
--techniques cff,str,bog
Internal Pipeline:
source.c → clang → IR → Obfuscation Passes → opt → llc → Binary
Advanced Configuration
obfussor-cli obfuscate \
--input source.c \
--output obfuscated \
--config obf-config.json \
--ir-output obfuscated.ll \
--optimization-level O2
With Custom Passes:
obfussor-cli obfuscate \
--input source.c \
--output obfuscated \
--custom-pass /path/to/MyPass.so \
--pass-options "level=5,seed=42"
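The `--pass-options` value is a comma-separated list of key=value pairs. How obfussor-cli parses it is not shown in this chapter. The sketch below is one plausible way to split such a string, using only the standard library, with `parsePassOptions` as a hypothetical helper name.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Splits "key=value,key=value" into a map. Entries without '=' or with an
// empty key are skipped. Hypothetical helper, not the obfussor-cli parser.
std::map<std::string, std::string>
parsePassOptions(const std::string &Spec) {
  std::map<std::string, std::string> Options;
  std::istringstream Stream(Spec);
  std::string Entry;
  while (std::getline(Stream, Entry, ',')) {
    const std::string::size_type Eq = Entry.find('=');
    if (Eq == std::string::npos || Eq == 0)
      continue; // malformed entry: no '=' or no key
    Options[Entry.substr(0, Eq)] = Entry.substr(Eq + 1);
  }
  return Options;
}
```

A custom pass could then read `level` and `seed` from the resulting map and fall back to defaults when a key is missing.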
Build System Integration
Makefile Integration
CC = clang
OBFUSSOR = obfussor-cli
OPT_LEVEL = -O2
# Cancel the built-in %.o: %.c rule so objects are always built from obfuscated IR
%.o: %.c
# Obfuscation rules
%.ll: %.c
	$(CC) -S -emit-llvm $< -o $@
%.obf.ll: %.ll
	$(OBFUSSOR) obfuscate --input $< --output $@
%.o: %.obf.ll
	$(CC) $(OPT_LEVEL) -c $< -o $@
# Link
program: main.o utils.o
	$(CC) $^ -o $@
.PHONY: clean
clean:
	rm -f *.ll *.o program
CMake Integration
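# Find LLVM
find_package(LLVM REQUIRED CONFIG)
include_directories(${LLVM_INCLUDE_DIRS})
# Custom command for obfuscation (assumes CMAKE_C_COMPILER is clang)
function(add_obfuscated_executable target)
  set(sources ${ARGN})
  set(obfuscated_objects "")
  foreach(src ${sources})
    # Generate IR
    set(ir_file "${CMAKE_BINARY_DIR}/${src}.ll")
    add_custom_command(
      OUTPUT ${ir_file}
      COMMAND ${CMAKE_C_COMPILER} -S -emit-llvm
              ${CMAKE_SOURCE_DIR}/${src} -o ${ir_file}
      DEPENDS ${CMAKE_SOURCE_DIR}/${src}
    )
    # Obfuscate IR
    set(obf_file "${CMAKE_BINARY_DIR}/${src}.obf.ll")
    add_custom_command(
      OUTPUT ${obf_file}
      COMMAND obfussor-cli obfuscate
              --input ${ir_file} --output ${obf_file}
      DEPENDS ${ir_file}
    )
    # Compile the obfuscated IR to an object file, because CMake has no
    # built-in rule for .ll sources
    set(obj_file "${CMAKE_BINARY_DIR}/${src}.obf.o")
    add_custom_command(
      OUTPUT ${obj_file}
      COMMAND ${CMAKE_C_COMPILER} -c ${obf_file} -o ${obj_file}
      DEPENDS ${obf_file}
    )
    list(APPEND obfuscated_objects ${obj_file})
  endforeach()
  set_source_files_properties(${obfuscated_objects}
    PROPERTIES EXTERNAL_OBJECT TRUE GENERATED TRUE)
  add_executable(${target} ${obfuscated_objects})
  # Only prebuilt objects are listed, so the link language is set explicitly
  set_target_properties(${target} PROPERTIES LINKER_LANGUAGE C)
endfunction()
# Usage
add_obfuscated_executable(my_program main.c utils.c)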
Bazel Integration
# BUILD file
load("//tools:obfuscation.bzl", "obfuscated_cc_binary")
obfuscated_cc_binary(
name = "my_program",
srcs = ["main.c", "utils.c"],
obfuscation_config = "obf-config.json",
)
Multi-File Projects
Approach 1: Individual File Obfuscation
# Obfuscate each file separately
for src in *.c; do
  clang -S -emit-llvm "$src" -o "${src%.c}.ll"
  obfussor-cli obfuscate --input "${src%.c}.ll" --output "${src%.c}.obf.ll"
done
# Compile and link
clang *.obf.ll -o program
Approach 2: Whole Program Obfuscation
# Combine the IR files of all translation units into one module
llvm-link $(find . -name "*.ll") -S -o combined.ll
# Obfuscate combined IR
obfussor-cli obfuscate --input combined.ll --output obfuscated.ll
# Compile to binary
llc obfuscated.ll -o obfuscated.s
clang obfuscated.s -o program
Approach 3: LTO-based
# Compile with LTO
clang -flto -c *.c
# Link with obfuscation at link time
clang -flto -fuse-ld=gold -Wl,-plugin-opt=obfuscate *.o -o program
Cross-Compilation
Targeting Different Architectures
# Generate IR for ARM64
clang -target aarch64-linux-gnu -S -emit-llvm source.c -o source.ll
# Obfuscate (platform-independent)
obfussor-cli obfuscate --input source.ll --output obf.ll
# Compile the IR to ARM64 assembly, then assemble and link
llc -march=aarch64 obf.ll -o obf.s
aarch64-linux-gnu-gcc obf.s -o program-arm64
Multi-Target Build
#!/bin/bash
TARGETS=("x86_64-linux-gnu" "aarch64-linux-gnu" "arm-linux-gnueabi")
for target in "${TARGETS[@]}"; do
  # Generate IR for this target (LLVM IR embeds the target triple and data
  # layout, so it is produced once per target)
  clang -target "$target" -S -emit-llvm source.c -o "source-${target}.ll"
  # Obfuscate the IR for this target
  obfussor-cli obfuscate --input "source-${target}.ll" --output "obf-${target}.ll"
  # Compile for the specific target
  llc -mtriple="$target" "obf-${target}.ll" -o "obf-${target}.s"
  "${target}-gcc" "obf-${target}.s" -o "program-${target}"
done
Optimization Considerations
Before or After Obfuscation?
Optimize Before Obfuscation
# Optimize first
opt -O3 source.ll -S -o optimized.ll
# Then obfuscate
obfussor-cli obfuscate --input optimized.ll --output obf.ll
Pros:
- Better performance of the original logic
- Cleaner IR for obfuscation
Cons:
- Code added by obfuscation is left unoptimized, so its overhead remains
Optimize After Obfuscation
# Obfuscate first
obfussor-cli obfuscate --input source.ll --output obf.ll
# Then optimize
opt -O2 obf.ll -S -o obf_opt.ll
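Pros:
- Reduces the overhead of the obfuscated code
- Obfuscation is applied to the IR as the frontend emitted it
Cons:
- Optimizations may undo parts of the obfuscation (for example, by simplifying the control flow)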
Recommended: Both
# Light optimization before
opt -O1 source.ll -S -o pre_opt.ll
# Obfuscate
obfussor-cli obfuscate --input pre_opt.ll --output obf.ll
# Optimize after (carefully)
opt -O2 -disable-simplify-cfg obf.ll -S -o final.ll
Debugging Obfuscated Code
Preserve Debug Info
# Compile with debug info
clang -g -S -emit-llvm source.c -o source.ll
# Obfuscate while preserving debug metadata
obfussor-cli obfuscate --input source.ll --output obf.ll \
--preserve-debug-info
# Generate object code and link, keeping the debug info
llc -filetype=obj obf.ll -o obf.o
clang -g obf.o -o program
Separate Debug and Release Pipelines
# Debug build (minimal obfuscation)
obfussor-cli obfuscate \
--input source.ll \
--output debug.ll \
--config debug-config.json # Minimal obfuscation
# Release build (maximum obfuscation)
obfussor-cli obfuscate \
--input source.ll \
--output release.ll \
--config release-config.json # Maximum obfuscation
Performance Profiling
Measure Compilation Time
#!/bin/bash
echo "Baseline compilation:"
time clang -O2 source.c -o baseline
echo "With obfuscation:"
time obfussor-cli obfuscate \
--input source.c \
--output obfuscated \
--config obf-config.json
Measure Runtime Impact
# Build both versions
clang -O2 source.c -o baseline
obfussor-cli obfuscate --input source.c --output obfuscated
# Benchmark
echo "Baseline:"
time ./baseline
echo "Obfuscated:"
time ./obfuscated
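The `time` command measures the whole process, including startup. When the cost of a single hot function matters, an in-process measurement can be more informative. The sketch below is one minimal way to do that with `std::chrono`. `averageNanoseconds` is a hypothetical helper, and a real benchmark would also need warm-up runs and protection against the compiler optimizing the work away.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Runs Work() Iterations times and returns the average wall-clock time per
// call in nanoseconds. Returns 0.0 when Iterations is zero.
template <typename Fn>
double averageNanoseconds(Fn &&Work, std::uint32_t Iterations) {
  using Clock = std::chrono::steady_clock;
  const auto Start = Clock::now();
  for (std::uint32_t I = 0; I < Iterations; ++I)
    Work();
  const auto Elapsed = Clock::now() - Start;
  const double Total =
      std::chrono::duration<double, std::nano>(Elapsed).count();
  return Iterations == 0 ? 0.0 : Total / Iterations;
}
```

Calling the helper once with the baseline function and once with its obfuscated counterpart, using the same iteration count, gives a per-call slowdown ratio that is easier to compare across builds than total process time.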
Summary
The LLVM compilation pipeline:
- Transforms source code through multiple stages
- Obfuscation integrates at IR level
- Can be applied at various points
- Supports multiple build systems
- Works with cross-compilation
Key integration points:
- Standalone obfuscation pass
- Integrated with opt
- Compiler plugin
- Link-time obfuscation
Best practices:
- Choose appropriate optimization strategy
- Use build system integration
- Consider multi-file projects
- Profile performance impact
Next Steps
- Obfuscation Techniques: Learn about specific techniques
- Configuration: Configure the pipeline
- Advanced Topics: Optimize your pipeline
Understanding the pipeline enables effective obfuscation integration.