-
-
Save tcoppex/443d1dd45f873d96260195d6431b0989 to your computer and use it in GitHub Desktop.
| ################################################################### | |
| Writing C software without the standard library | |
| Linux Edition | |
| ################################################################### | |
| There are many tutorials on the web that explain how to build a | |
| simple hello world in C without the libc on AMD64, but most of them | |
| stop there. | |
| I will provide a more complete explanation that will allow you to | |
| build yourself a little framework to write more complex programs. | |
| The code will support both AMD64 and i386. | |
| Major credits to http://betteros.org/ which got me into researching | |
| libc-free programming. | |
| Why would you want to avoid libc? | |
| - Your code will have no dependencies other than the compiler. | |
| - Not including the massive header files and not linking the | |
| standard library makes compilation faster. It will be nearly | |
| instantaneous even for thousands of lines of code. | |
| - Executables are incredibly small (the http mirror server for my | |
| gopherspace is powered by a 10kb executable). | |
| - Easy to optimize for embedded computers that have very limited | |
| resources. | |
| - Easy to port to other architectures as long as they are | |
| documented, without having to worry whether the libs you use | |
| support it or not. | |
| - Above all, it exposes the inner workings of the OS, architecture | |
| and libc, which teaches you a lot and makes you more aware of | |
| what you're doing even when using high level libraries. | |
| - It's a fun challenge! | |
| I might not be an expert yet, but I will share my methods with you. | |
| For now this guide is linux-only, but I will be writing a windows | |
| version when I feel like firing up a virtual machine. | |
| ################################################################### | |
| Basic AMD64 Setup | |
| ################################################################### | |
| When we learn C, we are taught that main is the first function | |
| called in a C program. In reality, main is simply a convention of | |
| the standard library. | |
| Let's write a simple hello world and debug it. | |
| We will compile with debug information (flag -g) as well as no | |
| optimization (-O0) to be able to see as much as possible in the | |
| debugger. | |
| ------------------------------------------------------------------- | |
| $ cat > hello.c << "EOF" | |
| #include <stdio.h> | |
| int main(int argc, char* argv[]) | |
| { | |
| printf("hello\n"); | |
| return 0; | |
| } | |
| EOF | |
| $ gcc -O0 -g hello.c | |
| $ ./a.out | |
| hello | |
| $ gdb a.out | |
| (gdb) break main | |
| (gdb) run | |
| (gdb) backtrace | |
| #0 main (argc=1, argv=0x7fffffffd7f8) at hello.c:6 | |
| ------------------------------------------------------------------- | |
| Hmm... seems like gdb is hiding stuff from us. Let's tell it that | |
| we actually care about seeing libc functions: | |
| ------------------------------------------------------------------- | |
| (gdb) set backtrace past-main on | |
| (gdb) set backtrace past-entry on | |
| (gdb) bt | |
| #0 main (argc=1, argv=0x7fffffffd7f8) at hello.c:6 | |
| #1 0x00007ffff7a5f630 in __libc_start_main (main=0x400556 <main>, | |
| argc=1, argv=0x7fffffffd7f8, init=<optimized out>, | |
| fini=<optimized out>, rtld_fini=<optimized out>, | |
| stack_end=0x7fffffffd7e8) | |
| at libc-start.c:289 | |
| #2 0x0000000000400489 in _start () | |
| ------------------------------------------------------------------- | |
| That's much better! As we can see, the first function that's really | |
| called is _start, which then calls __libc_start_main which is | |
| clearly a standard library initialization function which then calls | |
| main. | |
| You can go take a look at _start __libc_start_main in glibc source | |
| if you want, but it's not very interesting for us as it sets up a | |
| bunch of stuff for dynamic linking and such that we will never use | |
| since we want a static executable. | |
| Let's recompile our hello world with optimization (-O2), without | |
| debug information and with stripping (-s) to see how large it is: | |
| ------------------------------------------------------------------- | |
| $ gcc -s -O2 hello.c | |
| $ wc -c a.out | |
| 6208 a.out | |
| ------------------------------------------------------------------- | |
| 6kb for a simple hello world? that's a lot! | |
| Even if I add other size optimization flags such as | |
| -Wl,--gc-sections -fno-unwind-tables | |
| -fno-asynchronous-unwind-tables -Os it just won't go below 6kb. | |
| We will now progressively strip this program down by first getting | |
| rid of the standard library and then learning how to invoke | |
| syscalls without having to include any headers. | |
| So how do we get rid of the standard library? If we try to compile | |
| our current code with -nostdlib we will run into linker errors: | |
| ------------------------------------------------------------------- | |
| $ gcc -s -O2 -nostdlib hello.c | |
| /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../x86_64-pc-linux- | |
| gnu/bin/ld: warning: cannot find entry symbol _start; defaulting to | |
| 0000000000400120 | |
| /tmp/ccTn8ClC.o: In function `main': | |
| hello.c:(.text.startup+0xa): undefined reference to `puts' | |
| collect2: error: ld returned 1 exit status | |
| ------------------------------------------------------------------- | |
| The linker is complaining about _start missing, which is what we | |
| would expect from our previous debugging. | |
| We also have a linker error on puts, which is to be expected since | |
| it's a libc function. But how do we print "hello" without puts? | |
| The linux kernel exposes a bunch of syscalls, which are functions | |
| that user-space programs can enter to interact with the OS. | |
| You can see a list of syscalls by running "man syscalls" or | |
| visiting this site: | |
| http://man7.org/linux/man-pages/man2/syscalls.2.html | |
| How do we find out which syscall puts uses? We can either look | |
| through the syscall list, or simply install strace to trace | |
| syscalls and write a simple program that uses puts. | |
| The strace method is extremely useful. If you don't know how to | |
| do something with syscalls, do it with libc and then strace it to | |
| see which syscalls it uses on the target architecture. | |
| ------------------------------------------------------------------- | |
| $ cat > puts.c << "EOF" | |
| #include <stdio.h> | |
| int main(int argc, char* argv[]) | |
| { | |
| puts("hello"); | |
| return 0; | |
| } | |
| EOF | |
| $ gcc puts.c | |
| $ strace ./a.out > /dev/null | |
| - stuff we don't care about - | |
| write(1, "hello\n", 6) = 6 | |
| exit_group(0) = ? | |
| +++ exited with 0 +++ | |
| ------------------------------------------------------------------- | |
| So it's using the write syscall. | |
| Note how I pipe stdout to /dev/null in strace? That's because | |
| strace output is in stderr and we don't want to have it mixed with | |
| a.out's output. | |
| Let's check the manpage for write: | |
| ------------------------------------------------------------------- | |
| $ man 2 write | |
| SYNOPSIS | |
| #include <unistd.h> | |
| ssize_t write(int fd, const void *buf, size_t count); | |
| DESCRIPTION | |
| write() writes up to count bytes from the buffer pointed | |
| buf to the file referred to by the file descriptor fd. | |
| ------------------------------------------------------------------- | |
| In linux, there are 3 standard file descriptors: | |
| - stdin: used to pipe data into the program or to read user input. | |
| - stdout: output | |
| - stderr: alternate output for error messages | |
| If we read "man stdout", we will see that they are simply defined | |
| as 0, 1 and 2. | |
| So all we have to do is replace our puts with a write to stream 1 | |
| (stdout). | |
| ------------------------------------------------------------------- | |
| #include <unistd.h> | |
| int main(int argc, char* argv[]) | |
| { | |
| write(1, "hello\n", 6); | |
| return 0; | |
| } | |
| ------------------------------------------------------------------- | |
| Let's try to compile it again: | |
| ------------------------------------------------------------------- | |
| $ gcc -s -O2 -nostdlib hello.c | |
| hello.c: In function ?main?: | |
| hello.c:6:5: warning: ignoring return value of ?write?, declared | |
| with attribute warn_unused_result [-Wunused-result] | |
| write(1, "hello\n", 6); | |
| ^ | |
| /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../x86_64-pc-linux- | |
| gnu/bin/ld: warning: cannot find entry symbol _start; defaulting to | |
| 0000000000400120 | |
| /tmp/ccJXwSsr.o: In function `main': | |
| hello.c:(.text.startup+0x14): undefined reference to `write' | |
| collect2: error: ld returned 1 exit status | |
| ------------------------------------------------------------------- | |
| Oh no! The "write" function is part of the standard library! | |
| How do we invoke syscalls without having to link the standard lib? | |
| Let's take a look at section "A.2.1 Calling Conventions" of the | |
| AMD64 ABI specification. If you are on i386 (32-bit), just follow | |
| along, we will port this to i386 soon in a moment. | |
| If you're completely clueless about asm, you should still be | |
| able to understand once you see the example. I'm not that good | |
| at asm myself. | |
| https://software.intel.com/sites/default/files/article/402129/ | |
| mpx-linux64-abi.pdf | |
| ------------------------------------------------------------------- | |
| 1. User-level applications use as integer registers for passing the | |
| sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface | |
| uses %rdi, %rsi, %rdx, %r10, %r8 and %r9. | |
| 2. A system-call is done via the syscall instruction. The kernel | |
| destroys registers %rcx and %r11. | |
| 3. The number of the syscall has to be passed in register %rax. | |
| 4. System-calls are limited to six arguments, no argument is passed | |
| directly on the stack. | |
| 5. Returning from the syscall, register %rax contains the result of | |
| the system-call. A value in the range between -4095 and -1 | |
| indicates an error, it is -errno. | |
| 6. Only values of class INTEGER or class MEMORY are passed to the | |
| kernel. | |
| ------------------------------------------------------------------- | |
| In poor words, all we need to do is write an asm wrapper that: | |
| - takes the syscall number followed by either pointers or integers | |
| as parameters | |
| - sets rax to the syscall number | |
| - sets rdi, rsi, rdx, r10, r8 and r9 to the parameters. calls that | |
| take less than 6 parameters will ignore the excess ones. | |
| - executes "syscall" | |
| - returns the contents of rax | |
| Now if we read section 3.4 of the specification or the quick | |
| cheatsheet at http://wiki.osdev.org/Calling_Conventions , we will | |
| see that on AMD64 the registers used to pass parameters to regular | |
| functions are almost the same as the syscalls, except for r10 which | |
| is replaced with rcx. The return register is also the same (rax). | |
| This means that our syscall wrapper will only be able to accept and | |
| forward a maximum of 5 parameters (because the first parameter is | |
| already being used to pass the syscall number). | |
| We could use the stack to take more than 6 parameters, but let's | |
| not make our lives more complicated when we don't even need to call | |
| syscalls with 6 parameters yet. | |
| The abi also states that: | |
| ------------------------------------------------------------------- | |
| Registers %rbp, %rbx and %r12 through %r15 ?belong? to the calling | |
| function and the called function is required to preserve their | |
| values. In other words, a called function must preserve these | |
| registers? values for its caller. Remaining registers ?belong? to | |
| the called function. If a calling function wants to preserve such a | |
| register value across a function call, it must save the value in | |
| its local stack frame. | |
| ------------------------------------------------------------------- | |
| Which means that we don't have to worry about saving and restoring | |
| the values of rdi, rsi, rdx, r10, r8 and r9 inside of our syscall | |
| wrapper, because it's up to the caller to save them and gcc will | |
| take care of that (since we will be calling it from C code). | |
| Putting it all together, this will be our syscall wrapper (in intel | |
| syntax): | |
| ------------------------------------------------------------------- | |
| mov rax,rdi /* rax (syscall number) = func param 1 (rdi) */ | |
| mov rdi,rsi /* rdi (syscall param 1) = func param 2 (rsi) */ | |
| mov rsi,rdx /* rsi (syscall param 2) = func param 3 (rdx) */ | |
| mov rdx,rcx /* rdx (syscall param 3) = func param 4 (rcx) */ | |
| mov r10,r8 /* r10 (syscall param 4) = func param 5 (r8) */ | |
| mov r8,r9 /* r8 (syscall param 5) = func param 6 (r9) */ | |
| syscall /* enter the syscall (return value will be in rax */ | |
| ret /* return value is already in rax, we can return */ | |
| ------------------------------------------------------------------- | |
| How do we embed arbitrary asm into our program though? One way is | |
| gcc inline assembler, but I personally find the syntax ugly. | |
| We're going to write a .S file in GAS (GNU Assembler) syntax and | |
| let gcc compile and link it with your hello.c . | |
| ------------------------------------------------------------------- | |
| cat > hello.S << "EOF" | |
| /* enable intel asm syntax without the % prefix for registers */ | |
| .intel_syntax noprefix | |
| /* this marks the .text section of a PE executable, which contains | |
| program code */ | |
| .text | |
| /* exports syscall5 to other compilation units (files) */ | |
| .globl syscall5 | |
| syscall5: | |
| mov rax,rdi | |
| mov rdi,rsi | |
| mov rsi,rdx | |
| mov rdx,rcx | |
| mov r10,r8 | |
| mov r8,r9 | |
| syscall | |
| ret | |
| EOF | |
| ------------------------------------------------------------------- | |
| You can find syscalls numbers here: | |
| http://betteros.org/ref/syscall.php | |
| https://filippo.io/linux-syscall-table/ | |
| Or by simply letting the C preprocessor print it for you: | |
| ------------------------------------------------------------------- | |
| $ printf "#include <sys/syscall.h>\nblah SYS_write" | \ | |
| gcc -E - | grep blah | |
| blah 1 | |
| ------------------------------------------------------------------- | |
| -E runs the preprocessor on the file, expanding all macros and | |
| therefore replacing #define consts with their value, while - means | |
| that we use stdin as input (which we pipe in from printf). | |
| Then we just mark a line with blah so we can grep it, followed by | |
| the constant we want to know. | |
| Syscall numbers are usually named SYS_ followed by the syscall name | |
| You can also add the -m32 flags to check values for 32-bit (i386). | |
| Remember the prototype for write from earlier? | |
| ------------------------------------------------------------------- | |
| ssize_t write(int fd, const void *buf, size_t count); | |
| ------------------------------------------------------------------- | |
| ssize_t and size_t are types defined by unistd. A quick inspection | |
| reveals that they are 64-bit integers and that the extra s in | |
| ssize means signed: | |
| ------------------------------------------------------------------- | |
| $ printf "#include <unistd.h>" | gcc -E - | grep size_t | |
| typedef long int __blksize_t; | |
| typedef long int __ssize_t; | |
| typedef __ssize_t ssize_t; | |
| typedef long unsigned int size_t; | |
| ------------------------------------------------------------------- | |
| If we try -m32 we will also see that this will be a 32-bit integer | |
| on 32-bit, which means that it's the same size as the | |
| architecture's pointers. I like to call this kind of integer | |
| intptr. | |
| Now we can import syscall5 in hello.c and make a write function | |
| that calls it: | |
| ------------------------------------------------------------------- | |
| void* syscall5( | |
| void* number, | |
| void* arg1, | |
| void* arg2, | |
| void* arg3, | |
| void* arg4, | |
| void* arg5 | |
| ); | |
| typedef unsigned long int uintptr; /* size_t */ | |
| typedef long int intptr; /* ssize_t */ | |
| static | |
| intptr write(int fd, void const* data, uintptr nbytes) | |
| { | |
| return (intptr) | |
| syscall5( | |
| (void*)1, /* SYS_write */ | |
| (void*)(intptr)fd, | |
| (void*)data, | |
| (void*)nbytes, | |
| 0, /* ignored */ | |
| 0 /* ignored */ | |
| ); | |
| } | |
| int main(int argc, char* argv[]) | |
| { | |
| write(1, "hello\n", 6); | |
| return 0; | |
| } | |
| ------------------------------------------------------------------- | |
| See that (void*)(intptr) double cast on fd? If fd is 32-bit and | |
| void* is 64-bit, we would get a warning that we are implicitly | |
| casting it to a different size, so we need to explicitly specify | |
| that we want that conversion by adding the intptr cast. | |
| This should be done every time you cast to and from pointers when | |
| the destination type is not guaranteed to be the same size as | |
| pointers. Especially when targeting multiple architectures. | |
| Also note how we cast the const qualifier away from data to avoid | |
| a warning. | |
| If we compile now, we are finally only missing _start! | |
| ------------------------------------------------------------------- | |
| $ gcc -s -O2 -nostdlib hello.S hello.c | |
| /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../ | |
| x86_64-pc-linux-gnu/bin/ld: warning: cannot find entry symbol | |
| _start; defaulting to 0000000000400120 | |
| ------------------------------------------------------------------- | |
| So, how do we define _start? Where do we get argc and argv from? | |
| We need to know the initial state of registers and the stack. | |
| Back to the AMD64 ABI document. In figure 3.9, we can see the | |
| initial state of the stack: | |
| 0 to rsp: undefined | |
| rsp : argc <- top of the stack (last pushed value) | |
| rsp+8 : argv[0] | |
| rsp+16 : argv[1] | |
| rsp+24 : argv[2] | |
| ... : ... | |
| rsp+8*argc : argv[argc - 1] | |
| rsp+8+8*argc : 0 | |
| * more stuff we don't care about * | |
| And right below it we have the initial state of the registers: | |
| ------------------------------------------------------------------- | |
| %rbp: The content of this register is unspecified at process | |
| initialization time, but the user code should mark the | |
| deepest stack frame by setting the frame pointer to zero. | |
| %rsp: The stack pointer holds the address of the byte with lowest | |
| address which is part of the stack. It is guaranteed to be | |
| 16-byte aligned at process entry. | |
| %rdx: a function pointer that the application should register with | |
| atexit (BA_OS). | |
| ------------------------------------------------------------------- | |
| So we know that rbp must be zeroed and that rsp points to the top | |
| of the stack. We don't care about rdx. | |
| If you don't understand how the stack works, it's basically a | |
| chunk of memory where data is appended (pushed) or retrieved (pop) | |
| at the end. | |
| In AMD64's convention we're actually prepending and removing data | |
| at the beginning of the block of memory since the stack is said to | |
| "grow downwards", which means that when we push something on the | |
| stack, the stack pointer gets lower. | |
| Since the ABI states that the stack pointer is 16-byte aligned, we | |
| must remember always push data whose size is a multiple of 16. For | |
| example, 2 64-bit integers are 16 bytes. It's often necessary to | |
| either push useless data or simply align the stack pointer when | |
| the pushed values don't happen to be aligned. | |
| Putting it all together, our _start function needs to: | |
| - zero rbp | |
| - put argc into rdi (1st parameter for main) | |
| - put the stack address of argv[0] into rsi (2nd param for main), | |
| which will be interpreted as an array of char pointers. | |
| - align stack to 16-bytes | |
| - call main | |
| Here's our new hello.S: | |
| ------------------------------------------------------------------- | |
| .intel_syntax noprefix | |
| .text | |
| .globl _start, syscall5 | |
| _start: | |
| xor rbp,rbp /* xoring a value with itself = 0 */ | |
| pop rdi /* rdi = argc */ | |
| /* the pop instruction already added 8 to rsp */ | |
| mov rsi,rsp /* rest of the stack as an array of char ptr */ | |
| /* zero the las 4 bits of rsp, aligning it to 16 bytes | |
| same as "and rsp,0xfffffffffffffff0" because negative | |
| numbers are represented as | |
| max_unsigned_value + 1 - abs(negative_num) */ | |
| and rsp,-16 | |
| call main | |
| ret | |
| syscall5: | |
| mov rax,rdi | |
| mov rdi,rsi | |
| mov rsi,rdx | |
| mov rdx,rcx | |
| mov r10,r8 | |
| mov r8,r9 | |
| syscall | |
| ret | |
| ------------------------------------------------------------------- | |
| It finally compiles! It runs correctly, but we get a segmentation | |
| fault when we exit: | |
| ------------------------------------------------------------------- | |
| $ gcc -s -O2 -nostdlib hello.S hello.c | |
| $ ./a.out | |
| hello | |
| Segmentation fault | |
| ------------------------------------------------------------------- | |
| But why? | |
| When we execute a call instruction, the return address (address of | |
| the intruction to jump to after the function returns) is pushed | |
| onto the stack implicitly and the ret instruction implicitly pops | |
| it and jumps to it. | |
| The _start function is very special, as it has no return address, | |
| so our ret instruction in _start is trying to jump back to an | |
| invalid memory location, executing garbage data as code or | |
| triggering access violations. | |
| We need to tell the OS to kill our process and never reach the ret | |
| in _start. The syscall _EXIT(2) is just what we need: | |
| ------------------------------------------------------------------- | |
| $ man 2 _EXIT | |
| NAME | |
| _exit, _Exit - terminate the calling process | |
| SYNOPSIS | |
| #include <unistd.h> | |
| void _exit(int status); | |
| #include <stdlib.h> | |
| void _Exit(int status); | |
| $ printf "#include <sys/syscall.h>\nblah SYS_exit" | \ | |
| gcc -E - | grep blah | |
| blah 60 | |
| ------------------------------------------------------------------- | |
| The status code will simply be the return value of main, which is | |
| stored in rax as we know. | |
| New hello.S: | |
| ------------------------------------------------------------------- | |
| .intel_syntax noprefix | |
| .text | |
| .globl _start, syscall5 | |
| _start: | |
| xor rbp,rbp | |
| pop rdi | |
| mov rsi,rsp | |
| and rsp,-16 | |
| call main | |
| mov rdi,rax /* syscall param 1 = rax (ret value of main) */ | |
| mov rax,60 /* SYS_exit */ | |
| syscall | |
| ret /* should never be reached, but if the OS somehow fails | |
| to kill us, it will cause a segmentation fault */ | |
| syscall5: | |
| mov rax,rdi | |
| mov rdi,rsi | |
| mov rsi,rdx | |
| mov rdx,rcx | |
| mov r10,r8 | |
| mov r8,r9 | |
| syscall | |
| ret | |
| ------------------------------------------------------------------- | |
| Our program finally runs and terminates correctly! Let's give | |
| ourselves a good pat on the back. | |
| ------------------------------------------------------------------- | |
| $ gcc -s -O2 -nostdlib hello.S hello.c | |
| $ ./a.out | |
| hello | |
| ------------------------------------------------------------------- | |
| Let's check the executable size now: | |
| ------------------------------------------------------------------- | |
| $ wc -c a.out | |
| 1008 a.out | |
| ------------------------------------------------------------------- | |
| We're almost below 1kb and it's 6 times smaller than before, but we | |
| can shrink it further. | |
| First of all, gcc generates unwind tables by default, which are | |
| used for exception handling and other stuff we don't care about. | |
| Let's turn those off: | |
| ------------------------------------------------------------------- | |
| $ gcc -s -O2 \ | |
| -nostdlib \ | |
| -fno-unwind-tables \ | |
| -fno-asynchronous-unwind-tables \ | |
| hello.S hello.c | |
| $ wc -c a.out | |
| 736 a.out | |
| ------------------------------------------------------------------- | |
| Woah, we shaved almost 300 bytes off! | |
| As a last step, we can check the executable for useless sections: | |
| ------------------------------------------------------------------- | |
| $ objdump -x a.out | |
| a.out: file format elf64-x86-64 | |
| a.out | |
| architecture: i386:x86-64, flags 0x00000102: | |
| EXEC_P, D_PAGED | |
| start address 0x000000000040011a | |
| Program Header: | |
| LOAD off 0x0000000000000000 vaddr 0x0000000000400000 | |
| paddr 0x0000000000400000 align 2**21 | |
| filesz 0x0000000000000153 memsz 0x0000000000000153 | |
| flags r-x | |
| STACK off 0x0000000000000000 vaddr 0x0000000000000000 | |
| paddr 0x0000000000000000 align 2**4 | |
| filesz 0x0000000000000000 memsz 0x0000000000000000 | |
| flags rwx | |
| PAX_FLAGS off 0x0000000000000000 vaddr 0x0000000000000000 | |
| paddr 0x0000000000000000 align 2**3 | |
| filesz 0x0000000000000000 memsz 0x0000000000000000 | |
| flags --- 2800 | |
| Sections: | |
| Idx Name Size VMA LMA ... | |
| 0 .text 0000005c 00000000004000f0 00000000004000f0 ... | |
| CONTENTS, ALLOC, LOAD, READONLY, CODE | |
| 1 .rodata 00000007 000000000040014c 000000000040014c ... | |
| CONTENTS, ALLOC, LOAD, READONLY, DATA | |
| 2 .comment 0000002a 0000000000000000 0000000000000000 ... | |
| CONTENTS, READONLY | |
| SYMBOL TABLE: | |
| no symbols | |
| ------------------------------------------------------------------- | |
| .text is the code | |
| .rodata is Read Only data (such as the string "hello" in our case) | |
| So we need both of these. | |
| But what's that .comment section? | |
| ------------------------------------------------------------------- | |
| $ objdump -s -j .comment a.out | |
| a.out: file format elf64-x86-64 | |
| Contents of section .comment: | |
| 0000 4743433a 20284765 6e746f6f 20342e39 GCC: (Gentoo 4.9 | |
| 0010 2e332070 312e352c 20706965 2d302e36 .3 p1.5, pie-0.6 | |
| 0020 2e342920 342e392e 3300 .4) 4.9.3. | |
| ------------------------------------------------------------------- | |
| Just information about the compiler, it seems. That's 1 byte for | |
| every character of that string, let's get rid of it! | |
| ------------------------------------------------------------------- | |
| $ strip -R .comment a.out | |
| $ wc -c a.out | |
| 624 a.out | |
| ------------------------------------------------------------------- | |
| There we go, we have achieved a nearly ten-fold size improvement | |
| on our little hello world. | |
| Let's set up a build script with all those compiler flags and let's | |
| also make it output the executable with a proper name. | |
| Also, I'm going to add the following useful flags: | |
| -Wl,--gc-sections: get rid of any unused code sections | |
| -fdata-sections: separate each function into its own code section. | |
| this lets gc-sections do its job. these two | |
| options combined will get rid of any dead code you | |
| might accidentally leave in your program. it also | |
| gets rid of unused functions in statically linked | |
| libraries. | |
| -fno-stack-protector: doesn't generate extra code to guard against | |
| overflows overwriting the return address. | |
| -Wa,--noexecstack: mark the stack memory as non-executable. this is | |
| just extra security since we don't need to be | |
| executing code off the stack's memory. | |
| -fno-builtin: disable all builtin gcc functions (such as math | |
| routines and other stuff). we will implement them | |
| ourselves as needed. | |
| -std=c89 -pedantic: follow the old c89 standard strictly. this | |
| should force us to write code more compatible | |
| with old compilers. | |
| -Wall: enable all warnings. | |
| -Werror: treat all warnings as error. can't let our code build with | |
| unchecked warnings. | |
| ------------------------------------------------------------------- | |
| $ cat > build.sh << "EOF" | |
| #!/bin/sh | |
| exename="hello" | |
| gcc -std=c89 -pedantic -s -O2 -Wall -Werror \ | |
| -nostdlib \ | |
| -fno-unwind-tables \ | |
| -fno-asynchronous-unwind-tables \ | |
| -fdata-sections \ | |
| -Wl,--gc-sections \ | |
| -Wa,--noexecstack \ | |
| -fno-builtin \ | |
| -fno-stack-protector \ | |
| hello.S hello.c \ | |
| -o $exename \ | |
| \ | |
| && strip -R .comment $exename | |
| EOF | |
| $ chmod +x ./build.sh | |
| $ ./build.sh | |
| $ wc -c hello | |
| 624 hello | |
| $ ./hello | |
| hello | |
| ------------------------------------------------------------------- | |
| As you might have noticed, we are doing a lot of useless mov's in | |
| that syscall5 wrapper on syscalls that take less than 5 parameters. | |
| Let's make one wrapper for each parameter count. This will increase | |
| performance slightly at the cost of a slightly bigger executable. | |
| You are free to remove the ones you don't use once you finish | |
| prototyping your program. | |
| New hello.S | |
| ------------------------------------------------------------------- | |
| .intel_syntax noprefix | |
| .text | |
| .globl _start, syscall, | |
| .globl syscall1, syscall2, syscall3, syscall4, syscall5 | |
| _start: | |
| xor rbp,rbp | |
| pop rdi | |
| mov rsi,rsp | |
| and rsp,-16 | |
| call main | |
| mov rdi,rax | |
| mov rax,60 /* SYS_exit */ | |
| syscall | |
| ret | |
| syscall: | |
| mov rax,rdi | |
| syscall | |
| ret | |
| syscall1: | |
| mov rax,rdi | |
| mov rdi,rsi | |
| syscall | |
| ret | |
| syscall2: | |
| mov rax,rdi | |
| mov rdi,rsi | |
| mov rsi,rdx | |
| syscall | |
| ret | |
| syscall3: | |
| mov rax,rdi | |
| mov rdi,rsi | |
| mov rsi,rdx | |
| mov rdx,rcx | |
| syscall | |
| ret | |
| syscall4: | |
| mov rax,rdi | |
| mov rdi,rsi | |
| mov rsi,rdx | |
| mov rdx,rcx | |
| mov r10,r8 | |
| syscall | |
| ret | |
| syscall5: | |
| mov rax,rdi | |
| mov rdi,rsi | |
| mov rsi,rdx | |
| mov rdx,rcx | |
| mov r10,r8 | |
| mov r8,r9 | |
| syscall | |
| ret | |
| ------------------------------------------------------------------- | |
| Now we can change our write function to use syscall3 instead. | |
| We will also change argv in our main to be char const* since we | |
| probably won't be modifying it. This is normally not allowed on the | |
| standard C library, but we aren't using it :^). | |
| Using the syscall numbers directly is a bit hard to read so let's | |
| also make a header with all the syscall numbers we use: | |
| ------------------------------------------------------------------- | |
| $ cat > syscalls.h << "EOF" | |
| #define SYS_write 1 | |
| #define SYS_exit 60 | |
| EOF | |
| ------------------------------------------------------------------- | |
| We will also define the syscall number as uintptr so that we don't | |
| need to cast to void*. | |
| new hello.c | |
| ------------------------------------------------------------------- | |
| #include "syscalls.h" | |
| typedef unsigned long int uintptr; | |
| typedef long int intptr; | |
| void* syscall3( | |
| uintptr number, | |
| void* arg1, | |
| void* arg2, | |
| void* arg3 | |
| ); | |
| static | |
| intptr write(int fd, void const* data, uintptr nbytes) | |
| { | |
| return (uintptr) | |
| syscall3( | |
| SYS_write, | |
| (void*)(intptr)fd, | |
| (void*)data, | |
| (void*)nbytes | |
| ); | |
| } | |
| int main(int argc, char const* argv[]) | |
| { | |
| write(1, "hello\n", 6); | |
| return 0; | |
| } | |
| ------------------------------------------------------------------- | |
| We can include headers in .S files, so let's also include it in | |
| hello.S | |
| ------------------------------------------------------------------- | |
| #include "syscalls.h" | |
| .intel_syntax noprefix | |
| .text | |
| .globl _start, syscall, | |
| .globl syscall1, syscall2, syscall3, syscall4, syscall5 | |
| _start: | |
| xor rbp,rbp | |
| pop rdi | |
| mov rsi,rsp | |
| and rsp,-16 | |
| call main | |
| mov rdi,rax | |
| mov rax,SYS_exit | |
| syscall | |
| ret | |
| ... | |
| ------------------------------------------------------------------- | |
| Having to pass the string length every time is annoying, so let's | |
| implement our own strlen and puts. | |
| I'm also going to make a "internal" alias for static, which makes | |
| it easier to search for static functions, rather than static | |
| variables, in a large codebase. I got this idea from Casey Muratori | |
| from handmade hero. | |
| ------------------------------------------------------------------- | |
| #include "syscalls.h" | |
| typedef unsigned long int uintptr; | |
| typedef long int intptr; | |
| #define internal static | |
| void* syscall3( | |
| uintptr number, | |
| void* arg1, | |
| void* arg2, | |
| void* arg3 | |
| ); | |
| /* ------------------------------------------------------------- */ | |
| #define stdout 1 | |
| internal | |
| intptr write(int fd, void const* data, uintptr nbytes) | |
| { | |
| return (uintptr) | |
| syscall3( | |
| SYS_write, | |
| (void*)(intptr)fd, | |
| (void*)data, | |
| (void*)nbytes | |
| ); | |
| } | |
| /* ------------------------------------------------------------- */ | |
| internal | |
| uintptr strlen(char const* str) | |
| { | |
| char const* p; | |
| for (p = str; *p; ++p); | |
| return p - str; | |
| } | |
| internal | |
| uintptr puts(char const* str) { | |
| return write(stdout, str, strlen(str)); | |
| } | |
| /* ------------------------------------------------------------- */ | |
| int main(int argc, char const* argv[]) | |
| { | |
| puts("hello\n"); | |
| return 0; | |
| } | |
| ------------------------------------------------------------------- | |
| If you don't understand my strlen function, it's pretty simple: C | |
| strings are null-terminated (the byte after the last character is | |
| zero), so I just iterate the characters through a pointer until | |
| I find a zero byte, and then I subtract the current position from | |
| the beginning of the string. | |
| libc does all kinds of crazy tricks to optimize this for large | |
| strings, which I haven't looked into. | |
| As you can see, I've also separated the code into sections with | |
| those spacer comments for readability. I grouped all the syscall | |
| wrappers together, followed by utility functions, followed by | |
| the program's code. | |
| Now we have a nice framework for AMD64 programs, but we're not | |
| going to stop here. We're going to set this up to also cross | |
| compile for i386, which is a very common architecture in low-end | |
| servers (such as the one I host my gopher mirror on). | |
| ################################################################### | |
| Porting to i386 | |
| ################################################################### | |
| Let's move all the AMD64-specific code into a dedicated folder. | |
| ------------------------------------------------------------------- | |
| $ mkdir amd64 | |
| $ mv hello.S amd64/start.S | |
| $ mv syscalls.h amd64/ | |
| ------------------------------------------------------------------- | |
| Now we can make a architecture-specific main.c where we define the | |
| integer types and main, which just calls hello_run, or whatever you | |
| want to name your program's entry point. This file includes hello.c | |
| just before main. | |
| I also make it define AMD64 in case we need to do platform checking | |
| in the code. Platform specific code should be kept separated | |
| whenever possible, though. | |
| ------------------------------------------------------------------- | |
| $ cat > amd64/main.c << "EOF" | |
| #define AMD64 | |
| #include "syscalls.h" | |
| typedef unsigned long int u64; | |
| typedef unsigned int u32; | |
| typedef unsigned short int u16; | |
| typedef unsigned char u8; | |
| typedef long int i64; | |
| typedef int i32; | |
| typedef short int i16; | |
| typedef signed char i8; | |
| typedef i64 intptr; | |
| typedef u64 uintptr; | |
| #include "../hello.c" | |
| int main(int argc, char const* argv[]) { | |
| return hello_run(argc, argv); | |
| } | |
| EOF | |
| ------------------------------------------------------------------- | |
| Yes, you can include .c files, which just get pasted into the file. | |
| This results in a single compilation unit even though we have | |
| multiple files, which speeds up compilation (unless your project is | |
| massive) and saves us the pain of typing every filename in our | |
| build script. This is yet another tick I got from Casey. | |
| By the way, you can check integer types on any architecture with | |
| the usual gcc preprocessor trick: | |
| ------------------------------------------------------------------- | |
| $ printf "#include <stdint.h>" | gcc -E - | grep int64 | |
| typedef long int int64_t; | |
| typedef unsigned long int uint64_t; | |
| $ printf "#include <stdint.h>" | gcc -E - | grep int32 | |
| typedef int int32_t; | |
| typedef unsigned int uint32_t; | |
| $ printf "#include <stdint.h>" | gcc -E - | grep int16 | |
| typedef short int int16_t; | |
| typedef unsigned short int uint16_t; | |
| $ printf "#include <stdint.h>" | gcc -E - | grep int8 | |
| typedef signed char int8_t; | |
| typedef unsigned char uint8_t; | |
| ------------------------------------------------------------------- | |
| And for the size of pointers, you can write a simple program that | |
| printfs sizeof(void*). | |
| hello.c will now look like this (remember, we moved the integer | |
| definitions to main.c and renamed main to hello_run, and | |
| syscalls.h is already included in main.c): | |
| ------------------------------------------------------------------- | |
| #define internal static | |
| void* syscall3( | |
| uintptr number, | |
| void* arg1, | |
| void* arg2, | |
| void* arg3 | |
| ); | |
| /* ------------------------------------------------------------- */ | |
| #define stdout 1 | |
| internal | |
| intptr write(int fd, void const* data, uintptr nbytes) | |
| { | |
| return (uintptr) | |
| syscall3( | |
| SYS_write, | |
| (void*)(intptr)fd, | |
| (void*)data, | |
| (void*)nbytes | |
| ); | |
| } | |
| /* ------------------------------------------------------------- */ | |
| internal | |
| uintptr strlen(char const* str) | |
| { | |
| char const* p; | |
| for (p = str; *p; ++p); | |
| return p - str; | |
| } | |
| internal | |
| uintptr puts(char const* str) { | |
| return write(stdout, str, strlen(str)); | |
| } | |
| /* ------------------------------------------------------------- */ | |
| internal | |
| int hello_run(int argc, char const* argv[]) | |
| { | |
| puts("hello\n"); | |
| return 0; | |
| } | |
| ------------------------------------------------------------------- | |
| Modify the build script to follow the new structure: | |
| ------------------------------------------------------------------- | |
| #!/bin/sh | |
| exename="hello" | |
| gcc -std=c89 -pedantic -s -O2 -Wall -Werror \ | |
| -nostdlib \ | |
| -fno-unwind-tables \ | |
| -fno-asynchronous-unwind-tables \ | |
| -fdata-sections \ | |
| -Wl,--gc-sections \ | |
| -Wa,--noexecstack \ | |
| -fno-builtin \ | |
| -fno-stack-protector \ | |
| amd64/start.S amd64/main.c \ | |
| -o $exename \ | |
| \ | |
| && strip -R .comment $exename | |
| ------------------------------------------------------------------- | |
| Now we can create the main.c for i386: | |
| ------------------------------------------------------------------- | |
| $ mkdir i386 | |
| $ cat > i386/main.c << "EOF" | |
| #define I386 | |
| #include "syscalls.h" | |
| typedef unsigned long long int u64; | |
| typedef unsigned int u32; | |
| typedef unsigned short int u16; | |
| typedef unsigned char u8; | |
| typedef long long int i64; | |
| typedef int i32; | |
| typedef short int i16; | |
| typedef signed char i8; | |
| typedef i32 intptr; | |
| typedef u32 uintptr; | |
| #include "../hello.c" | |
| int main(int argc, char const* argv[]) { | |
| return hello_run(argc, argv); | |
| } | |
| EOF | |
| ------------------------------------------------------------------- | |
| Note how intptr is defined as a 32-bit integer and u64 is long | |
| long on 32-bits. | |
| Let's now grab syscall numbers for i386 and throw them into | |
| syscalls.h: | |
| ------------------------------------------------------------------- | |
| $ printf "#include <sys/syscall.h>\nblah SYS_write" \ | |
| | gcc -m32 -E - | grep blah | |
| blah 4 | |
| $ printf "#include <sys/syscall.h>\nblah SYS_exit" \ | |
| | gcc -m32 -E - | grep blah | |
| blah 1 | |
| $ cat > i386/syscalls.h << "EOF" | |
| #define SYS_write 4 | |
| #define SYS_exit 1 | |
| EOF | |
| ------------------------------------------------------------------- | |
| We need to write a i386 start.S and you guessed it, it's time to | |
| look at the ABI specification once again! | |
| http://www.sco.com/developers/devspecs/abi386-4.pdf | |
| This time I will just summarize the differences from amd64: | |
| - Registers are 32-bit so we push 4 bytes at a time. | |
| - The stack is aligned to 4 bytes, but we will still align it to | |
| 16 bytes because it can improve performance by preventing | |
| misaligned SSE accesses (according to glibc). | |
| - ebp needs to be zeroed (32-bit version of rbp) | |
| - esp is the stack pointer (32-bit version of rsp) | |
| - Return values for functions and syscalls are in eax | |
| - The instruction to enter syscalls is "int 0x80" | |
| - Syscall parameters are passed in ebx, ecx, edx, esi, edi, ebp | |
| - Function parameters are passed entirely through the stack by | |
| pushing them in reverse order, which means that we will be able | |
| to access them sequentially every 4 bytes on the stack. | |
| VERY IMPORTANT DIFFERENCE. We won't be using registers to pass | |
| parameters to main anymore nor to pull parameters in syscall | |
| wrappers. | |
| - Functions are expected to preserve ebx, esi, edi, ebp, esp on | |
| their own VERY IMPORTANT! we will have to save and restore these | |
| registers manually in our syscall wrappers! | |
| - Function callers are expected to clean up the parameters off the | |
| stack after the call. VERY IMPORTANT | |
| - As explained earlier, the return address is implicitly pushed on | |
| the stack so the function parameters will start at esp+4. | |
| In short, our _start will look something like this: | |
| ------------------------------------------------------------------- | |
| xor ebp,ebp | |
| pop esi /* argc */ | |
| mov ecx,esp /* argv */ | |
| /* 16-byte stack alignment is not mandatory here but | |
| according to glibc it improves SSE performance */ | |
| and esp,-16 | |
| /* push garbage to align to 16 bytes */ | |
| push 0xb16b00b5 | |
| push 0xb16b00b5 | |
| push ecx /* argv */ | |
| push esi /* argc */ | |
| call main | |
| add esp,16 | |
| /* on i386 it's up to the caller to clean up the stack. we can | |
| either pop them into scratch registers or just add the total | |
| size of the parameters in bytes to the stack pointer */ | |
| mov ebx,eax | |
| mov eax,SYS_exit | |
| int 0x80 | |
| ret | |
| ------------------------------------------------------------------- | |
| ... and our syscall5 wrapper will look like this: | |
| ------------------------------------------------------------------- | |
| push ebx | |
| push esi | |
| push edi | |
| mov eax,[esp+4+12] | |
| mov ebx,[esp+8+12] | |
| mov ecx,[esp+12+12] | |
| mov edx,[esp+16+12] | |
| mov esi,[esp+20+12] | |
| mov edi,[esp+24+12] | |
| int 0x80 | |
| pop edi | |
| pop esi | |
| pop ebx | |
| ret | |
| ------------------------------------------------------------------- | |
| See how I'm pushing registers on the stack to preserve them to then | |
| pop them (in reverse order since it's LIFO)? That's very important | |
| on i386. | |
| Also, you might be wondering what's going on with the esp offsets. | |
| You have to keep in mind that every time I push a register on the | |
| stack, esp is decremented by 4, so I have to skip the registers I | |
| pushed on the stack (3 registers = 12 bytes) to get to the | |
| parameters. Don't forget that the return address is also on the | |
| stack, so parameters start at + 4. | |
| And here's our complete i386 start.S | |
| ------------------------------------------------------------------- | |
| $ cat > i386/start.S << "EOF" | |
| #include "syscalls.h" | |
| .intel_syntax noprefix | |
| .text | |
| .globl _start, syscall | |
| .globl syscall1, syscall2, syscall3, syscall4, syscall5 | |
| _start: | |
| xor ebp,ebp | |
| pop esi | |
| mov ecx,esp | |
| and esp,-16 | |
| push 0xb1gb00b5 | |
| push 0xb1gb00b5 | |
| push ecx | |
| push esi | |
| call main | |
| add esp,16 | |
| mov ebx,eax | |
| mov eax,SYS_exit | |
| int 0x80 | |
| ret | |
| syscall: | |
| mov eax,[esp+4] | |
| int 0x80 | |
| ret | |
| syscall1: | |
| push ebx | |
| mov eax,[esp+4+4] | |
| mov ebx,[esp+8+4] | |
| int 0x80 | |
| pop ebx | |
| ret | |
| syscall2: | |
| push ebx | |
| mov eax,[esp+4+4] | |
| mov ebx,[esp+8+4] | |
| mov ecx,[esp+12+4] | |
| int 0x80 | |
| pop ebx | |
| ret | |
| syscall3: | |
| push ebx | |
| mov eax,[esp+4+4] | |
| mov ebx,[esp+8+4] | |
| mov ecx,[esp+12+4] | |
| mov edx,[esp+16+4] | |
| int 0x80 | |
| pop ebx | |
| ret | |
| syscall4: | |
| push ebx | |
| push esi | |
| mov eax,[esp+4+8] | |
| mov ebx,[esp+8+8] | |
| mov ecx,[esp+12+8] | |
| mov edx,[esp+16+8] | |
| mov esi,[esp+20+8] | |
| int 0x80 | |
| pop esi | |
| pop ebx | |
| ret | |
| syscall5: | |
| push ebx | |
| push esi | |
| push edi | |
| mov eax,[esp+4+12] | |
| mov ebx,[esp+8+12] | |
| mov ecx,[esp+12+12] | |
| mov edx,[esp+16+12] | |
| mov esi,[esp+20+12] | |
| mov edi,[esp+24+12] | |
| int 0x80 | |
| pop edi | |
| pop esi | |
| pop ebx | |
| ret | |
| EOF | |
| ------------------------------------------------------------------- | |
| Now we need to modify our build script to handle multiple | |
| architectures. | |
| I will just make the script take the arch subfolder name as a | |
| parameter. | |
| This is not enough though, because each architecture will have some | |
| extra compiler flags. For example, on i386 we need -m32 to ensure a | |
| 32-bit build even on amd64 dev machines, as well as -Wno-long-long | |
| which suppresses a warning about 64 bit integers being a | |
| nonstandard gcc extension on 32-bit. | |
| We will make our build script source a flags.sh script in the | |
| architecture-specific folder which just exports COMPILER_FLAGS with | |
| all the extra stuff it wants. | |
| ------------------------------------------------------------------- | |
| $ cat > build.sh << "EOF" | |
| #!/bin/sh | |
| exename="hello" | |
| archname=${1:-amd64} # if not specified, default to amd64 | |
| # if flags.sh exists in the arch folder, source it | |
| if [ -e $archname/flags.sh ]; then | |
| source $archname/flags.sh | |
| fi | |
| gcc -std=c89 -pedantic -s -O2 -Wall -Werror \ | |
| -nostdlib \ | |
| -fno-unwind-tables \ | |
| -fno-asynchronous-unwind-tables \ | |
| -fdata-sections \ | |
| -Wl,--gc-sections \ | |
| -Wa,--noexecstack \ | |
| -fno-builtin \ | |
| -fno-stack-protector \ | |
| $COMPILER_FLAGS \ | |
| $archname/start.S $archname/main.c \ | |
| -o $exename \ | |
| \ | |
| && strip -R .comment $exename | |
| EOF | |
| $ cat > i386/flags.sh << "EOF" | |
| #!/bin/sh | |
| export COMPILER_FLAGS="-m32 -Wno-long-long" | |
| EOF | |
| ------------------------------------------------------------------- | |
| Now we can compile both architectures easily with minimal code | |
| redundancy: | |
| ------------------------------------------------------------------- | |
| $ wc -c hello | |
| 720 hello | |
| $ ./hello | |
| hello | |
| $ ./build.sh i386 | |
| $ wc -c hello | |
| 608 hello | |
| $ ./hello | |
| hello | |
| ------------------------------------------------------------------- | |
| And there you have it! You now have a nice framework to develop | |
| libc-free programs. | |
| As you can see, the 32-bit executable is slightly smaller. This is | |
| mostly because pointers are half as large compared to 64-bit. | |
| ################################################################### | |
| Legacy syscalls on i386 | |
| ################################################################### | |
| There are a few things you should be extremely careful with when | |
| dealing with syscalls, especially when targeting multiple | |
| architectures. | |
| Some syscalls, such as stat, might return their stuff in a struct. | |
| Be extremely careful to check the struct layout and size of the | |
| types used, because it will often change drastically between | |
| architectures. | |
| ------------------------------------------------------------------- | |
| $ man 2 stat | |
| NAME | |
| stat, fstat, lstat, fstatat - get file status | |
| SYNOPSIS | |
| #include <sys/types.h> | |
| #include <sys/stat.h> | |
| #include <unistd.h> | |
| int stat(const char *pathname, struct stat *buf); | |
| int fstat(int fd, struct stat *buf); | |
| int lstat(const char *pathname, struct stat *buf); | |
| $ printf "#include <sys/stat.h>" | gcc -E - | grep -A 1 "int stat" | |
| extern int stat (const char *__restrict __file, | |
| struct stat *__restrict __buf) __attribute__ ((__nothrow__ , | |
| __leaf__)) __attribute__ ((__nonnull__ (1, 2))); | |
| ------------------------------------------------------------------- | |
| ------------------------------------------------------------------- | |
| $ printf "#include <sys/stat.h>" \ | |
| | gcc -E - | grep -A 60 "struct stat" | |
| struct stat | |
| { | |
| __dev_t st_dev; | |
| __ino_t st_ino; | |
| __nlink_t st_nlink; | |
| __mode_t st_mode; | |
| __uid_t st_uid; | |
| __gid_t st_gid; | |
| int __pad0; | |
| __dev_t st_rdev; | |
| __off_t st_size; | |
| __blksize_t st_blksize; | |
| __blkcnt_t st_blocks; | |
| # 91 "/usr/include/bits/stat.h" 3 4 | |
| struct timespec st_atim; | |
| struct timespec st_mtim; | |
| struct timespec st_ctim; | |
| # 106 "/usr/include/bits/stat.h" 3 4 | |
| __syscall_slong_t __glibc_reserved[3]; | |
| # 115 "/usr/include/bits/stat.h" 3 4 | |
| }; | |
| $ printf "#include <sys/stat.h>" | gcc -E - \ | |
| | grep '__dev_t\|__ino_t\|__nlink_t\|__mode_t\|__uid_t\|__gid_t' | |
| typedef unsigned long int __dev_t; | |
| typedef unsigned int __uid_t; | |
| typedef unsigned int __gid_t; | |
| typedef unsigned long int __ino_t; | |
| typedef unsigned int __mode_t; | |
| typedef unsigned long int __nlink_t; | |
| $ printf "#include <sys/stat.h>" | gcc -E - \ | |
| | grep '__blksize_t\|__blkcnt_t\|__syscall_slong_t\|__off_t' | |
| typedef long int __off_t; | |
| typedef long int __blksize_t; | |
| typedef long int __blkcnt_t; | |
| typedef long int __syscall_slong_t; | |
| $ printf "#include <sys/stat.h>" | gcc -E - \ | |
| | grep -A 10 "struct timespec" | |
| struct timespec | |
| { | |
| __time_t tv_sec; | |
| __syscall_slong_t tv_nsec; | |
| }; | |
| $ printf "#include <sys/stat.h>" | gcc -E - | grep "__time_t" | |
| typedef long int __time_t; | |
| ------------------------------------------------------------------- | |
| ------------------------------------------------------------------- | |
| $ printf "#include <sys/stat.h>" \ | |
| | gcc -m32 -E - | grep -A 60 "struct stat" | |
| struct stat | |
| { | |
| __dev_t st_dev; | |
| unsigned short int __pad1; | |
| __ino_t st_ino; | |
| __mode_t st_mode; | |
| __nlink_t st_nlink; | |
| __uid_t st_uid; | |
| __gid_t st_gid; | |
| __dev_t st_rdev; | |
| unsigned short int __pad2; | |
| __off_t st_size; | |
| __blksize_t st_blksize; | |
| __blkcnt_t st_blocks; | |
| # 91 "/usr/include/bits/stat.h" 3 4 | |
| struct timespec st_atim; | |
| struct timespec st_mtim; | |
| struct timespec st_ctim; | |
| # 109 "/usr/include/bits/stat.h" 3 4 | |
| unsigned long int __glibc_reserved4; | |
| unsigned long int __glibc_reserved5; | |
| }; | |
| $ printf "#include <sys/stat.h>" | gcc -m32 -E - \ | |
| | grep '__dev_t\|__ino_t\|__nlink_t\|__mode_t\|__uid_t\|__gid_t' | |
| __extension__ typedef __u_quad_t __dev_t; | |
| __extension__ typedef unsigned int __uid_t; | |
| __extension__ typedef unsigned int __gid_t; | |
| __extension__ typedef unsigned long int __ino_t; | |
| __extension__ typedef unsigned int __mode_t; | |
| __extension__ typedef unsigned int __nlink_t; | |
| $ printf "#include <sys/stat.h>" \ | |
| | gcc -m32 -E - | grep '__u_quad_t' | |
| __extension__ typedef unsigned long long int __u_quad_t; | |
| $ printf "#include <sys/stat.h>" | gcc -m32 -E - \ | |
| | grep '__blksize_t\|__blkcnt_t\|__syscall_slong_t' | |
| __extension__ typedef long int __off_t; | |
| __extension__ typedef long int __blksize_t; | |
| __extension__ typedef long int __blkcnt_t; | |
| __extension__ typedef long int __syscall_slong_t; | |
| $ printf "#include <sys/stat.h>" | gcc -m32 -E - \ | |
| | grep -A 10 "struct timespec" | |
| struct timespec | |
| { | |
| __time_t tv_sec; | |
| __syscall_slong_t tv_nsec; | |
| }; | |
| $ printf "#include <sys/stat.h>" | gcc -m32 -E - | grep "__time_t" | |
| __extension__ typedef long int __time_t; | |
| ------------------------------------------------------------------- | |
| As you can see, the stat struct is substantially different for | |
| i386 and amd64 and the contained types are also different in size. | |
| This is not all there is to it though. Some syscalls have multiple | |
| versions of them with different structs for historical reasons, and | |
| gcc might wrap them in some weird way, using its own struct. | |
| stat is one of them. Suppose you use the above structs and assume | |
| libc, stat struct is right. | |
| Let's make a simple program that stats a file and dumps the stat | |
| struct to stdout for us to inspect. | |
| These are the files: | |
| ------------------------------------------------------------------- | |
| $ cat amd64/syscalls.h | |
| #define SYS_write 1 | |
| #define SYS_stat 4 | |
| #define SYS_exit 60 | |
| $ cat i386/syscalls.h | |
| #define SYS_write 4 | |
| #define SYS_stat 106 | |
| #define SYS_exit 1 | |
| $ cat stat.c | |
| #define internal static | |
| void* syscall2( | |
| uintptr number, | |
| void* arg1, | |
| void* arg2 | |
| ); | |
| void* syscall3( | |
| uintptr number, | |
| void* arg1, | |
| void* arg2, | |
| void* arg3 | |
| ); | |
| /* ------------------------------------------------------------- */ | |
| #define stdout 1 | |
| internal | |
| intptr write(int fd, void const* data, uintptr nbytes) | |
| { | |
| return (uintptr) | |
| syscall3( | |
| SYS_write, | |
| (void*)(intptr)fd, | |
| (void*)data, | |
| (void*)nbytes | |
| ); | |
| } | |
| typedef u64 dev_t; | |
| typedef intptr syscall_slong_t; | |
| typedef intptr time_t; | |
| typedef struct | |
| { | |
| time_t sec; | |
| syscall_slong_t nsec; | |
| } | |
| timespec; | |
| typedef struct | |
| { | |
| dev_t dev; | |
| #ifdef I386 | |
| u16 __pad1; | |
| #endif | |
| uintptr ino; | |
| uintptr nlink; | |
| u32 mode; | |
| u32 uid; | |
| u32 gid; | |
| #ifdef AMD64 | |
| int __pad0; | |
| #endif | |
| dev_t rdev; | |
| #ifdef I386 | |
| u16 __pad2; | |
| #endif | |
| intptr size; | |
| intptr blksize; | |
| intptr blocks; | |
| timespec atim; | |
| timespec mtim; | |
| timespec ctim; | |
| #ifdef AMD64 | |
| syscall_slong_t __glibc_reserved[3]; | |
| #else | |
| u32 __glibc_reserved4; | |
| u32 __glibc_reserved5; | |
| #endif | |
| } | |
| stat_info; | |
| internal | |
| int stat(char const* path, stat_info* s) | |
| { | |
| return (int)(intptr) | |
| syscall2( | |
| SYS_stat, | |
| (void*)path, | |
| s | |
| ); | |
| } | |
| /* ------------------------------------------------------------- */ | |
| internal | |
| int stat_run(int argc, char const* argv[]) | |
| { | |
| stat_info si; | |
| if (stat("/etc/hosts", &si) == 0) { | |
| write(stdout, &si, sizeof(stat_info)); | |
| } | |
| return 0; | |
| } | |
| ------------------------------------------------------------------- | |
| Now if we hexdump output from amd64 and i386, we will see that | |
| something is not quite right on i386: | |
| ------------------------------------------------------------------- | |
| $ ./build.sh | |
| $ ./stat | hexdump -C | |
| 00000000 12 08 00 00 00 00 00 00 50 59 0a 00 00 00 00 00 | |
| 00000010 01 00 00 00 00 00 00 00 a4 81 00 00 00 00 00 00 | |
| 00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
| 00000030 bc 04 00 00 00 00 00 00 00 10 00 00 00 00 00 00 | |
| 00000040 08 00 00 00 00 00 00 00 24 b2 e9 57 00 00 00 00 | |
| 00000050 d1 f4 e1 2f 00 00 00 00 e8 d8 5e 57 00 00 00 00 | |
| 00000060 a0 3a b4 24 00 00 00 00 e8 d8 5e 57 00 00 00 00 | |
| 00000070 20 c8 0f 25 00 00 00 00 00 00 00 00 00 00 00 00 | |
| 00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
| 00000090 | |
| $ ./build.sh i386 | |
| $ ./stat | hexdump -C | |
| 00000000 12 08 00 00 50 59 0a 00 a4 81 01 00 00 00 00 00 | |
| 00000010 00 00 00 00 bc 04 00 00 00 10 00 00 08 00 00 00 | |
| 00000020 24 b2 e9 57 d1 f4 e1 2f e8 d8 5e 57 a0 3a b4 24 | |
| 00000030 e8 d8 5e 57 20 c8 0f 25 00 00 00 00 00 00 00 00 | |
| 00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
| 00000050 00 00 00 00 00 00 00 00 | |
| 00000058 | |
| ------------------------------------------------------------------- | |
| We know dev_t is a 64-bit integer from our previous investigations, | |
| so why is other stuff being packed after the 4th byte? The first | |
| 8 bytes of the structs should be the same as amd64! | |
| If you scroll through the stat manpage, you will find this: | |
| ------------------------------------------------------------------- | |
| Over time, increases in the size of the stat structure have led to | |
| three successive versions of stat(): sys_stat() (slot __NR_old? | |
| stat), sys_newstat() (slot __NR_stat), and sys_stat64() (slot | |
| __NR_stat64) on 32-bit platforms such as i386. The first two ver? | |
| sions were already present in Linux 1.0 (albeit with different | |
| names); the last was added in Linux 2.4. Similar remarks apply for | |
| fstat() and lstat(). | |
| The kernel-internal versions of the stat structure dealt with by | |
| the different versions are, respectively: | |
| __old_kernel_stat | |
| The original structure, with rather narrow fields, | |
| and no padding. | |
| stat Larger st_ino field and padding added to various | |
| parts of the structure to allow for future expansion. | |
| stat64 Even larger st_ino field, larger st_uid and st_gid | |
| fields to accommodate the Linux-2.4 expansion of UIDs | |
| and GIDs to 32 bits, and various other enlarged | |
| fields and further padding in the structure. (Vari? | |
| ous padding bytes were eventually consumed in Linux | |
| 2.6, with the advent of 32-bit device IDs and | |
| nanosecond components for the timestamp fields.) | |
| The glibc stat() wrapper function hides these details from applica? | |
| tions, invoking the most recent version of the system call provided | |
| by the kernel, and repacking the returned information if required | |
| for old binaries. | |
| ------------------------------------------------------------------- | |
| So it's likely that glibc is tampering with stat instead of just | |
| forwarding the syscall. | |
| You can actually check this by writing a small libc stat test and | |
| using strace to trace syscalls: | |
| ------------------------------------------------------------------- | |
| $ cat > stattest.c << "EOF" | |
| #include <sys/stat.h> | |
| int main() | |
| { | |
| struct stat s; | |
| stat("/etc/hosts", &s); | |
| return 0; | |
| } | |
| EOF | |
| $ gcc -m32 stattest.c | |
| $ strace ./a.out | |
| execve("./a.out", ["./a.out"], [/* 83 vars */]) = 0 | |
| [ Process PID=22487 runs in 32 bit mode. ] | |
| ... stuff we don't care about ... | |
| stat64("/etc/hosts", {st_mode=S_IFREG|0644, st_size=1212, ...}) = 0 | |
| exit_group(0) = ? | |
| +++ exited with 0 +++ | |
| ------------------------------------------------------------------- | |
| Yep, as expected, the stat call is getting translated to stat64! | |
| So how do we fix this? By not trusting libc headers and digging | |
| into the kernel headers (which I found by googling the kernel | |
| struct names): | |
| ------------------------------------------------------------------- | |
| $ printf "#include <asm/stat.h>" \ | |
| | gcc -m32 -E - | grep -A 30 "struct stat" | |
| struct stat { | |
| unsigned long st_dev; | |
| unsigned long st_ino; | |
| unsigned short st_mode; | |
| unsigned short st_nlink; | |
| unsigned short st_uid; | |
| unsigned short st_gid; | |
| unsigned long st_rdev; | |
| unsigned long st_size; | |
| unsigned long st_blksize; | |
| unsigned long st_blocks; | |
| unsigned long st_atime; | |
| unsigned long st_atime_nsec; | |
| unsigned long st_mtime; | |
| unsigned long st_mtime_nsec; | |
| unsigned long st_ctime; | |
| unsigned long st_ctime_nsec; | |
| unsigned long __unused4; | |
| unsigned long __unused5; | |
| }; | |
| ------------------------------------------------------------------- | |
| That's a very different than what glibc headers were telling us! | |
| There is no padding and st_dev is 4 bytes instead of 8, as well | |
| as a lot of other fields having smaller sizes. | |
| What about the 64-bit version of it? | |
| ------------------------------------------------------------------- | |
| $ printf "#include <asm/stat.h>" \ | |
| | gcc -E - | grep -A 30 "struct stat" | |
| struct stat { | |
| __kernel_ulong_t st_dev; | |
| __kernel_ulong_t st_ino; | |
| __kernel_ulong_t st_nlink; | |
| unsigned int st_mode; | |
| unsigned int st_uid; | |
| unsigned int st_gid; | |
| unsigned int __pad0; | |
| __kernel_ulong_t st_rdev; | |
| __kernel_long_t st_size; | |
| __kernel_long_t st_blksize; | |
| __kernel_long_t st_blocks; | |
| __kernel_ulong_t st_atime; | |
| __kernel_ulong_t st_atime_nsec; | |
| __kernel_ulong_t st_mtime; | |
| __kernel_ulong_t st_mtime_nsec; | |
| __kernel_ulong_t st_ctime; | |
| __kernel_ulong_t st_ctime_nsec; | |
| __kernel_long_t __unused[3]; | |
| }; | |
| ------------------------------------------------------------------- | |
| This one seems to have the correct layout, except that some of the | |
| values are unsigned rather than signed. | |
| Here's our fixed stat struct: | |
| ------------------------------------------------------------------- | |
| typedef uintptr dev_t; | |
| typedef intptr syscall_slong_t; | |
| typedef uintptr syscall_ulong_t; | |
| typedef uintptr time_t; | |
| typedef struct | |
| { | |
| time_t sec; | |
| syscall_ulong_t nsec; | |
| } | |
| timespec; | |
| typedef struct | |
| { | |
| dev_t dev; | |
| uintptr ino; | |
| #ifdef AMD64 | |
| uintptr nlink; | |
| u32 mode; | |
| u32 uid; | |
| u32 gid; | |
| u32 __pad0; | |
| #else | |
| u16 mode; | |
| u16 nlink; | |
| u16 uid; | |
| u16 gid; | |
| #endif | |
| dev_t rdev; | |
| uintptr size; | |
| uintptr blksize; | |
| uintptr blocks; | |
| timespec atim; | |
| timespec mtim; | |
| timespec ctim; | |
| #ifdef AMD64 | |
| syscall_slong_t __unused[3]; | |
| #else | |
| u32 __unused4; | |
| u32 __unused5; | |
| #endif | |
| } | |
| stat_info; | |
| ------------------------------------------------------------------- | |
| Now we can run it again and verify that the struct is properly | |
| populated in both architectures (I added comments to show where | |
| fields are, those aren't actually part of hexdump) | |
| ------------------------------------------------------------------- | |
| $ ./stat | hexdump -C | |
| 00000000 12 08 00 00 00 00 00 00 50 59 0a 00 00 00 00 00 | |
| | dev | ino | | |
| 00000010 01 00 00 00 00 00 00 00 a4 81 00 00 00 00 00 00 | |
| | nlink | mode | uid | | |
| 00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
| | gid | __pad0 | rdev | | |
| 00000030 bc 04 00 00 00 00 00 00 00 10 00 00 00 00 00 00 | |
| | size | blksize | | |
| 00000040 08 00 00 00 00 00 00 00 24 b2 e9 57 00 00 00 00 | |
| | blocks | atim.sec | | |
| 00000050 d1 f4 e1 2f 00 00 00 00 e8 d8 5e 57 00 00 00 00 | |
| | atim.nsec | mtim.sec | | |
| 00000060 a0 3a b4 24 00 00 00 00 e8 d8 5e 57 00 00 00 00 | |
| | mtim.nsec | ctim.sec | | |
| 00000070 20 c8 0f 25 00 00 00 00 00 00 00 00 00 00 00 00 | |
| | ctim.nsec | __unused[0] | | |
| 00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
| | __unused[1] | __unused[2] | | |
| 00000090 | |
| $ ./build.sh i386 | |
| $ ./stat | hexdump -C | |
| 00000000 12 08 00 00 50 59 0a 00 a4 81 01 00 00 00 00 00 | |
| | dev | ino | mode |nlink| uid | gid | | |
| 00000010 00 00 00 00 bc 04 00 00 00 10 00 00 08 00 00 00 | |
| | rdev | size | blksize | blocks | | |
| 00000020 24 b2 e9 57 d1 f4 e1 2f e8 d8 5e 57 a0 3a b4 24 | |
| | atim.sec | atim.nsec | mtim.sec | mtim.nsec | | |
| 00000030 e8 d8 5e 57 20 c8 0f 25 00 00 00 00 00 00 00 00 | |
| | ctim.sec | ctim.nsec | __unused4 | __unused5 | | |
| 00000040 | |
| ------------------------------------------------------------------- | |
| In short, try getting structs from kernel headers instead of libc. | |
| ################################################################### | |
| Legacy sockets on i386 | |
| ################################################################### | |
| Another thing you should be aware of, is that some syscalls might | |
| work entirely differently on i386 because of historical reasons. | |
| Socket syscalls are a perfect example. i386 doesn't have SYS_accept | |
| and as far as I know the other socket syscalls are also not | |
| guaranteed to exist. | |
| Instead, i386 multiplexes all socket syscalls through a single | |
| syscall named "socketcall", which takes an additional param which | |
| specifies which socket operation we want do to, (accept, connect, | |
| etc...) followed by the usual syscall params that we find on amd64. | |
| Also, parameters for socketcall are passed through a void* array, | |
| so the socketcall syscall just takes two parameters: the call | |
| number and the pointer to the parameters array. | |
| Googling the socketcall numbers was a bit difficult, but I | |
| eventually found them in linux/net.h. | |
| ------------------------------------------------------------------- | |
| $ printf "#include <sys/syscall.h>\nblah SYS_accept" \ | |
| | gcc -m32 -E - | grep blah | |
| blah SYS_accept | |
| $ man socketcall | |
| SYNOPSIS | |
| int socketcall(int call, unsigned long *args); | |
| DESCRIPTION | |
| socketcall() is a common kernel entry point for the socket system | |
| calls. call determines which socket function to invoke. args | |
| points to a block containing the actual arguments, which are passed | |
| through to the appropriate call. | |
| User programs should call the appropriate functions by their usual | |
| names. Only standard library implementors and kernel hackers need | |
| to know about socketcall(). | |
| $ printf "#include <linux/net.h>\nblah SYS_SOCKET" \ | |
| | gcc -m32 -E - | grep blah | |
| blah 1 | |
| $ printf "#include <linux/net.h>\nblah SYS_CONNECT" \ | |
| | gcc -m32 -E - | grep blah | |
| blah 3 | |
| ------------------------------------------------------------------- | |
| Here's an example socket application for i386 and amd64 that | |
| connects to sdf.org's gopherspace (192.94.73.15:70) and dumps the | |
| output for the root folder. | |
| I got the sockaddr_in struct from netinet/in.h and the socket | |
| constants from sys/socket.h | |
| ------------------------------------------------------------------- | |
| $ cat amd64/syscalls.h | |
| #define SYS_read 0 | |
| #define SYS_write 1 | |
| #define SYS_close 3 | |
| #define SYS_socket 41 | |
| #define SYS_connect 42 | |
| #define SYS_exit 60 | |
| $ cat i386/syscalls.h | |
| #define SYS_read 3 | |
| #define SYS_write 4 | |
| #define SYS_close 6 | |
| #define SYS_exit 1 | |
| #define SYS_socketcall 102 | |
| $ cat socket.c | |
| #define internal static | |
| void* syscall1( | |
| uintptr number, | |
| void* arg1 | |
| ); | |
| void* syscall2( | |
| uintptr number, | |
| void* arg1, | |
| void* arg2 | |
| ); | |
| void* syscall3( | |
| uintptr number, | |
| void* arg1, | |
| void* arg2, | |
| void* arg3 | |
| ); | |
| /* ------------------------------------------------------------- */ | |
| #define stdout 1 | |
| #define stderr 2 | |
| internal | |
| void close(int fd) { | |
| syscall1(SYS_close, (void*)(intptr)fd); | |
| } | |
| internal | |
| intptr write(int fd, void const* data, uintptr nbytes) | |
| { | |
| return (uintptr) | |
| syscall3( | |
| SYS_write, | |
| (void*)(intptr)fd, | |
| (void*)data, | |
| (void*)nbytes | |
| ); | |
| } | |
| internal | |
| intptr read(int fd, void* data, intptr nbytes) | |
| { | |
| return (intptr) | |
| syscall3( | |
| SYS_read, | |
| (void*)(intptr)fd, | |
| data, | |
| (void*)nbytes | |
| ); | |
| } | |
| #define AF_INET 2 | |
| #define SOCK_STREAM 1 | |
| #define IPPROTO_TCP 6 | |
| typedef struct | |
| { | |
| u16 family; | |
| u16 port; /* NOTE: this is big endian!!!!!!! use flip16u */ | |
| u32 addr; /* this is also big endian */ | |
| u8 zero[8]; | |
| } | |
| sockaddr_in; | |
| #ifdef SYS_socketcall | |
| /* i386 multiplexes socket calls through socketcall */ | |
| #define SYS_SOCKET 1 | |
| #define SYS_CONNECT 3 | |
| internal | |
| int socketcall(u32 call, void* args) | |
| { | |
| return (int)(intptr) | |
| syscall2( | |
| SYS_socketcall, | |
| (void*)(intptr)call, | |
| args | |
| ); | |
| } | |
| #endif | |
| internal | |
| int socket(u16 family, i32 type, i32 protocol) | |
| { | |
| #ifndef SYS_socketcall | |
| return (int)(intptr) | |
| syscall3( | |
| SYS_socket, | |
| (void*)(intptr)family, | |
| (void*)(intptr)type, | |
| (void*)(intptr)protocol | |
| ); | |
| #else | |
| void* args[3]; | |
| args[0] = (void*)(intptr)family; | |
| args[1] = (void*)(intptr)type; | |
| args[2] = (void*)(intptr)protocol; | |
| return socketcall(SYS_SOCKET, args); | |
| #endif | |
| } | |
| internal | |
| int connect(int sockfd, sockaddr_in const* addr) | |
| { | |
| #ifndef SYS_socketcall | |
| return (int)(intptr) | |
| syscall3( | |
| SYS_connect, | |
| (void*)(intptr)sockfd, | |
| (void*)addr, | |
| (void*)sizeof(sockaddr_in) | |
| ); | |
| #else | |
| void* args[3]; | |
| args[0] = (void*)(intptr)sockfd; | |
| args[1] = (void*)addr; | |
| args[2] = (void*)sizeof(sockaddr_in); | |
| return socketcall(SYS_CONNECT, args); | |
| #endif | |
| } | |
| /* ------------------------------------------------------------- */ | |
| internal | |
| intptr strlen(char const* str) | |
| { | |
| char const* p; | |
| for(p = str; *p; ++p); | |
| return p - str; | |
| } | |
| internal | |
| intptr fputs(int fd, char const* str) { | |
| return write(fd, str, strlen(str)); | |
| } | |
| /* reverses byte order of a 16-bit integer (0x1234 -> 0x3412) */ | |
| internal | |
| u16 flip16u(u16 v) { | |
| return (v << 8) | (v >> 8); | |
| } | |
| /* ------------------------------------------------------------- */ | |
| #define BUFSIZE 512 | |
| internal | |
| int socket_run(int argc, char const* argv[]) | |
| { | |
| int res = 0; /* return code */ | |
| int fd; | |
| u8 ip_raw[] = { 192, 94, 73, 15 }; /* ip in big endian order */ | |
| u32* pip = (u32*)ip_raw; /* pointer to ip as a 32-bit int */ | |
| sockaddr_in a; | |
| intptr n; | |
| u8 buf[BUFSIZE]; | |
| /* set up sockaddr struct with desired ip & port */ | |
| a.family = AF_INET; | |
| a.port = flip16u(70); | |
| a.addr = *pip; | |
| for (n = 0; n < 8; ++n) { | |
| a.zero[n] = 0; | |
| } | |
| /* create a new socket */ | |
| fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); | |
| if (fd < 0) { | |
| fputs(stderr, "socket failed\n"); | |
| return 1; | |
| } | |
| /* connect to sdf.org */ | |
| if (connect(fd, &a) < 0) | |
| { | |
| fputs(stderr, "connect failed\n"); | |
| res = 1; | |
| goto cleanup; | |
| } | |
| /* request folder / */ | |
| fputs(fd, "/\r\n"); | |
| /* read chunks of BUFSIZE bytes and relay them to stdout until | |
| there is nothing left to read or the socket errors out */ | |
| while (1) | |
| { | |
| n = read(fd, buf, BUFSIZE); | |
| if (n <= 0) { | |
| break; | |
| } | |
| if (write(stdout, buf, n) != n) | |
| { | |
| fputs(stderr, "write failed\n"); | |
| res = 1; | |
| break; | |
| } | |
| } | |
| if (n < 0) { | |
| fputs(stderr, "read failed\n"); | |
| res = 1; | |
| } | |
| cleanup: | |
| /* make sure to not leave a dangling socket file descriptor */ | |
| close(fd); | |
| return res; | |
| } | |
| ------------------------------------------------------------------- | |
| And as you can see, we are running flawlessly on both architectures | |
| ------------------------------------------------------------------- | |
| $ ./build.sh && ./socket | |
| iWelcome to the SDF Public Access UNIX System .. est. 1987... | |
| $ ./build.sh i386 && ./socket | |
| iWelcome to the SDF Public Access UNIX System .. est. 1987... | |
| ------------------------------------------------------------------- | |
| ################################################################### | |
| Conclusion | |
| ################################################################### | |
| I hope this guide got you interested in understanding what happens | |
| at the lowest level and knowing your programming language and OS | |
| beyond the standard library! Have fun! I will add more tricks if | |
| I come up with new ones. |
but is it possible to not write assembly code , and still use -nostdlib to invoke those syscalls?
@yunfan You will still need some assembly code, albeit you can write it directly in the C file:
/* Compile with: gcc -nostdlib -static aloha.c -o aloha */
void _start(void) {
const char msg[] = "Aloha Welt\n";
__asm__ volatile(
"mov $1, %%rax\n" // sys_write
"mov $1, %%rdi\n" // fd = stdout
"mov %0, %%rsi\n" // buf
"mov %1, %%rdx\n" // count
"syscall\n"
"mov $60, %%rax\n" // sys_exit
"xor %%rdi, %%rdi\n" // status = 0
"syscall\n"
:
: "r"(msg), "r"(sizeof(msg)-1)
: "%rax", "%rdi", "%rsi", "%rdx"
);
}@tcoppex Thank you for your work!
I decided to see how small of an executable I can get from your inline assembly code. I am using a custom linker script to further reduce the size from the object produced by GCC. Here we go!
EDIT Got it down to 464 bytes!
File aloha.c:
void _start(void) {
const char msg[] = "Aloha Welt\n";
__asm__ volatile(
"mov $1, %%rax\n" /* // sys_write */
"mov $1, %%rdi\n" /* // fd = stdout */
"mov %0, %%rsi\n" /* // buf */
"mov %1, %%rdx\n" /* // count */
"syscall\n"
"mov $60, %%rax\n" /* // sys_exit */
"xor %%rdi, %%rdi\n" /* // status = 0 */
"syscall\n"
:
: "r"(msg), "r"(sizeof(msg)-1)
: "%rax", "%rdi", "%rsi", "%rdx"
);
}File build.sh:
#!/bin/bash
exename="aloha"
echo "Remove old stuff!"
rm -rf "${exename}"
rm -rf "${exename}.o"
echo "Compile!"
gcc \
-std=c89 \
-pedantic \
-s \
-Wall \
-Werror \
-static \
-nostdlib \
-fno-unwind-tables \
-fno-asynchronous-unwind-tables \
-fdata-sections \
-Wl,--gc-sections \
-Wa,--noexecstack \
-fno-builtin \
"${exename}.c" \
-c \
-o "${exename}.o"
echo "Link!"
ld \
--strip-all \
-z max-page-size=4096 \
-z noseparate-code \
-static \
-nostdlib \
-o "${exename}" "${exename}.o" \
-T link.ld
echo "Strip!"
strip -R .comment $exename
strip -R .gnu.hash $exename
strip -R .shstrtab $exename
echo "Done!"
exit 0File link.ld:
OUTPUT_FORMAT("elf64-x86-64");
OUTPUT_ARCH("i386:x86-64");
ENTRY("_start");
SECTIONS
{
. = SEGMENT_START("text-segment", 0x10000) + SIZEOF_HEADERS;
}To test things:
- To build the executable, just run
build.sh. - To see the stdout output, you can execute the binary, just run
aloha. - The resulting size of the executable on my system is:
$ wc -c ./aloha
464 ./aloha
PS: Versions of stuff I am using:
$ ld --version
GNU ld (GNU Binutils for Debian) 2.44
Copyright (C) 2025 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
$ gcc --version
gcc (Debian 14.2.0-19) 14.2.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ cat /etc/*-release
PRETTY_NAME="Debian GNU/Linux 13 (trixie)"
NAME="Debian GNU/Linux"
VERSION_ID="13"
VERSION="13 (trixie)"
VERSION_CODENAME=trixie
DEBIAN_VERSION_FULL=13.1
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ uname -a
Linux lp-rusty 6.12.43+deb13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.43-1 (2025-08-27) x86_64 GNU/Linux
@yunfan You will still need some assembly code, albeit you can write it directly in the C file:
thanks, i'd rather to separated these into another files that contains the implementation of syscall1 to syscall5
and then invoke then from my soucecode
Very nice. I learned a lot nice job. I saw you was active 10 mins ago. Is there a new one?