This document shows how to extend the xv6 kernel with a new system call, getppid(). We do this by modifying syscall.h, syscall.c and sysproc.c in the kernel code, and user.h and usys.S in the user-space code.
xv6 is a re-implementation of the Unix Version 6 kernel, which Dennis Ritchie and Ken Thompson originally wrote in 1975 in the C programming language. MIT developed xv6 in 2006 for educational purposes, and it is used in MIT’s undergraduate course 6.1810: Operating System Engineering. The xv6 kernel, henceforth referred to as the kernel, targets two architectures: x86 and RISC-V. This document focuses on the x86 version. The complete implementation of xv6 is around 6000 lines of code, which makes it ideal for studying the main concepts of a Unix-like operating system.
If you want to play around with the kernel you will need the following:
- xv6 Source Code
- QEMU: qemu-system-x86_64 (for emulating the hardware the kernel will run on)
- GCC: i386-elf-gcc (needed to produce 32-bit code)
The source code of xv6 can be found here. On macOS with an M2 chip you can install both QEMU and the GCC cross compiler with Homebrew (brew). Alternatively, you can build them from source.
The kernel supports 21 system calls. If you use Linux (or any Unix-like operating system), these will all be familiar to you. With only 21 system calls, xv6 is fairly small and lacks many of the system calls a complete operating system provides. One of these is the getppid() system call.
On my system the man page for getppid specifies the following:
SYNOPSIS
#include <unistd.h>
pid_t
getpid(void);
pid_t
getppid(void);
DESCRIPTION
getpid() returns the process ID of the calling process.
The ID is guaranteed to be unique and is useful
for constructing temporary file names.
getppid() returns the process ID of the parent of the calling process.
ERRORS
The getpid() and getppid() functions are always successful, and no
return value is reserved to indicate an error.
In short, the getppid system call returns the pid of the calling process’s parent. So how can we implement this? We need to add a new system call, and inside it we need to look up the current process’s parent pid.
We know that the kernel is capable of starting processes from user space, so it has to keep track of these processes somehow. For now we do not need to understand the whole picture of how processes are created or scheduled. Looking through the code, we quickly find the file proc.h, which contains the structure that the kernel uses to represent a process.
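To make the semantics from the man page concrete, here is a minimal sketch that runs on a POSIX host (Linux or macOS), not inside xv6. The function name child_sees_parent_pid is made up for this example. It forks a child and checks that the child’s getppid() equals the pid the parent saw from getpid().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper for illustration (host POSIX API, not xv6).
// Returns 1 if a forked child observes getppid() == the parent's getpid(),
// 0 if it observes something else, and -1 if fork() or waitpid() fails.
int child_sees_parent_pid(void)
{
    pid_t parent = getpid();   // remember our own pid before forking
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        // Child: report the comparison through its exit status.
        _exit(getppid() == parent ? 0 : 1);
    }

    // Parent: it stays alive inside waitpid(), so the child is not
    // re-parented before it performs the check.
    int status;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling child_sees_parent_pid() from main should return 1 on such a host. The same idea could be adapted to xv6 once getppid exists there, keeping in mind that xv6’s wait() and exit() take no arguments.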
// Per-process state
struct proc {
  uint sz;                     // Size of process memory (bytes)
  pde_t* pgdir;                // Page table
  char *kstack;                // Bottom of kernel stack for this process
  enum procstate state;        // Process state
  int pid;                     // Process ID
  struct proc *parent;         // Parent process
  struct trapframe *tf;        // Trap frame for current syscall
  struct context *context;     // swtch() here to run process
  void *chan;                  // If non-zero, sleeping on chan
  int killed;                  // If non-zero, have been killed
  struct file *ofile[NOFILE];  // Open files
  struct inode *cwd;           // Current directory
  char name[16];               // Process name (debugging)
};
As we can see, the struct contains the pid of a process and, interestingly, it also contains a pointer to the parent process’s proc struct. This means that if we have the current process we can access the parent and then return the parent’s pid.
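The idea of following the parent pointer can be tried outside the kernel with a stripped-down model. The sketch below is not xv6 code: its struct proc keeps only the two fields that matter here, and parent_pid is a hypothetical helper name. Returning 0 when there is no parent is a convention chosen for this sketch.

```c
#include <stddef.h>

// Simplified stand-in for xv6's struct proc: only the fields used here.
struct proc {
    int pid;               // process ID
    struct proc *parent;   // parent process, or NULL if there is none
};

// Follow the parent pointer and return the parent's pid.
// Returns 0 when the process has no parent (a choice made for this sketch).
int parent_pid(const struct proc *p)
{
    return p->parent != NULL ? p->parent->pid : 0;
}
```

With a chain such as init (pid 1), a shell (pid 2) whose parent is init, and a program (pid 5) whose parent is the shell, parent_pid returns 2 for the program, 1 for the shell and 0 for init.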
In xv6 on x86, a process (in user space) invokes a system call by placing the system call number in the %eax register and triggering a software interrupt with the int instruction. xv6 uses trap vector 64 (T_SYSCALL) for this, whereas Linux traditionally uses int 0x80. Any arguments are passed on the user stack. The kernel, in response, executes the requested operation and returns the result to the process in %eax. We need to implement both sides, the user space and the kernel space.
.globl getpid;
getpid:
  movl $11, %eax;
  int $64;
  ret
Here we can see the assembly stub for the getpid(void) system call: it loads the system call number (SYS_getpid, 11) into %eax and traps into the kernel with int $64 (T_SYSCALL).
In user.h we see the declarations of all the system call functions. We add a new declaration:
int getppid(void);
Then we need to add the actual code of this function. usys.S is an assembly file that generates the stub code for all system calls, using a macro to produce the same boilerplate assembly for each one. We just need to add a new entry to usys.S.
SYSCALL(getppid)
This is basically all we have to do for a program in user space to call a new system call.
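To see what such a macro entry turns into, the following sketch rebuilds the expansion as a C string using the same preprocessor tricks (token pasting and stringification). SYSCALL_TEXT, STR, XSTR and getppid_stub are names invented for this illustration. The value 22 is the number this document assigns to getppid, and 64 is the value of T_SYSCALL in xv6’s traps.h.

```c
#define SYS_getppid 22   // number chosen for the new call in this document
#define T_SYSCALL   64   // xv6's system call trap vector (traps.h)

// Two-level stringification so the argument is macro-expanded first.
#define STR(x)  #x
#define XSTR(x) STR(x)

// Hypothetical macro: builds the text that a SYSCALL(name) entry in
// usys.S is expected to produce, by pasting SYS_ and name together.
#define SYSCALL_TEXT(name)                      \
    ".globl " #name "\n"                        \
    #name ":\n"                                 \
    "  movl $" XSTR(SYS_ ## name) ", %eax\n"    \
    "  int $" XSTR(T_SYSCALL) "\n"              \
    "  ret\n"

// Returns the expanded stub for getppid as a string.
const char *getppid_stub(void)
{
    return SYSCALL_TEXT(getppid);
}
```

Printing getppid_stub() shows a stub that declares the getppid symbol, loads 22 into %eax, executes int $64 and returns, which is the shape every user-space system call stub shares.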
syscall.h defines which number corresponds to which system call. We create a new entry:
#define SYS_getppid 22
This way, user space can invoke the getppid system call by loading 22 into %eax and executing the trap instruction:
movl $22, %eax
int $64
syscall.c contains the mechanism for dispatching system calls. The syscalls array of function pointers maps system call numbers to system call functions. Here we need to add two entries:
extern int sys_getppid(void);
And in the syscalls array we add a new entry
[SYS_getppid] sys_getppid,
Now all that we have to add is the actual sys_getppid(void) function.
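The dispatch mechanism can be modelled in a few lines of ordinary C. The sketch below is a user-space mock, not the kernel’s syscall.c: the handlers return made-up constants, dispatch is a hypothetical name, and the table uses the standard C99 designated-initializer syntax with an equals sign instead of the older form shown above.

```c
#include <stddef.h>

#define SYS_getpid  11
#define SYS_getppid 22
#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

// Stand-ins for the kernel handlers; the return values are made up.
static int sys_getpid(void)  { return 5; }
static int sys_getppid(void) { return 3; }

// Designated initializers place each handler at the index given by its
// system call number; every other slot is left as a null pointer.
static int (*syscalls[])(void) = {
    [SYS_getpid]  = sys_getpid,
    [SYS_getppid] = sys_getppid,
};

// Mimics the dispatch step: num plays the role of the value in %eax.
// Returns the handler's result, or -1 for an unknown system call number.
int dispatch(int num)
{
    if (num > 0 && (size_t)num < NELEM(syscalls) && syscalls[num] != NULL)
        return syscalls[num]();
    return -1;
}
```

Here dispatch(22) reaches sys_getppid and returns 3, while a number without a table entry, such as 12 or 99, yields -1. This also shows why the new handler must be both declared and entered in the array: without the array entry, the number 22 would fall into the unknown-call branch.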
sysproc.c contains the system calls related to processes. Here we add a new function for our system call. The myproc(void) function returns a pointer to the proc struct of the currently executing process. As we already saw, this struct contains a pointer to the parent’s proc struct, so we can return the parent’s pid.
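int
sys_getppid(void)
{
  struct proc *parent = myproc()->parent;
  // The first process (init) has no parent; return 0 rather than dereference a null pointer.
  return parent ? parent->pid : 0;
}
We can now write a simple user application that prints the pid of the current process’s parent.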
#include "types.h"
#include "stat.h"
#include "user.h"
#include "fs.h"
int
main(int argc, char *argv[])
{
  int pid = getpid();
  int ppid = getppid();
  printf(1, "PID: %d\n", pid);
  printf(1, "parent PID: %d\n", ppid);
  exit();
}
If you want to know how to add new user programs to the image, look at the following blogpost. Running our user program gives us the following output.
init: starting sh
$ sh
$ hello
PID: 5
parent PID: 3
$ hello
PID: 6
parent PID: 3