In this lab, you will implement a Unix-like fork(), which allows a user-mode environment to create copies of itself efficiently.
Create a local branch called lab6 based on our lab6 branch, origin/lab6, and then fetch the latest version from the course repository:
$ cd ~/cs134/lab
$ git checkout --track origin/lab6
Branch lab6 set up to track remote branch refs/remotes/origin/lab6.
Switched to a new branch "lab6"
$ git pull upstream lab6   # Pulls any changes I have made in the upstream repository
$
You will now need to merge the changes you made in your lab5 branch into the lab6 branch, as follows:
$ git merge lab5
Merge made by the recursive strategy.
...
$
In some cases, Git may not be able to figure out how to merge your changes with the new lab assignment (e.g. if you modified some of the code that is changed in the new lab assignment). In that case, the git merge command will tell you which files are conflicted, and you should first resolve the conflict (by editing the relevant files) and then commit the resulting files with git commit -a.
Important note. If your Pull Request for lab5 has finished being reviewed, then you know that lab5 is complete, and you will never need to merge from lab5 again. However, if it is still being reviewed, then there may be changes required before the review is complete. Those changes will need to be merged into lab6-no-code, which you can do after the Pull Request is complete, by another call to git merge lab5 from lab6-no-code. Then, you would do a git merge lab6-no-code from lab6. You should merge into both branches so that the Pull Request for lab6 does not include the changes from lab5.
At this point, Lab 6 is ready to go. Before making any code changes, do the following:
$ git branch lab6-no-code            # creates a branch prior to adding any Lab 6 code
$ git push -u origin lab6-no-code    # pushes the new branch to the origin
In this lab and subsequent labs, do all of the regular exercises described in the lab. You can also do challenge problems. (Some challenge problems are more challenging than others, of course!) Additionally, write up brief answers to any questions posed in the lab and a short (e.g., one or two paragraph) description of what you did to solve each chosen challenge problem. Place the write-up in a file called answers-lab6.txt in the top level of your lab directory before submitting your work. Do not forget to add that file to git.
As mentioned in Lab 5,
Unix provides the fork()
system call
as its primary process creation primitive.
The fork()
system call
copies the address space of the calling process (the parent)
to create a new process (the child).
xv6 Unix implements fork()
by copying all data from the
parent's pages into new pages allocated for the child.
This is essentially the same approach
that dumbfork()
takes.
The copying of the parent's address space into the child is
the most expensive part of the fork()
operation.
However, a call to fork()
is frequently followed almost immediately
by a call to exec()
in the child process,
which replaces the child's memory with a new program.
This is what the shell typically does, for example.
In this case,
the time spent copying the parent's address space is largely wasted,
because the child process will use
very little of its memory before calling exec().
For this reason,
later versions of Unix took advantage
of virtual memory hardware
to allow the parent and child to share
the memory mapped into their respective address spaces
until one of the processes actually modifies it.
This technique is known as copy-on-write.
To do this,
on fork()
the kernel would
copy the address space mappings
from the parent to the child
instead of the contents of the mapped pages,
and at the same time mark the now-shared pages read-only.
When one of the two processes tries to write to one of these shared pages,
the process takes a page fault.
At this point, the Unix kernel realizes that the page
was really a "virtual" or "copy-on-write" copy,
and so it makes a new, private, writable copy of the page for the
faulting process.
In this way, the contents of individual pages aren't actually copied
until they are actually written to.
This optimization makes a fork()
followed by
an exec()
in the child much cheaper:
the child will probably only need to copy one page
(the current page of its stack)
before it calls exec().
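From the program's point of view, copy-on-write must be indistinguishable from an eager copy: a write in the child never becomes visible in the parent. The sketch below illustrates that guarantee using the POSIX fork() and waitpid() calls on a host Unix system (not JOS). The function name parent_value_after_child_write is made up for this example, and whether the host kernel actually uses copy-on-write underneath is not observable from this code.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that overwrites a variable inherited from the parent.
 * Returns the value the parent still sees afterwards, or -1 on error. */
int parent_value_after_child_write(void)
{
    volatile int value = 1;      /* same contents in both processes after fork() */

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        value = 2;               /* the write lands in the child's private copy */
        _exit(value == 2 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    return value;                /* the parent's copy is untouched */
}
```

On a kernel that implements fork() with copy-on-write, the child's single write is what would trigger the page fault and the private copy of that one page.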
In the next piece of this lab, you will implement a "proper"
Unix-like fork()
with copy-on-write,
as a user space library routine.
Implementing fork()
and copy-on-write support in user space
has the benefit that the kernel remains much simpler
and thus more likely to be correct.
It also lets individual user-mode programs
define their own semantics for fork().
A program that wants a slightly different implementation
(for example, the expensive always-copy version like dumbfork(),
or one in which the parent and child actually share memory afterward)
can easily provide its own.
A user-level copy-on-write fork()
needs to know about
page faults on write-protected pages, so that's what you'll
implement first.
Copy-on-write is only one of many possible uses
for user-level page fault handling.
It's common to set up an address space so that page faults indicate when some action needs to take place. For example, most Unix kernels initially map only a single page in a new process's stack region, and allocate and map additional stack pages later "on demand" as the process's stack consumption increases and causes page faults on stack addresses that are not yet mapped. A typical Unix kernel must keep track of what action to take when a page fault occurs in each region of a process's space. For example, a fault in the stack region will typically allocate and map a new page of physical memory. A fault in the program's BSS region will typically allocate a new page, fill it with zeroes, and map it. In systems with demand-paged executables, a fault in the text region will read the corresponding page of the binary from disk and then map it.
This is a lot of information for the kernel to keep track of. Instead of taking the traditional Unix approach, you will decide what to do about each page fault in user space, where bugs are less damaging. This design has the added benefit of allowing programs great flexibility in defining their memory regions; you'll use user-level page fault handling later for mapping and accessing files on a disk-based file system.
In order to handle its own page faults,
a user environment will need to register
a page fault handler entrypoint with the JOS kernel.
The user environment registers its page fault entrypoint
via the new sys_env_set_pgfault_upcall
system call.
We have added a new member to the Env structure,
env_pgfault_upcall, to record this information.
Exercise 1.
Implement the sys_env_set_pgfault_upcall
system call.
Be sure to enable permission checking
when looking up the environment ID of the target environment,
since this is a "dangerous" system call.
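As a rough illustration of the shape this system call could take, here is a host-side model. It is only a sketch under stated assumptions: the struct Env below keeps just three fields, envid2env() is a simplified stand-in for the kernel's lookup routine (its checkperm argument allows only the current environment or one of its direct children), and the value of E_BAD_ENV and the tiny NENV table are chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t envid_t;

#define E_BAD_ENV 2   /* error number assumed for this sketch */
#define NENV      4   /* tiny table, for illustration only */

struct Env {
    envid_t env_id;              /* 0 marks an unused slot in this model */
    envid_t env_parent_id;
    void   *env_pgfault_upcall;  /* page fault upcall entry point */
};

struct Env envs[NENV];
struct Env *curenv;

/* Simplified stand-in for the kernel's envid2env(). */
int envid2env(envid_t envid, struct Env **env_store, int checkperm)
{
    struct Env *e = NULL;

    if (envid == 0) {
        e = curenv;              /* envid 0 means "the current environment" */
    } else {
        for (int i = 0; i < NENV; i++)
            if (envs[i].env_id == envid)
                e = &envs[i];
    }

    /* With checkperm set, the target must be curenv or a direct child. */
    if (e == NULL ||
        (checkperm && e != curenv && e->env_parent_id != curenv->env_id)) {
        *env_store = NULL;
        return -E_BAD_ENV;
    }

    *env_store = e;
    return 0;
}

/* Records the upcall entry point, with permission checking enabled. */
int sys_env_set_pgfault_upcall(envid_t envid, void *func)
{
    struct Env *e;
    int r = envid2env(envid, &e, 1);

    if (r < 0)
        return r;
    e->env_pgfault_upcall = func;
    return 0;
}
```

Passing 1 for checkperm is the point of the exercise's warning: without it, any environment could redirect the page faults of any other environment.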
During normal execution,
a user environment in JOS
will run on the normal user stack:
its ESP register starts out pointing at USTACKTOP,
and the stack data it pushes resides on the page
between USTACKTOP-PGSIZE and USTACKTOP-1 inclusive.
When a page fault occurs in user mode,
however,
the kernel will restart the user environment
running a designated user-level page fault handler
on a different stack,
namely the user exception stack.
In essence, we will make the JOS kernel
implement automatic "stack switching"
on behalf of the user environment,
in much the same way that the x86 processor
already implements stack switching on behalf of JOS
when transferring from user mode to kernel mode!
The JOS user exception stack is also one page in size,
and its top is defined to be at virtual address UXSTACKTOP,
so the valid bytes of the user exception stack
are from UXSTACKTOP-PGSIZE through UXSTACKTOP-1 inclusive.
While running on this exception stack,
the user-level page fault handler
can use JOS's regular system calls to map new pages or adjust mappings
so as to fix whatever problem originally caused the page fault.
Then the user-level page fault handler returns,
via an assembly language stub,
to the faulting code on the original stack.
Each user environment that wants to support user-level page fault handling
will need to allocate memory for its own exception stack,
using the sys_page_alloc()
system call introduced in part A.
You will now need to change the page fault handling code in kern/trap.c to handle page faults from user mode as follows. We will call the state of the user environment at the time of the fault the trap-time state.
If there is no page fault handler registered,
the JOS kernel destroys the user environment with a message as before.
Otherwise,
the kernel sets up a trap frame on the exception stack that looks like
a struct UTrapframe
from inc/trap.h:
                    <-- UXSTACKTOP
trap-time esp
trap-time eflags
trap-time eip
trap-time eax       start of struct PushRegs
trap-time ecx
trap-time edx
trap-time ebx
trap-time esp
trap-time ebp
trap-time esi
trap-time edi       end of struct PushRegs
tf_err (error code)
fault_va            <-- %esp when handler is run
The kernel then arranges for the user environment to resume execution with the page fault handler running on the exception stack with this stack frame; you must figure out how to make this happen. The fault_va is the virtual address that caused the page fault.
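The stack picture reads from high addresses at the top to low addresses at the bottom, so the corresponding C structure lists the same fields in the opposite order. The sketch below shows one way such a structure can be laid out. The field names follow the usual inc/trap.h but should be treated as assumptions here, uint32_t is used so the layout is the same on a 64-bit host, and utf_fill() is a hypothetical helper written only for this example.

```c
#include <stdint.h>

/* General-purpose registers, lowest address first (edi ... eax). */
struct PushRegs {
    uint32_t reg_edi;
    uint32_t reg_esi;
    uint32_t reg_ebp;
    uint32_t reg_oesp;   /* the esp slot inside PushRegs */
    uint32_t reg_ebx;
    uint32_t reg_edx;
    uint32_t reg_ecx;
    uint32_t reg_eax;
};

/* Frame handed to the user-level handler; the first field sits at %esp. */
struct UTrapframe {
    uint32_t        utf_fault_va;  /* faulting virtual address */
    uint32_t        utf_err;       /* page fault error code */
    struct PushRegs utf_regs;      /* trap-time general registers */
    uint32_t        utf_eip;       /* trap-time eip */
    uint32_t        utf_eflags;    /* trap-time eflags */
    uint32_t        utf_esp;       /* trap-time esp, just below UXSTACKTOP */
};

/* Hypothetical helper: fills a frame from the trap-time state. */
void utf_fill(struct UTrapframe *utf, uint32_t fault_va, uint32_t err,
              const struct PushRegs *regs,
              uint32_t eip, uint32_t eflags, uint32_t esp)
{
    utf->utf_fault_va = fault_va;
    utf->utf_err      = err;
    utf->utf_regs     = *regs;
    utf->utf_eip      = eip;
    utf->utf_eflags   = eflags;
    utf->utf_esp      = esp;
}
```

With this layout the frame occupies 52 bytes: two words, eight registers, and three more words.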
If the user environment is already running on the user exception stack
when an exception occurs,
then the page fault handler itself has faulted.
In this case,
you should start the new stack frame just under the current
tf->tf_esp rather than at UXSTACKTOP.
You should first push an empty 32-bit word, then a struct UTrapframe.
To test whether tf->tf_esp is already on the user
exception stack, check whether it is in the range
between UXSTACKTOP-PGSIZE and UXSTACKTOP-1, inclusive.
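The placement rule can be written down as plain address arithmetic. The following is a minimal sketch, assuming UXSTACKTOP has the value used in the usual inc/memlayout.h and that a struct UTrapframe is 52 bytes on 32-bit x86. The helper names on_uxstack, utf_addr and utf_fits are invented for this example and are not part of the lab code.

```c
#include <stdint.h>

#define PGSIZE      4096u
#define UXSTACKTOP  0xeec00000u  /* assumed value, as in the usual inc/memlayout.h */
#define UTF_SIZE    52u          /* assumed sizeof(struct UTrapframe) on 32-bit x86 */

/* Is esp inside [UXSTACKTOP-PGSIZE, UXSTACKTOP-1]? */
int on_uxstack(uint32_t esp)
{
    return esp >= UXSTACKTOP - PGSIZE && esp <= UXSTACKTOP - 1;
}

/* Address at which the new UTrapframe should start. */
uint32_t utf_addr(uint32_t tf_esp)
{
    if (on_uxstack(tf_esp))
        return tf_esp - 4u - UTF_SIZE;  /* recursive fault: skip one empty word */
    return UXSTACKTOP - UTF_SIZE;       /* first fault: start at the stack top */
}

/* Does a frame starting at utf still lie within the exception stack page? */
int utf_fits(uint32_t utf)
{
    return utf >= UXSTACKTOP - PGSIZE;
}
```

In the kernel itself, a range test like utf_fits() is not a substitute for checking that the environment has actually mapped the exception stack page with write permission. The user-memory checks from the earlier labs are the natural tool for that.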
Exercise 2.
Implement the code in page_fault_handler
in
kern/trap.c
required to dispatch page faults to the user-mode handler.
Be sure to take appropriate precautions
when writing into the exception stack.
(What happens if the user environment runs out of space
on the exception stack?)
Next, you need to implement the assembly routine that will
take care of calling the C page fault handler and resume
execution at the original faulting instruction.
This assembly routine is the handler that will be registered
with the kernel using sys_env_set_pgfault_upcall().
Exercise 3.
Implement the _pgfault_upcall
routine
in lib/pfentry.S.
The interesting part is returning to the original point in
the user code that caused the page fault.
You'll return directly there, without going back through
the kernel.
The hard part is simultaneously switching stacks and
re-loading the EIP.
Finally, you need to implement the C user library side of the user-level page fault handling mechanism.
Exercise 4.
Finish set_pgfault_handler()
in lib/pgfault.c.
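One plausible shape for this function is sketched below as a host-side model. The two system calls are replaced by mocks that only record how they were called, _pgfault_upcall is an empty C function standing in for the assembly stub, and the constants are assumed to match the usual JOS headers. The version in the lab skeleton may differ, for example by returning void and panicking on failure instead of returning an error code.

```c
#include <stdint.h>

#define PGSIZE      4096u
#define UXSTACKTOP  0xeec00000u   /* assumed value */
#define PTE_P       0x001
#define PTE_W       0x002
#define PTE_U       0x004

struct UTrapframe;                /* opaque in this sketch */
typedef void (*pgfault_handler_t)(struct UTrapframe *utf);

/* Mock system calls: they record their arguments instead of trapping. */
int      alloc_calls, upcall_calls;
uint32_t last_alloc_va;
int      last_alloc_perm;

int sys_page_alloc(int envid, uint32_t va, int perm)
{
    (void)envid;
    alloc_calls++;
    last_alloc_va = va;
    last_alloc_perm = perm;
    return 0;
}

int sys_env_set_pgfault_upcall(int envid, void (*upcall)(void))
{
    (void)envid;
    (void)upcall;
    upcall_calls++;
    return 0;
}

void _pgfault_upcall(void) { }        /* stands in for the assembly entry point */

pgfault_handler_t _pgfault_handler;   /* C-level handler called by the upcall */

int set_pgfault_handler(pgfault_handler_t handler)
{
    if (_pgfault_handler == 0) {
        /* First call: allocate the exception stack, then register the upcall. */
        int r = sys_page_alloc(0, UXSTACKTOP - PGSIZE, PTE_P | PTE_U | PTE_W);
        if (r < 0)
            return r;
        r = sys_env_set_pgfault_upcall(0, _pgfault_upcall);
        if (r < 0)
            return r;
    }
    _pgfault_handler = handler;       /* later calls only swap the C handler */
    return 0;
}
```

The one-time setup matters because the kernel only knows about the assembly entry point. Changing the C-level handler later does not require another system call.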
Run user/faultread (make run-faultread). You should see:
...
[00000000] new env 00001000
[00001000] user fault va 00000000 ip 0080003a
TRAP frame ...
[00001000] free env 00001000
Run user/faultdie. You should see:
...
[00000000] new env 00001000
i faulted at va deadbeef, err 6
[00001000] exiting gracefully
[00001000] free env 00001000
Run user/faultalloc. You should see:
...
[00000000] new env 00001000
fault deadbeef
this string was faulted in at deadbeef
fault cafebffe
fault cafec000
this string was faulted in at cafebffe
[00001000] exiting gracefully
[00001000] free env 00001000
If you see only the first "this string" line, it means you are not handling recursive page faults properly.
Run user/faultallocbad. You should see:
...
[00000000] new env 00001000
[00001000] user_mem_check assertion failure for va deadbeef
[00001000] free env 00001000
Make sure you understand why user/faultalloc and user/faultallocbad behave differently.
Challenge! Extend your kernel so that not only page faults, but all types of processor exceptions that code running in user space can generate, can be redirected to a user-mode exception handler. Write user-mode test programs to test user-mode handling of various exceptions such as divide-by-zero, general protection fault, and illegal opcode.
You now have the kernel facilities
to implement copy-on-write fork()
entirely in user space.
We have provided a skeleton for your fork()
in lib/fork.c.
Like dumbfork(), fork() should create a new environment,
then scan through the parent environment's entire address space
and set up corresponding page mappings in the child.
The key difference is that, while dumbfork() copied pages,
fork() will initially only copy page mappings.
fork() will copy each page only when one of the environments tries to write it.
The basic control flow for fork() is as follows:
1. The parent installs pgfault() as the C-level page fault handler, using the set_pgfault_handler() function you implemented above.
2. The parent calls sys_exofork() to create a child environment.
3. For each writable or copy-on-write page in its address space below UTOP, the parent calls duppage, which should map the page copy-on-write into the address space of the child and then remap the page copy-on-write in its own address space. [ Note: The ordering here (i.e., marking a page as COW in the child before marking it in the parent) actually matters! Can you see why? Try to think of a specific case where reversing the order could cause trouble. ] duppage sets both PTEs so that the page is not writeable, and to contain PTE_COW in the "avail" field to distinguish copy-on-write pages from genuine read-only pages.
The exception stack is not remapped this way, however. Instead you need to allocate a fresh page in the child for the exception stack. Since the page fault handler will be doing the actual copying and the page fault handler runs on the exception stack, the exception stack cannot be made copy-on-write: who would copy it?
fork()
also needs to handle pages that are
present, but not writable or copy-on-write.
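The permission decision at the heart of duppage can be isolated into a small pure function. This is only a sketch: dup_perm is a hypothetical helper name, the PTE bit values are assumed to match the usual inc/mmu.h, and the real duppage would go on to pass the result to the page-mapping system call twice (child first, then parent).

```c
#include <stdint.h>

#define PTE_P    0x001   /* present */
#define PTE_W    0x002   /* writable */
#define PTE_U    0x004   /* user-accessible */
#define PTE_COW  0x800   /* software-defined "copy-on-write" bit (assumed value) */

/* Given the parent's PTE for a page, return the permission bits to use for
 * both the child's mapping and the parent's remapping, or 0 if the page is
 * not present and should be skipped. */
uint32_t dup_perm(uint32_t pte)
{
    if (!(pte & PTE_P))
        return 0;

    if (pte & (PTE_W | PTE_COW))
        return PTE_P | PTE_U | PTE_COW;   /* drop PTE_W, mark copy-on-write */

    return PTE_P | PTE_U;                 /* genuinely read-only: share as-is */
}
```

Note that the result never contains PTE_W. A page becomes writable again only after the page fault handler has made a private copy.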
Each time one of the environments writes a copy-on-write page that it hasn't yet written, it will take a page fault. Here's the control flow for the user page fault handler:
1. The kernel propagates the page fault to _pgfault_upcall, which calls fork()'s pgfault() handler.
2. pgfault() checks that the fault is a write (check for FEC_WR in the error code) and that the PTE for the page is marked PTE_COW. If not, panic.
3. pgfault() allocates a new page mapped at a temporary location and copies the contents of the faulting page into it. Then the fault handler maps the new page at the appropriate address with read/write permissions, in place of the old read-only mapping.

The user-level lib/fork.c code must consult the environment's page
tables for several of the operations above (e.g., that the PTE for a page is
marked PTE_COW). The kernel maps the environment's page tables at
UVPT exactly for this purpose. It uses a clever mapping trick to make it easy to look up
PTEs for user code. lib/entry.S sets up uvpt and uvpd
so that you can easily look up page-table information in
lib/fork.c.
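To make the lookup concrete, here is a host-side sketch of the check pgfault() performs. In JOS, uvpd and uvpt are read-only views of the real page directory and page tables. In this sketch they are ordinary arrays so the code can run anywhere, and lookup_pte and is_cow_write_fault are hypothetical helper names. The macro definitions and bit values are assumed to match the usual inc/mmu.h.

```c
#include <stdint.h>

#define PTE_P    0x001
#define PTE_COW  0x800   /* assumed value of the software COW bit */
#define FEC_WR   0x2     /* page fault error code bit: fault was a write */

#define PDX(va)    ((((uint32_t)(va)) >> 22) & 0x3FF)  /* page directory index */
#define PGNUM(va)  (((uint32_t)(va)) >> 12)            /* global page number */

/* Stand-ins for the arrays the kernel exposes at UVPT. */
uint32_t uvpd[1024];      /* one entry per 4 MB region */
uint32_t uvpt[1 << 20];   /* one entry per 4 KB page */

/* Returns the PTE for va, or 0 if no page table covers va. */
uint32_t lookup_pte(uint32_t va)
{
    if (!(uvpd[PDX(va)] & PTE_P))
        return 0;          /* check the directory first, then the table */
    return uvpt[PGNUM(va)];
}

/* Nonzero if the fault is a write to a present copy-on-write page. */
int is_cow_write_fault(uint32_t err, uint32_t fault_va)
{
    uint32_t pte = lookup_pte(fault_va);

    return (err & FEC_WR) && (pte & PTE_P) && (pte & PTE_COW);
}
```

Checking uvpd before touching uvpt is more than tidiness in the real environment: reading a uvpt entry whose page table is not present would itself cause a page fault.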
Exercise 5.
Implement fork
, duppage
and
pgfault
in lib/fork.c.
Test your code with the forktree program. It should produce the following messages, with interspersed 'new env', 'free env', and 'exiting gracefully' messages. The messages may not appear in this order, and the environment IDs may be different.
1000: I am ''
1001: I am '0'
2000: I am '00'
2001: I am '000'
1002: I am '1'
3000: I am '11'
3001: I am '10'
4000: I am '100'
1003: I am '01'
5000: I am '010'
4001: I am '011'
2002: I am '110'
1004: I am '001'
1005: I am '111'
1006: I am '101'
Challenge!
Implement a shared-memory fork() called sfork().
This version should have the parent
and child share all their memory pages
(so writes in one environment appear in the other)
except for pages in the stack area,
which should be treated in the usual copy-on-write manner.
Modify user/forktree.c to use sfork() instead of regular fork().
Also, once you have finished implementing IPC in part C,
use your sfork()
to run user/pingpongs.
You will have to find a new way to provide the functionality
of the global thisenv
pointer.
Challenge!
Your implementation of fork
makes a huge number of system calls. On the x86, switching into
the kernel using interrupts has non-trivial cost. Augment the
system call interface
so that it is possible to send a batch of system calls at once.
Then change fork
to use this interface.
How much faster is your new fork?
You can answer this (roughly) by using analytical
arguments to estimate how much of an improvement batching
system calls will make to the performance of your
fork
: How expensive is an int 0x30
instruction? How many times do you execute int 0x30
in your fork
? Is accessing the TSS stack
switch also expensive? And so on...
Alternatively, you can boot your kernel on real hardware
and really benchmark your code. See the RDTSC
(read time-stamp counter) instruction, defined in the IA32
manual, which counts the number of clock cycles that have
elapsed since the last processor reset. QEMU doesn't emulate
this instruction faithfully (it can either count the number of
virtual instructions executed or use the host TSC, neither of
which reflects the number of cycles a real CPU would
require).
This completes the lab. In the lab directory, commit your changes with git commit and type make handin to get instructions for submitting your code.
See our page on GitHub and Pull Requests for detailed information on pull requests and submitting your code.