CS137 Homework 2: FAT, part 1
Overview
The purpose of this assignment is to begin developing a real
filesystem. Because of the complexity of writing a working
filesystem, the assignment is divided into two parts. In this first
part, you will develop a lot of scaffolding and enough code for your
filesystem to do something testable. In the second part, you will
complete the filesystem.
You may write in any language that is supported on Wilkes and
that supports Fuse, but as
mentioned in class, I strongly recommend C or C++.
Your code will be tested on Wilkes and must compile and run
there.
The Assignment
Your assignment is to develop a FAT-like filesystem that supports the
following features:
- The general structure of the filesystem is similar to the
Microsoft FAT design (see "FAT Design" below).
- The filesystem supports the following operations at a
minimum: getattr, access, readdir, and mkdir. (Note that at
this point it is not necessary to support
file I/O, or even files.)
- Your mkdir operation must allocate space from
the free list.
- The filesystem is backed by a SINGLE preallocated 10-MB file with a
fixed name, such as "fat_disk". The size of the file should
be a #defined constant, of course. Remember to
watch out for the working-directory gotcha.
- When the filesystem is invoked, if the backing file doesn't
exist, it is created and initialized. However, if it
does exist, it should be attached and its previous
contents should be visible.
- When a mutating operation occurs, its effects must be
immediately visible in the backing file. (This means that you
can't do everything in memory and then wait until exit time to
write things out. I will test this feature by killing your
process with SIGKILL. O_DSYNC isn't
necessary, because the operating system will make sure your
data gets to stable storage unless the entire OS crashes, which
isn't part of the testing plan!)
- Subdirectories must be supported.
- Your directories may be fixed-size; it is not necessary to be
able to create an arbitrary number of entries in a directory.
- Directory entries may also be fixed-size, as long as the name
length is moderately reasonable. (Nothing under 16 characters
is "reasonable" in my book; my minimal implementation
compromises with a limit of 32.)
- If you choose, file sizes may be limited to either 2^32 or
2^64 bytes.
- The acid test of your filesystem should be that it is possible
to create directories, list them (with ls -la
returning reasonable results including "." and ".."), and
cd into them.
- Other operations are up to you. We will be extending the
filesystem to support files, rmdir, etc. in the
next assignment, so you are welcome to implement those
things. However, they will not be tested in the current
assignment.
Why this particular set of features? It's the minimum necessary to
have a filesystem where you can do something visible: create and list
directories. You'll find that you need to create quite a bit of
scaffolding to get that far (in particular, the code that creates an
initialized FAT filesystem from scratch).
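To make the "create and list directories" goal concrete, here is a minimal sketch of a fixed-size, one-block directory that starts out with "." and ".." entries. Everything in it is an assumption for illustration: the field layout of struct fat_dirent, the 4096-byte block size, the 32-byte name field, and the helper names fat_dir_add and fat_dir_init. It works on an in-memory copy of the block; writing the block back to the backing file is a separate step.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FAT_BLOCK_SIZE 4096
#define FAT_NAME_LEN   32             /* includes the terminating NUL */

enum fat_type { FAT_UNUSED = 0, FAT_FILE = 1, FAT_DIR = 2 };

struct fat_dirent {                   /* hypothetical four-field entry */
    char     name[FAT_NAME_LEN];
    uint32_t type;                    /* FAT_UNUSED, FAT_FILE, or FAT_DIR */
    uint32_t first_block;             /* later blocks come from the block table */
    uint64_t size;                    /* size in bytes */
};

#define FAT_DIR_ENTRIES (FAT_BLOCK_SIZE / sizeof(struct fat_dirent))

/* Pad the entry array out to exactly one block. */
union fat_dir_block {
    struct fat_dirent entry[FAT_DIR_ENTRIES];
    char pad[FAT_BLOCK_SIZE];
};

/* Add an entry in the first unused slot.  Returns the slot index, or -1
 * if the name is empty, too long, already present, or the directory is full. */
static int fat_dir_add(union fat_dir_block *d, const char *name,
                       uint32_t type, uint32_t first_block, uint64_t size)
{
    int slot = -1;
    size_t len = strlen(name);

    if (len == 0 || len >= FAT_NAME_LEN)
        return -1;
    for (size_t i = 0; i < FAT_DIR_ENTRIES; i++) {
        if (d->entry[i].type == FAT_UNUSED) {
            if (slot < 0)
                slot = (int)i;
        } else if (strcmp(d->entry[i].name, name) == 0) {
            return -1;                /* duplicate name */
        }
    }
    if (slot < 0)
        return -1;                    /* fixed-size directory is full */

    struct fat_dirent *e = &d->entry[slot];
    memset(e, 0, sizeof *e);
    strcpy(e->name, name);            /* safe: length checked above */
    e->type = type;
    e->first_block = first_block;
    e->size = size;
    return slot;
}

/* Initialize an empty directory stored in block `self` whose parent
 * lives in block `parent` (the root is its own parent). */
static void fat_dir_init(union fat_dir_block *d, uint32_t self, uint32_t parent)
{
    assert(self != 0 && parent != 0);
    memset(d, 0, sizeof *d);
    fat_dir_add(d, ".", FAT_DIR, self, FAT_BLOCK_SIZE);
    fat_dir_add(d, "..", FAT_DIR, parent, FAT_BLOCK_SIZE);
}
```

With this layout, a readdir handler would simply walk the entry array and report every slot whose type is not FAT_UNUSED.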
FAT Design
When I refer to a "FAT-like" filesystem, I mean the following:
- Allocation is managed by an in-core table with one entry per
filesystem block. Each entry contains either 0 or the number
of another block. In toto, the table constitutes a
set of linked lists of blocks.
- The free list is a linked list (held in the in-core table)
reached from the superblock. (An alternative would be to use
the awful Microsoft FAT design, which marks free blocks with a
special code and requires scanning the FAT to
find free blocks.)
- The on-disk copy of the block table is read at mount
(filesystem initialization)
time and is updated at your discretion (but note that your
process might be killed at any time).
- All file metadata is kept in the directory entry. At a
minimum, this should include the file type (directory or
file), the size in bytes, the name, and the number of the
first block. (Subsequent blocks are located via the block
table.) Other metadata, such as ownership, permissions, and
timestamps, are up to you but are not required.
- The block size is up to you, but it must be at least 512. (I
recommend 4096, just to keep up with the modern world.)
- Like any other file system, the on-disk data structures are
stored in a single file (pseudo-disk) and are
kept in binary. That means that if you choose to ignore my
advice and write in a scripting language, you MAY
NOT store things on-disk in any form that is
essentially text-based, such as JSON. (Of course, storing
filenames as text is permissible and encouraged.) Also, your
on-disk format must be designed by you specifically for your
file system, and you must be able to describe it in sufficient
detail that I could write a C program to decode it. (For
example, Python's pickle formats are not acceptable.)
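The block table and free list described above can be sketched as follows. This is only one possible arrangement: struct fat_alloc, the function names, and the block count are hypothetical, and the sketch assumes that block 0 holds the superblock, so the value 0 can safely serve as the end-of-chain marker. It covers the in-core table only; persisting the changed entries and the free-list head is up to the caller.

```c
#include <assert.h>
#include <stdint.h>

#define FAT_NBLOCKS 2560u     /* assumption: 10 MB / 4096-byte blocks */
#define FAT_END     0u        /* 0 ends a chain; block 0 is never allocated */

struct fat_alloc {                   /* hypothetical in-core state */
    uint32_t free_head;              /* first free block (kept in the superblock) */
    uint32_t table[FAT_NBLOCKS];     /* table[b] = block after b, or 0 */
};

/* Thread blocks first_data .. FAT_NBLOCKS-1 onto the free list. */
static void fat_init_free_list(struct fat_alloc *a, uint32_t first_data)
{
    assert(first_data > 0 && first_data < FAT_NBLOCKS);
    for (uint32_t b = 0; b < FAT_NBLOCKS; b++)
        a->table[b] = FAT_END;
    for (uint32_t b = first_data; b + 1 < FAT_NBLOCKS; b++)
        a->table[b] = b + 1;
    a->free_head = first_data;
}

/* Pop one block off the free list.  Returns 0 when the disk is full. */
static uint32_t fat_alloc_block(struct fat_alloc *a)
{
    uint32_t b = a->free_head;

    if (b == FAT_END)
        return 0;
    a->free_head = a->table[b];
    a->table[b] = FAT_END;           /* new block is a one-element chain */
    return b;
}

/* Return every block of a chain to the free list. */
static void fat_free_chain(struct fat_alloc *a, uint32_t first)
{
    while (first != FAT_END) {
        uint32_t next = a->table[first];
        a->table[first] = a->free_head;
        a->free_head = first;
        first = next;
    }
}
```

Allocation takes constant time because it only touches the head of the list, which is the advantage over scanning the table for a free marker.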
For reference, my minimal implementation used a block size of 512
bytes (it was a while ago), had six fields in the superblock
(including a magic number),
and had four fields in the directory entry. To make it easy to store the
superblock in a filesystem block, I used the following union:
union {
    struct fat_superblock s;
    char pad[512];
} superblock;
(Note that the superblock should be only 512 bytes, even if you use a
different block size for your filesystem. That design makes it
possible to read the superblock without knowing the block size, which
is a useful feature. If the filesystem uses blocks larger than 512
bytes, the remaining space in the larger "block" is simply wasted.)
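For illustration, here is a sketch of what a six-field superblock padded to 512 bytes might look like, together with helpers to fill it in and sanity-check it. The field names, the magic value, and the layout (block table starting in block 1, followed by a one-block root directory, followed by free space) are all assumptions, not the layout of the reference implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FAT_SB_SIZE 512
#define FAT_MAGIC   0x46415431u       /* hypothetical magic number ("FAT1") */

struct fat_superblock {               /* six hypothetical fields */
    uint32_t magic;
    uint32_t block_size;
    uint32_t nblocks;                 /* total blocks on the pseudo-disk */
    uint32_t table_start;             /* first block of the on-disk block table */
    uint32_t root_start;              /* first block of the root directory */
    uint32_t free_head;               /* head of the free list, 0 if none */
};

/* Same padding trick as the union in the text, given a type name. */
union fat_sb_block {
    struct fat_superblock s;
    char pad[FAT_SB_SIZE];
};

_Static_assert(sizeof(union fat_sb_block) == FAT_SB_SIZE,
               "superblock must occupy exactly 512 bytes");

/* Fill in a superblock for a fresh disk of disk_bytes bytes. */
static void fat_sb_init(union fat_sb_block *sb, uint32_t block_size,
                        uint32_t disk_bytes)
{
    assert(block_size >= 512 && (block_size & (block_size - 1)) == 0);
    memset(sb, 0, sizeof *sb);

    uint32_t nblocks = disk_bytes / block_size;
    uint32_t table_bytes = nblocks * (uint32_t)sizeof(uint32_t);
    uint32_t table_blocks = (table_bytes + block_size - 1) / block_size;

    sb->s.magic = FAT_MAGIC;
    sb->s.block_size = block_size;
    sb->s.nblocks = nblocks;
    sb->s.table_start = 1;                       /* block 0 is the superblock */
    sb->s.root_start = 1 + table_blocks;
    sb->s.free_head = sb->s.root_start + 1;      /* root uses one block */
}

/* Check that a superblock read from disk looks like ours. */
static int fat_sb_valid(const union fat_sb_block *sb)
{
    uint32_t bs = sb->s.block_size;

    return sb->s.magic == FAT_MAGIC
        && bs >= 512 && (bs & (bs - 1)) == 0
        && sb->s.nblocks > sb->s.root_start;
}
```

Checking the magic number at mount time is what lets the filesystem tell an initialized pseudo-disk apart from an empty or foreign file.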
I also found it useful to create a few macros to do things like
seeking to a particular block, converting back and forth between byte
offsets and block numbers, etc.
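Macros of that kind might look like the following sketch. The macro names and the 4096-byte block size are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define FAT_BLOCK_SIZE 4096

_Static_assert(FAT_BLOCK_SIZE >= 512 &&
               (FAT_BLOCK_SIZE & (FAT_BLOCK_SIZE - 1)) == 0,
               "block size must be a power of 2 and at least 512");

/* Byte offset of the start of block b in the backing file. */
#define FAT_BLOCK_TO_OFFSET(b)  ((off_t)(b) * FAT_BLOCK_SIZE)

/* Number of the block that contains byte offset o. */
#define FAT_OFFSET_TO_BLOCK(o)  ((o) / FAT_BLOCK_SIZE)

/* Number of blocks needed to hold n bytes (rounds up). */
#define FAT_BYTES_TO_BLOCKS(n)  (((n) + FAT_BLOCK_SIZE - 1) / FAT_BLOCK_SIZE)

/* Position the file offset of fd at the start of the given block.
 * Returns the new offset, or (off_t)-1 on error, like lseek. */
static off_t fat_seek_block(int fd, uint32_t block)
{
    assert(fd >= 0);
    return lseek(fd, FAT_BLOCK_TO_OFFSET(block), SEEK_SET);
}
```

The cast to off_t in FAT_BLOCK_TO_OFFSET matters: it keeps the multiplication from overflowing a 32-bit block number on larger disks.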
Note: You are supposed to be writing a real
filesystem. The only differences from a true implementation of FAT
should be:
- It is backed by a plain file in the filesystem, rather than an
actual disk, and
- Your data structures are not required to be compatible with
other FAT implementations (i.e., you don't have to be able to
create or mount
MS-DOS FAT disks).
In particular, this means not taking easy shortcuts. Like any
filesystem, your implementation must satisfy the following criteria:
- All access to the "disk" must be in multiples of the block
size, which must be a power of 2 and must be 512 or greater.
- Changes to files and directories must be reflected on disk
immediately. No fair saving things in memory and then writing
them out when you unmount.
- Information must persist in the backing store after unmount.
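One way to meet the first two criteria is to route every disk access through whole-block helpers and to write each changed block back as soon as it changes. The sketch below assumes a 4096-byte block size and uses hypothetical helper names. It relies on pread and pwrite, whose data reaches the operating system before the call returns, so it survives the process being killed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define FAT_BLOCK_SIZE 4096

/* Read one whole block into buf.  Returns 0 on success, -1 on failure. */
static int fat_read_block(int fd, uint32_t block, void *buf)
{
    assert(buf != NULL);
    ssize_t n = pread(fd, buf, FAT_BLOCK_SIZE, (off_t)block * FAT_BLOCK_SIZE);
    if (n != FAT_BLOCK_SIZE) {
        fprintf(stderr, "fat: short read on block %u\n", (unsigned)block);
        return -1;
    }
    return 0;
}

/* Write one whole block from buf.  Returns 0 on success, -1 on failure. */
static int fat_write_block(int fd, uint32_t block, const void *buf)
{
    assert(buf != NULL);
    ssize_t n = pwrite(fd, buf, FAT_BLOCK_SIZE, (off_t)block * FAT_BLOCK_SIZE);
    if (n != FAT_BLOCK_SIZE) {
        fprintf(stderr, "fat: short write on block %u\n", (unsigned)block);
        return -1;
    }
    return 0;
}

/* Change len bytes at offset off inside a block using read-modify-write,
 * so the "disk" only ever sees whole-block transfers. */
static int fat_update_block(int fd, uint32_t block, size_t off,
                            const void *data, size_t len)
{
    char buf[FAT_BLOCK_SIZE];

    assert(off <= FAT_BLOCK_SIZE && len <= FAT_BLOCK_SIZE - off);
    if (fat_read_block(fd, block, buf) < 0)
        return -1;
    memcpy(buf + off, data, len);
    return fat_write_block(fd, block, buf);
}
```

For example, a mkdir handler could call fat_update_block to store the new directory entry in its parent's block, then write the new directory's own block and the affected block-table entries before returning.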
You may also find it wise to review the requirements of Part 2 of this assignment to make sure you
don't make a design decision that will back you into a corner.
Submission
Submit your code (it should be a single file) as assignment 2 with
cs137submit. If you implement any additional features,
describe them prominently in comments at the top of the file.
© 2018, Geoff Kuenning
This page is maintained by Geoff
Kuenning.