Follow along in this command-line session to see the resulting .tar file:
    # ls -la
    total 73
    drwxrwxr-x  2 root  root   4096 Aug 17 17:31 ./
    drwxrwxrwt  4 root  root   4096 Aug 17 17:29 ../
    -rw-rw-r--  1 root  root   1076 Jan 14  2003 io_read.c
    -rw-rw-r--  1 root  root    814 Jan 12  2003 io_write.c
    -rw-rw-r--  1 root  root   6807 Feb 03  2003 main.c
    -rw-rw-r--  1 root  root  11883 Feb 03  2003 tarfs.c
    -rw-rw-r--  1 root  root    683 Jan 12  2003 tarfs.h
    -rw-rw-r--  1 root  root   6008 Jan 15  2003 tarfs_io_read.c
    # tar cvf x.tar *
    io_read.c
    io_write.c
    main.c
    tarfs.c
    tarfs.h
    tarfs_io_read.c
    # ls -l x.tar
    -rw-rw-r--  1 root  root  40960 Aug 17 17:31 x.tar
Here I've taken some of the source files in a directory and created a .tar file (called x.tar) that ends up being 40960 bytes—a nice multiple of 512 bytes, as we'd expect.
Each file in the .tar file is preceded by a 512-byte header; the file content follows, padded out to the next 512-byte boundary.
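To make that layout concrete, here's a minimal sketch in C of the offset arithmetic involved. The names `tar_round_up` and `tar_next_header` are made up for this illustration; they aren't taken from the resource manager's source.

```c
#include <assert.h>
#include <stdint.h>

#define TAR_BLOCK 512u   /* header size and content alignment */

/* Round a byte count up to the next multiple of 512. */
uint64_t tar_round_up(uint64_t nbytes)
{
    return (nbytes + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
}

/* Given the offset of a header and the size of the file it describes,
 * return the offset at which the next header starts. */
uint64_t tar_next_header(uint64_t header_off, uint64_t file_size)
{
    assert(header_off % TAR_BLOCK == 0);   /* headers are block-aligned */
    return header_off + TAR_BLOCK + tar_round_up(file_size);
}
```

For the first file in the listing, io_read.c (1076 bytes), the header sits at offset 0, the content occupies offsets 512 through 1587, and the next header (for io_write.c) starts at offset 2048.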
This is what each header looks like:
Offset | Length | Field Name |
---|---|---|
0 | 100 | name |
100 | 8 | mode |
108 | 8 | uid |
116 | 8 | gid |
124 | 12 | size |
136 | 12 | mtime |
148 | 8 | chksum |
156 | 1 | typeflag |
157 | 100 | linkname |
257 | 6 | magic |
263 | 2 | version |
265 | 32 | uname |
297 | 32 | gname |
329 | 8 | devmajor |
337 | 8 | devminor |
345 | 155 | prefix |
500 | 12 | filler |
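The table maps directly onto a C structure. What follows is only a sketch: the name `struct tar_header` is my own choice for illustration, and the final member is sized on the assumption that the header fills exactly one 512-byte block. The compile-time checks confirm that the offsets agree with the table.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* One tar header, field for field as in the table above.
 * Every member is a char array (or a single char), so the compiler
 * has no reason to insert padding between members. */
struct tar_header {
    char name[100];
    char mode[8];
    char uid[8];
    char gid[8];
    char size[12];
    char mtime[12];
    char chksum[8];
    char typeflag;
    char linkname[100];
    char magic[6];
    char version[2];
    char uname[32];
    char gname[32];
    char devmajor[8];
    char devminor[8];
    char prefix[155];
    char filler[12];   /* assumption: pads the header out to 512 bytes */
};

static_assert(offsetof(struct tar_header, size) == 124, "size offset");
static_assert(offsetof(struct tar_header, typeflag) == 156, "typeflag offset");
static_assert(offsetof(struct tar_header, magic) == 257, "magic offset");
static_assert(offsetof(struct tar_header, prefix) == 345, "prefix offset");
static_assert(sizeof(struct tar_header) == 512, "header is one block");
```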
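Here's a description of the fields that we're interested in for the filesystem (all fields are ASCII octal unless noted otherwise):
- name: the pathname of the stored entity (plain ASCII, NUL-terminated when shorter than the field).
- mode: the permission bits; in the sample header below, the file-type bits are stored here as well.
- uid: the numeric user ID of the owner.
- gid: the numeric group ID of the owner.
- size: the size of the content in bytes.
- mtime: the modification time, in seconds since January 1, 1970.
- typeflag: a single character giving the type of the entity (regular file, link, directory, and so on).
- linkname: the target pathname if the entry is a link (plain ASCII).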
We've skipped a bunch of fields, such as the checksum, because we don't need them for our filesystem. (For the checksum, for example, we're simply assuming that the file has been stored properly—in the vast majority of cases, it's not actually on an antique 9-track tape—so data integrity shouldn't be a problem!)
What I meant above by ASCII octal fields is that the value of the number is encoded as a sequence of ASCII digits in base 8. Really.
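Decoding such a field takes only a few lines of C. This is a sketch rather than the resource manager's actual code, and `tar_octal` is a name I've invented for it. It assumes the field may be padded with leading spaces and ends at the first byte that isn't an octal digit (normally a NUL or a space).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode an ASCII-octal header field of at most `len` bytes.
 * Leading spaces are skipped; parsing stops at the first byte that
 * is not an octal digit, or when `len` bytes have been examined. */
uint64_t tar_octal(const char *field, size_t len)
{
    uint64_t value = 0;
    size_t i = 0;

    assert(field != NULL);
    while (i < len && field[i] == ' ')
        i++;
    for (; i < len && field[i] >= '0' && field[i] <= '7'; i++)
        value = value * 8 + (uint64_t)(field[i] - '0');
    return value;
}
```

For instance, `tar_octal("00000002064", 12)` returns 1076, which is the size that ls reports for io_read.c in the listing above.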
For example, here's the very first header in the sample x.tar that we created above (addresses on the left-hand side, as well as the dump contents, are in hexadecimal, with printable ASCII characters on the right-hand side):
    0000000 69 6f 5f 72 65 61 64 2e 63 00 00 00 00 00 00 00 io_read.c.......
    0000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0000060 00 00 00 00 30 31 30 30 36 36 34 00 30 30 30 30 ....0100664.0000
    0000070 30 30 30 00 30 30 30 30 30 30 30 00 30 30 30 30 000.0000000.0000
    0000080 30 30 30 32 30 36 34 00 30 37 36 31 31 31 34 31 0002064.07611141
    0000090 34 36 35 00 30 31 31 33 33 34 00 20 30 00 00 00 465.011334..0...
    00000A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00000B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00000C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00000D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00000E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00000F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0000100 00 75 73 74 61 72 20 20 00 72 6f 6f 74 00 00 00 .ustar...root...
    0000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0000120 00 00 00 00 00 00 00 00 00 72 6f 6f 74 00 00 00 .........root...
    0000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0000150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0000160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0000190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00001A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00001B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00001C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00001D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00001E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00001F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
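Here are the fields that we're interested in for our .tar filesystem, as decoded from the dump above:
- name (offset 0x000): io_read.c
- mode (offset 0x064): 0100664, a regular file with rw-rw-r-- permissions
- uid (offset 0x06C): 0000000, user ID 0 (root)
- gid (offset 0x074): 0000000, group ID 0 (root)
- size (offset 0x07C): 00000002064, which is 1076 in decimal and matches the ls listing
- mtime (offset 0x088): 07611141465, which is 1042596661 seconds since January 1, 1970 (mid-January 2003)
- typeflag (offset 0x09C): the single character 0, indicating a regular file
- linkname (offset 0x09D): empty, because this entry isn't a link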
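As a sanity check on the mode field in this header (0100664 at offset 0x64), here's a small sketch that turns a decoded mode value into the string that ls prints. The function name `tar_mode_string` is hypothetical, and the sketch assumes that the file-type bits are present in the mode field, as they are in this sample header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill `out` (at least 11 bytes) with an ls-style string for `mode`.
 * Assumption: the traditional Unix file-type bits are stored in mode. */
void tar_mode_string(uint32_t mode, char out[11])
{
    assert(out != NULL);
    memcpy(out, "?---------", 11);         /* template, including the NUL */

    switch (mode & 0170000u) {             /* file-type bits */
    case 0100000u: out[0] = '-'; break;    /* regular file */
    case 0040000u: out[0] = 'd'; break;    /* directory */
    case 0120000u: out[0] = 'l'; break;    /* symbolic link */
    default:       break;                  /* unknown: leave the '?' */
    }

    /* Nine permission bits, from owner-read (0400) down to other-execute. */
    for (int i = 0; i < 9; i++) {
        if (mode & (0400u >> i))
            out[1 + i] = "rwxrwxrwx"[i];
    }
}
```

Passing in 0100664 produces -rw-rw-r--, which matches the listing for io_read.c.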
The one interesting wrinkle has to do with items that are in subdirectories.
Depending on how you invoked tar when the archive was created, you may or may not have the directories listed individually within the .tar file. What I mean is that if you add the file dir/spud.txt to the archive, the question is, is there a tar header corresponding to dir? In some cases there will be, in others there won't, so our .tar filesystem will need to be smart enough to create any intermediate directories that aren't explicitly mentioned in the headers.
Note that in all cases the full pathname is given; that is, we will always have dir/spud.txt. We never get a header for the directory dir followed by a header for a bare spud.txt; every header in a .tar file carries the full path to the entry it describes.
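Because every header carries the full path, finding the intermediate directories is a matter of scanning for slashes. Here's one way that could look; it's a sketch under my own assumptions, and `tar_next_parent` is an invented name. It returns the length of the next directory prefix of a path, so that a caller can create that directory if it doesn't exist yet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the length of the shortest directory prefix of `path` that is
 * longer than `prev_len`, or 0 when there are no more prefixes.
 * For "a/b/c.txt" successive calls return 1 ("a"), 3 ("a/b"), then 0. */
size_t tar_next_parent(const char *path, size_t prev_len)
{
    size_t len;

    assert(path != NULL);
    len = strlen(path);

    /* A trailing '/' (as in "dir/") names the directory itself. */
    while (len > 1 && path[len - 1] == '/')
        len--;

    for (size_t i = prev_len + 1; i < len; i++) {
        if (path[i] == '/' && path[i - 1] != '/')
            return i;            /* path[0..i) is a parent directory */
    }
    return 0;
}
```

Calling it repeatedly, starting with `prev_len` set to 0 and feeding each result back in, yields 3 for dir/spud.txt (the prefix dir) and then 0. An explicit directory entry such as dir/ yields 0 right away, because it has no parents to create.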
Let's stop and think about how this resource manager compares to the RAM-disk resource manager in the previous chapter. If you squint your eyes a little bit, and ignore a few minor details, you can say they are almost identical. We need to:
The only thing that's really different is that instead of storing the file contents in RAM, we're storing them on disk! (The fact that this is a read-only filesystem isn't really a difference; it's a subset.)