At the highest level, the analyze_tar_file() function opens the .tar file, processes each file inside by calling add_tar_entry(), and then closes the .tar file. There's a wonderful library called zlib, which lets us open even compressed files and pretend that they are just normal, uncompressed files. That's what gives us the flexibility to open either a .tar or a .tar.gz file with no additional work on our part. (The limitation of the library is that seeking within a compressed file may be slow, because the data in between may need to be decompressed.)
int
analyze_tar_file (cfs_attr_t *a, char *fname)
{
  gzFile  fd;
  off_t   off;
  ustar_t t;
  int     size;
  int     sts;
  char    *f;

  // 1) the .tar (or .tar.gz) file must exist :-)
  if ((fd = gzopen (fname, "r")) == NULL) {
    return (errno);
  }

  off = 0;
  f = strdup (fname);

  // 2) read the 512-byte header into "t"
  while (gzread (fd, &t, sizeof (t)) > 0 && *t.name) {
    dump_tar_header (off, &t);

    // 3) get the size
    sscanf (t.size, "%o", &size);
    off += sizeof (t);

    // 4) add this entry to the database
    if (sts = add_tar_entry (a, off, &t, f)) {
      gzclose (fd);
      return (sts);
    }

    // 5) skip the data for the entry
    off += ((size + 511) / 512) * 512;
    gzseek (fd, off, SEEK_SET);
  }
  gzclose (fd);

  return (EOK);
}
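The code walkthrough follows the numbered comments:

1. Open the .tar (or .tar.gz) file with gzopen(); the file must exist.
2. Read each 512-byte header into t, and dump it with dump_tar_header().
3. Get the size of the entry from the header's ASCII octal size field.
4. Add the entry to the database by calling add_tar_entry().
5. Skip the data for the entry.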
In step 5 we skip the file content by seeking past it instead of reading it. I'm surprised that not all of today's tar utilities do this when they're dealing with files; doing a tar tvf to get a listing of the tar file takes forever for huge files!
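The arithmetic behind steps 3 and 5 is worth seeing in isolation. What follows is only a minimal sketch, not the code used in this chapter: the helper names tar_field_size() and tar_next_header() are hypothetical, and the sketch assumes the usual ustar layout, where the size is a 12-byte ASCII octal field and data is padded out to whole 512-byte blocks. It uses strtoul() on a terminated copy of the field rather than sscanf(), because the field isn't guaranteed to be NUL-terminated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TAR_BLOCK 512   /* headers and data are padded to 512-byte blocks */

/* Hypothetical helper: convert the 12-byte ASCII octal size field of a
   tar header into a number. The field may lack a terminating NUL, so
   we copy it into a terminated buffer before parsing it. */
unsigned long
tar_field_size (const char *field)
{
  char buf [13];

  assert (field != NULL);
  memcpy (buf, field, 12);
  buf [12] = '\0';
  return (strtoul (buf, NULL, 8));   /* base 8: the field is octal */
}

/* Hypothetical helper: given the offset where an entry's data starts
   and the entry's size, return the offset of the next header. The
   size is rounded up to a whole number of blocks. */
unsigned long
tar_next_header (unsigned long data_off, unsigned long size)
{
  return (data_off + ((size + TAR_BLOCK - 1) / TAR_BLOCK) * TAR_BLOCK);
}
```

For example, a size field of "00000001750" is 1000 bytes. If that entry's header sits at offset 0, its data starts at offset 512 and occupies two blocks, so the next header is at offset 1536. An empty file takes no data blocks at all, which means the next header follows immediately.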