When the _IO_READ handler is called, it may need to return data for either a file (if S_ISDIR (ocb->attr->mode) is false) or a directory (if S_ISDIR (ocb->attr->mode) is true). The readdir() function sets _IO_XTYPE_READDIR in the xtype member of the _IO_READ message. We've seen the algorithm for returning data, especially the method for matching the returned data's size to the smaller of the data available or the client's buffer size.
A similar constraint is in effect for returning directory data to a client, except we have the added issue of returning block-integral data. What this means is that instead of returning a stream of bytes, where we can arbitrarily package the data, we're actually returning a number of struct dirent structures. (In other words, we can't return 1.5 of those structures; we always have to return an integral number.) The dirent structures must be aligned on 4-byte boundaries in the reply.
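The integral-entry constraint can be sketched as a small size calculation. The following is a minimal, portable sketch rather than QNX code: struct sketch_dirent is a hypothetical stand-in for the 64-bit layout of struct dirent, and the helpers entry_size() and entries_that_fit() are names invented for this illustration. It assumes each entry is padded to a multiple of 4 bytes, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct dirent (64-bit layout); illustration only. */
struct sketch_dirent {
    uint64_t d_ino;
    int64_t  d_offset;
    int16_t  d_reclen;
    int16_t  d_namelen;
    char     d_name[1];
};

/* Bytes needed for one entry: header + name + '\0', rounded up to 4. */
size_t entry_size(const char *name)
{
    size_t raw = offsetof(struct sketch_dirent, d_name) + strlen(name) + 1;
    return (raw + 3u) & ~(size_t)3u;
}

/*
 * Count how many leading names fit, as whole entries, into a client
 * buffer of nbytes. *used receives the total size of those entries.
 */
size_t entries_that_fit(const char *const names[], size_t count,
                        size_t nbytes, size_t *used)
{
    size_t total = 0;
    size_t n = 0;

    assert(names != NULL && used != NULL);
    while (n < count) {
        size_t sz = entry_size(names[n]);
        if (sz > nbytes - total) {
            break;              /* never return a partial entry */
        }
        total += sz;
        n++;
    }
    *used = total;
    return n;
}
```

The value stored in *used, not the client's buffer size, is what the handler would report as the number of bytes returned.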
A struct dirent looks like this:
    struct dirent {
    #if _FILE_OFFSET_BITS - 0 == 64
        ino_t    d_ino;          /* File serial number. */
        off_t    d_offset;
    #elif !defined(_FILE_OFFSET_BITS) || _FILE_OFFSET_BITS == 32
    #if defined(__LITTLEENDIAN__)
        ino_t    d_ino;          /* File serial number. */
        ino_t    d_ino_hi;
        off_t    d_offset;
        off_t    d_offset_hi;
    #elif defined(__BIGENDIAN__)
        ino_t    d_ino_hi;
        ino_t    d_ino;          /* File serial number. */
        off_t    d_offset_hi;
        off_t    d_offset;
    #else
    #error endian not configured for system
    #endif
    #else
    #error _FILE_OFFSET_BITS value is unsupported
    #endif
        int16_t  d_reclen;
        int16_t  d_namelen;
        char     d_name[1];
    };
The d_ino member contains a mountpoint-unique file serial number. Disk-checking utilities often use this serial number, for example to detect directory links that form an infinite loop. (Note that the inode value can't be zero; a zero value indicates an unused entry.)
In some filesystems, the d_offset member is used to identify the directory entry itself; in others, it's the offset of the next directory entry. For a disk-based filesystem, this value might be the actual offset into the on-disk directory structure.
The d_reclen member contains the size of this directory entry and any other associated information (such as an optional struct stat structure appended to the struct dirent entry; see below).
The d_namelen member indicates the length of the string in the d_name member, which holds the actual name of that directory entry. (Since the length is calculated using strlen(), the \0 string terminator, which must be present, is not counted.)
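The way these members relate to each other can be sketched by filling in a single entry. This is a minimal, portable sketch under stated assumptions: struct sketch_dirent is a hypothetical stand-in for the 64-bit layout of struct dirent, fill_entry() is a name invented for this illustration, and the record length is assumed to be padded to a multiple of 4 bytes with no struct stat appended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct dirent (64-bit layout); illustration only. */
struct sketch_dirent {
    uint64_t d_ino;
    int64_t  d_offset;
    int16_t  d_reclen;
    int16_t  d_namelen;
    char     d_name[1];
};

/*
 * Write one entry for `name` at `buf` and return its record length,
 * or 0 if the entry doesn't fit in `avail` bytes.
 * `buf` must be suitably aligned for struct sketch_dirent.
 */
size_t fill_entry(void *buf, size_t avail, uint64_t ino, int64_t offset,
                  const char *name)
{
    size_t namelen = strlen(name);      /* '\0' is not counted */
    size_t name_at = offsetof(struct sketch_dirent, d_name);
    size_t reclen  = (name_at + namelen + 1 + 3u) & ~(size_t)3u;
    struct sketch_dirent *dp = buf;

    assert(buf != NULL && name != NULL);
    assert(ino != 0);                   /* zero would mean "unused entry" */
    if (reclen > avail || reclen > INT16_MAX) {
        return 0;
    }

    memset(buf, 0, reclen);             /* also clears the padding bytes */
    dp->d_ino     = ino;
    dp->d_offset  = offset;
    dp->d_reclen  = (int16_t)reclen;
    dp->d_namelen = (int16_t)namelen;
    /* The name, including its '\0', extends past the declared d_name[1]. */
    memcpy((char *)buf + name_at, name, namelen + 1);
    return reclen;
}
```

Calling fill_entry() repeatedly, advancing the buffer pointer by each returned record length, packs consecutive entries; a return value of 0 signals that the remaining space can't hold another whole entry.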
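Because d_name is declared with only one element, the dirent structure doesn't reserve space for the name itself; you must provide that space, for example as a struct:
    struct {
        struct dirent ent;
        char namebuf[NAME_MAX + 1 + offsetof(struct dirent, d_name) - sizeof(struct dirent)];
    } entry;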
or as a union:
    union {
        struct dirent ent;
        char filler[offsetof(struct dirent, d_name) + NAME_MAX + 1];
    } entry;
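As a rough illustration of why the extra space matters, the sketch below wraps the union form in a named type and copies a name into it. The names union dirent_storage and store_name() are hypothetical, and the fallback value of 255 for NAME_MAX is an assumption used only if <limits.h> doesn't define it. The sketch relies only on the POSIX <dirent.h> declaration of struct dirent, so it isn't specific to the layout shown above.

```c
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#ifndef NAME_MAX
#define NAME_MAX 255    /* assumption: fallback if <limits.h> omits it */
#endif

/* Storage for one directory entry plus the longest possible name. */
union dirent_storage {
    struct dirent ent;
    char filler[offsetof(struct dirent, d_name) + NAME_MAX + 1];
};

/*
 * Copy `name` (with its '\0') into the name area of `entry`.
 * Returns 0 on success, or -1 if the name is longer than NAME_MAX.
 */
int store_name(union dirent_storage *entry, const char *name)
{
    size_t len = strlen(name);

    assert(entry != NULL);
    if (len > NAME_MAX) {
        return -1;
    }
    /* Address the name through filler so the copy stays inside the union. */
    memcpy(entry->filler + offsetof(struct dirent, d_name), name, len + 1);
    return 0;
}
```

The union form guarantees that the object is at least as large as struct dirent and also large enough for a NAME_MAX-character name, whichever is bigger, while keeping the alignment that struct dirent requires.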
So in our io_read handler, we need to generate a number of struct dirent entries and return them to the client. If we have a cache of directory entries that we maintain in our resource manager, it's a simple matter to construct a set of IOVs to point to those entries. If we don't have a cache, then we must manually assemble the directory entries into a buffer and then return an IOV that points to that.
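The cached case can be sketched as follows. This is a minimal, portable illustration, not the resource manager's actual code: it uses the POSIX struct iovec from <sys/uio.h> as a stand-in for the IOV type a QNX handler would use, and struct cached_entry and build_iovs() are hypothetical names. It assumes each cached entry is already a complete, padded directory entry.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical cache slot: a ready-made, already padded directory entry. */
struct cached_entry {
    const void *data;    /* start of the prebuilt entry */
    size_t      reclen;  /* its padded record length */
};

/*
 * Point up to max_iov IOVs at consecutive cached entries, starting at
 * index `first`, and stop before the entry that would overflow the
 * client's buffer. Returns the number of IOVs filled in; *nbytes
 * receives the total size of the reply.
 */
int build_iovs(const struct cached_entry *cache, size_t count, size_t first,
               size_t client_bytes, struct iovec *iov, int max_iov,
               size_t *nbytes)
{
    int n = 0;
    size_t total = 0;

    assert(cache != NULL && iov != NULL && nbytes != NULL);
    for (size_t i = first; i < count && n < max_iov; i++) {
        if (cache[i].reclen > client_bytes - total) {
            break;                       /* whole entries only */
        }
        iov[n].iov_base = (void *)cache[i].data;
        iov[n].iov_len  = cache[i].reclen;
        total += cache[i].reclen;
        n++;
    }
    *nbytes = total;
    return n;
}
```

No entry data is copied here; the IOVs only reference the cache, so the cached entries must stay valid until the reply has been sent.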