Embedded transaction filesystem (ETFS)

ETFS implements a high-reliability filesystem for use with embedded solid-state memory devices, particularly NAND flash memory.

The filesystem supports a fully hierarchical directory structure with POSIX semantics as shown in the table above.

ETFS is a filesystem composed entirely of transactions. Every write operation, whether of user data or filesystem metadata, consists of a transaction. A transaction either succeeds or is treated as if it never occurred.

Transactions never overwrite live data. A write in the middle of a file or a directory update always writes to a new unused area. In this way, if the operation fails partway through (because of a crash or power failure), the old data is still intact.

Some log-based filesystems also operate under the principle that live data is never overwritten. But ETFS takes this to the extreme by turning everything into a log of transactions. The filesystem hierarchy is built on the fly by processing the log of transactions in the device. This scan occurs at startup, but is designed such that only a small subset of the data is read and CRC-checked, resulting in faster startup times without sacrificing reliability.

Transactions are position-independent in the device and may occur in any order. You could read the transactions from one device and write them in a different order to another device. This is important because it allows bulk programming of devices containing bad blocks that may be at arbitrary locations.
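
The position independence described above can be illustrated with a minimal sketch. It assumes a simplified transaction of the form (sequence, fid, offset, data); the `replay` function and the tuple layout are hypothetical and are not the ETFS on-media format. Replaying the same transactions in a different physical order yields the same files, because only the sequence number determines ordering.

```python
import random

def replay(transactions):
    """Rebuild file contents from (sequence, fid, offset, data) tuples."""
    files = {}
    # Apply transactions in sequence order, so the result does not depend
    # on where each one physically sits on the device.
    for _seq, fid, offset, data in sorted(transactions, key=lambda t: t[0]):
        buf = files.setdefault(fid, bytearray())
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(bytes(end - len(buf)))  # zero-fill any gap
        buf[offset:end] = data
    return {fid: bytes(buf) for fid, buf in files.items()}

log = [
    (1, 7, 0, b"hello world"),
    (2, 7, 6, b"ETFS!"),  # logically replaces "world"; the old record stays untouched
    (3, 9, 0, b"abc"),
]

shuffled = log[:]
random.shuffle(shuffled)  # a different physical order, e.g. after skipping bad blocks
```

Here `replay(log)` and `replay(shuffled)` produce identical results, which is the property that makes bulk programming around bad blocks possible.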

This design is well-suited for NAND flash memory. NAND flash is shipped with factory-marked bad blocks that may occur in any location.

Figure 1. ETFS is a filesystem composed entirely of transactions.

Inside a transaction

Each transaction consists of a header followed by data. The header contains the following:

FID
A unique file ID that identifies which file the transaction belongs to.
Offset
The offset of the data portion within the file.
Size
The size of the data portion.
Sequence
A monotonically increasing number (to enable time ordering).
CRCs
Data integrity checks (for NAND, NOR, SRAM).
ECCs
Error correction (for NAND).
Other
Reserved for future expansion.
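
As a rough illustration of the header fields listed above, the following sketch packs a transaction into bytes and validates it on the way back. The field widths, the byte order, the use of CRC-32 from `zlib`, and the function names are all assumptions made for the example; the actual ETFS layout is not described here, and ECC is omitted.

```python
import struct
import zlib

# Hypothetical layout: four 32-bit little-endian fields, then a CRC-32.
FIELDS = struct.Struct("<IIII")  # fid, offset, size, sequence
CRC = struct.Struct("<I")

def pack_transaction(fid, offset, sequence, data):
    fields = FIELDS.pack(fid, offset, len(data), sequence)
    crc = zlib.crc32(fields + data)  # the CRC covers the header fields and the data
    return fields + CRC.pack(crc) + data

def unpack_transaction(blob):
    """Return (fid, offset, sequence, data), or None if the record is damaged."""
    start = FIELDS.size + CRC.size
    if len(blob) < start:
        return None
    fid, offset, size, sequence = FIELDS.unpack_from(blob, 0)
    (crc,) = CRC.unpack_from(blob, FIELDS.size)
    data = blob[start:start + size]
    if len(data) != size or zlib.crc32(blob[:FIELDS.size] + data) != crc:
        return None
    return fid, offset, sequence, data

blob = pack_transaction(fid=7, offset=4096, sequence=12, data=b"payload")
```

A record that is truncated or has a flipped bit fails the check and is reported as `None`, which is the behavior a startup scan needs in order to discard it.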

Types of storage media

Although best for NAND devices, ETFS also supports other types of embedded storage media by using driver classes as follows:

Class          CRC   ECC   Wear-leveling erase   Wear-leveling read   Cluster size
NAND 512+16    Yes   Yes   Yes                   Yes                  1 KB
NAND 2048+64   Yes   Yes   Yes                   Yes                  2 KB
RAM            No    No    No                    No                   1 KB
SRAM           Yes   No    No                    No                   1 KB
NOR            Yes   No    Yes                   No                   1 KB
Note: Although ETFS can support NOR flash, we recommend using the FFS3 filesystem (devf-*) instead; it is designed explicitly for NOR flash devices.

Reliability features

ETFS is designed to survive across a power failure, even during an active flash write or block erase. The following features contribute to its reliability:

Dynamic wear-leveling
Flash memory allows a limited number of erase cycles on a flash block before the block fails. This number can be as low as 100,000. ETFS tracks the number of erases on each block. When selecting a block to use, ETFS attempts to spread the erase cycles evenly over the device, dramatically increasing its life. The difference can be extreme: in some usage scenarios, a device that would fail within a few days without wear-leveling can last over 40 years with it.
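
The block-selection idea can be sketched as follows. This is a simplified policy (always take the free block with the fewest erases), assumed for illustration; the `pick_block` name and the data structures are hypothetical, and the policy ETFS actually applies may weigh other factors.

```python
def pick_block(erase_counts, free_blocks):
    """Choose the free block with the fewest erases (ties go to the lowest block number)."""
    return min(free_blocks, key=lambda b: (erase_counts[b], b))

# Four blocks, all unused at the start.
erase_counts = {0: 0, 1: 0, 2: 0, 3: 0}

for _ in range(1000):
    block = pick_block(erase_counts, list(erase_counts))
    erase_counts[block] += 1  # erase the block before reusing it
```

After 1000 erase cycles, every block has been erased 250 times, rather than one block absorbing all 1000.
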
Static wear-leveling
Filesystems often consist of a large number of static files that are read but not written. These files will occupy flash blocks that have no reason to be erased. If the majority of the files in flash are static, this will cause the remaining blocks containing dynamic data to wear at a dramatically increased rate.

ETFS notices these underworked static blocks and forces them into service by copying their data to an overworked block. This solves two problems: it gives the overworked block a rest, since it now contains static data, and it forces the underworked static block into the dynamic pool of blocks.
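
A minimal sketch of this swap, under stated assumptions: each block records an erase count and its contents, and a move happens only when the most-worn free block has at least `threshold` more erases than the least-erased occupied block. The `static_level` function, the dictionary layout, and the threshold are hypothetical.

```python
def static_level(blocks, threshold):
    """Move the coldest data onto the most-worn free block, if the gap is large enough.

    blocks maps block number -> {"erases": int, "data": object or None}.
    Returns (cold_block, worn_block) when a move happens, otherwise None.
    """
    used = [b for b, info in blocks.items() if info["data"] is not None]
    free = [b for b, info in blocks.items() if info["data"] is None]
    if not used or not free:
        return None
    cold = min(used, key=lambda b: blocks[b]["erases"])
    worn = max(free, key=lambda b: blocks[b]["erases"])
    if blocks[worn]["erases"] - blocks[cold]["erases"] < threshold:
        return None
    blocks[worn]["data"] = blocks[cold]["data"]  # the worn block now rests with static data
    blocks[cold]["data"] = None                  # the cold block joins the dynamic pool
    blocks[cold]["erases"] += 1                  # it is erased before reuse
    return cold, worn

blocks = {
    0: {"erases": 2, "data": "static image"},
    1: {"erases": 900, "data": None},
    2: {"erases": 850, "data": "log"},
}
moved = static_level(blocks, threshold=500)
```

In this example the static data moves from block 0 to block 1, and a second call makes no further change because the remaining gap is below the threshold.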

CRC error detection
Each transaction is protected by a cyclic redundancy check (CRC). This ensures quick detection of corrupted data, and forms the basis for the rollback operation of damaged or incomplete transactions at startup. The CRC can detect multiple bit errors that may occur during a power failure.
ECC error correction
On a CRC error, ETFS can apply error correction coding (ECC) to attempt to recover the data. This is suitable for NAND flash memory, in which single-bit errors may occur during normal usage. An ECC error is a warning sign that the flash block in which it occurred may be getting weak (that is, losing charge).

ETFS marks the weak block for a refresh operation, which copies the data to a new flash block and erases the weak block. The erasure recharges the flash block.
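
To show how a single-bit error can be corrected and reported, here is a sketch using a Hamming(7,4) code. This code is chosen only because it is small enough to read; the text does not say which ECC ETFS or a given NAND driver uses, and real NAND ECC operates on much larger units. The `corrected` flag stands in for the signal that would mark a block for refresh.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # 1-based positions of the data bits
PARITY_POSITIONS = (1, 2, 4)    # 1-based positions of the parity bits

def encode(nibble):
    """Encode a 4-bit value as a list of 7 bits."""
    code = [0] * 8  # index 0 is unused so indices match bit positions
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> i) & 1
    for p in PARITY_POSITIONS:
        for pos in range(1, 8):
            if pos != p and pos & p:
                code[p] ^= code[pos]
    return code[1:]

def decode(bits):
    """Return (nibble, corrected); corrected is True if one bit was repaired."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    if syndrome:
        code[syndrome] ^= 1  # the syndrome is the position of the flipped bit
    nibble = 0
    for i, pos in enumerate(DATA_POSITIONS):
        nibble |= code[pos] << i
    return nibble, syndrome != 0

stored = encode(0b1011)
stored[4] ^= 1                    # simulate a single-bit error in the flash cell
value, corrected = decode(stored)
```

The data is recovered intact, and `corrected` is true, which is the point at which a filesystem following the scheme above would schedule the block for refresh. As with the ECC described in the text, a code like this cannot reliably repair multi-bit errors.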

Read degradation monitoring with automatic refresh
Each read operation within a NAND flash block weakens the charge maintaining the data bits. Most devices support about 100,000 reads before there's danger of losing a bit. The ECC recovers a single-bit error, but may not be able to recover multi-bit errors.

ETFS solves this by tracking reads and marking blocks for refresh before the 100,000 read limit is reached.
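
The bookkeeping involved can be sketched as a per-block read counter. The 100,000 figure comes from the text; the 90,000 refresh point, the `ReadTracker` class, and its method names are assumptions for the example.

```python
READ_LIMIT = 100_000   # approximate figure quoted in the text
REFRESH_AT = 90_000    # assumption: schedule a refresh with some margin to spare

class ReadTracker:
    def __init__(self):
        self.reads = {}             # block number -> reads since last erase
        self.needs_refresh = set()  # blocks waiting to be copied and erased

    def record_read(self, block):
        self.reads[block] = self.reads.get(block, 0) + 1
        if self.reads[block] >= REFRESH_AT:
            self.needs_refresh.add(block)

    def refresh(self, block):
        # Copying the data elsewhere and erasing the block restarts the count.
        self.reads[block] = 0
        self.needs_refresh.discard(block)

tracker = ReadTracker()
for _ in range(REFRESH_AT):
    tracker.record_read(3)
```

After the loop, block 3 is in `tracker.needs_refresh`, well before it reaches `READ_LIMIT`.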

Transaction rollback
When ETFS starts, it processes all transactions and rolls back (discards) the last partial or damaged transaction. The rollback code is designed to handle a power failure during a rollback operation, thus allowing the system to recover from multiple nested faults. The validity of each transaction is verified by its CRC.
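
A simplified picture of the startup check: each record carries its expected size and CRC, and any record that fails either test is discarded. The record layout and the `roll_back` function are hypothetical, and CRC-32 from `zlib` is an assumed stand-in for whatever CRC the driver uses. Because the function only filters and never modifies records, running it again after an interruption gives the same answer.

```python
import zlib

def make_record(seq, payload):
    return {"seq": seq, "size": len(payload), "crc": zlib.crc32(payload), "payload": payload}

def is_valid(rec):
    return len(rec["payload"]) == rec["size"] and zlib.crc32(rec["payload"]) == rec["crc"]

def roll_back(records):
    """Return only the records that pass validation, in sequence order."""
    return sorted((r for r in records if is_valid(r)), key=lambda r: r["seq"])

log = [
    make_record(1, b"create /a"),
    make_record(2, b"write 1"),
    make_record(3, b"write 2"),
]
log[2]["payload"] = log[2]["payload"][:3]  # simulate power loss partway through the last write

recovered = roll_back(log)
```

Only records 1 and 2 survive, so the filesystem state is exactly what it was before the interrupted write began.
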
Atomic file operations
ETFS implements a very simple directory structure on the device, allowing significant modifications with a single flash write. For example, moving a file or directory to another directory is a multistage operation in most filesystems. In ETFS, a move is accomplished with a single flash write.
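
One way to picture a single-write move is a log in which each file's location is a record naming its parent directory, and the newest record wins. This model is an assumption made for illustration; the text does not describe the ETFS directory format, and the `resolve` function and record tuple are hypothetical.

```python
def resolve(link_records):
    """Map fid -> (parent_fid, name); the record with the highest sequence wins."""
    entries = {}
    for _seq, fid, parent, name in sorted(link_records, key=lambda r: r[0]):
        entries[fid] = (parent, name)
    return entries

ROOT, DOCS, ARCHIVE, REPORT = 0, 1, 2, 3

log = [
    (1, DOCS, ROOT, "docs"),
    (2, ARCHIVE, ROOT, "archive"),
    (3, REPORT, DOCS, "report.txt"),
]
before = resolve(log)

# The move is one appended record. If it never reaches the device,
# the file simply stays where it was.
log.append((4, REPORT, ARCHIVE, "report.txt"))
after = resolve(log)
```

In this model there is no intermediate state in which the file is in both directories or in neither.
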
Automatic file defragmentation
Log-based filesystems often suffer from fragmentation, since each update or write to an existing file causes a new transaction to be created. ETFS uses write-buffering to combine small writes into larger write transactions, which minimizes the fragmentation caused by many very small transactions. ETFS also monitors the fragmentation level of each file and defragments badly fragmented files in the background. This background activity is always preempted by a user data request, to ensure immediate access to the file being defragmented.
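
The effect of write-buffering can be sketched as merging writes that are contiguous in file offset before they become transactions. The `coalesce` function is hypothetical, and it handles only the simplest case (each write starts where the previous one ended); overlapping writes and buffer size limits are left out.

```python
def coalesce(writes):
    """Merge (offset, data) writes that continue directly from the previous one."""
    merged = []
    for offset, data in writes:
        if merged and merged[-1][0] + len(merged[-1][1]) == offset:
            last_offset, last_data = merged[-1]
            merged[-1] = (last_offset, last_data + data)
        else:
            merged.append((offset, data))
    return merged

small_writes = [(0, b"ab"), (2, b"cd"), (4, b"ef"), (100, b"zz")]
transactions = coalesce(small_writes)
```

Four small writes become two transactions: one covering offsets 0 through 5 and one at offset 100.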