The World of File Systems (Part 1)

by wmalik

I have been reading lots of interesting papers lately to improve my overall understanding of file systems and thought it would be nice to write about a few important concepts.

 

Filesystem

A file system basically lets you read, write and update data on a storage device. It mostly provides the abstraction of files and directories. No matter how boring file systems seem to an average user, they solve some interesting problems. One important thing to mention here is that there is no such thing as the-best-file-system, because there are various types of storage devices (disks, SSDs, tape drives, flash drives etc.), and each of them has different characteristics. Furthermore, there are different types of user applications with different access patterns (think read-intensive, write-intensive etc.), so it is difficult (or impossible) to design a file system which works well in all scenarios. So, file systems are usually designed for specific storage devices and application access patterns.

File systems are typically categorized into two types: Disk file systems (or Local file systems as I like to call them), and Distributed file systems.

The disk file systems are further divided into many categories, out of which Extent file systems and Log Structured file systems are the most interesting.

 

Extent File Systems

These file systems are usually more efficient than conventional block-mapped file systems at block allocation and at reading and writing large files, because they make use of something called extents.

To understand extents, let’s see how old file systems (FAT, ext2, ext3 etc.) work. These file systems split a file into blocks (usually 4KB) and keep a piece of metadata for each and every block. So for a 128MB file, 32,768 blocks will be allocated (each with its own metadata entry), and there is no guarantee that these blocks end up in a contiguous area. So writes can be pretty slow, because block allocations need to be done as the file is being written. Furthermore, reads can also be slow because of high “seek” times when the blocks are scattered across the disk. In addition to these problems, the file system also tends to become fragmented over time.
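To get a feel for the numbers, here is a small back-of-the-envelope sketch in Python. The 4KB block size comes from the text above; the 4 bytes of metadata per block is purely my assumption for illustration (think of one 32-bit block pointer per block), and real file systems differ in the details.

```python
BLOCK_SIZE = 4 * 1024        # 4KB blocks, as in the text
BYTES_PER_BLOCK_ENTRY = 4    # assumption: one 32-bit pointer per block


def blocks_needed(file_size_bytes, block_size=BLOCK_SIZE):
    """Number of blocks needed to hold a file, rounding up."""
    return -(-file_size_bytes // block_size)  # ceiling division


def per_block_metadata_bytes(file_size_bytes):
    """Metadata needed if every block gets its own entry."""
    return blocks_needed(file_size_bytes) * BYTES_PER_BLOCK_ENTRY


file_size = 128 * 1024 * 1024  # a 128MB file
print(blocks_needed(file_size))             # 32768
print(per_block_metadata_bytes(file_size))  # 131072 (i.e. 128KB)
```

So even under this simplified model, a single 128MB file needs tens of thousands of block entries, and each of those blocks may have to be allocated separately as the file grows.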

An extent file system allocates extents for a file rather than single blocks. An extent is a contiguous area of storage in a file system. So instead of allocating blocks one by one, the file system allocates a contiguous sequence of blocks in one go. This not only improves read/write performance (because seek time is reduced), but also helps in preventing the file system from becoming fragmented. Of course, the file system can still become fragmented because it is not always possible to allocate blocks contiguously, but the situation is much better than in old file systems. Another big benefit of using extents is that very little space is required to store the block metadata on disk, because metadata is stored for each extent, and not for each block. For these reasons, most modern file systems (ext4, Btrfs, XFS, Reiser4 etc.) make use of extents for block allocation. Obviously modern file systems do much more than allocate blocks, but nonetheless it is one of the most fundamental aspects of file system design.
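The metadata saving is easy to see in a toy example. The sketch below is my own illustration, not how any particular file system stores things: it models an extent as a hypothetical (start_block, length) pair and collapses a file's list of block numbers into extents.

```python
from typing import List, Tuple

# Assumption for illustration: an extent is just (start_block, length).
Extent = Tuple[int, int]


def blocks_to_extents(blocks: List[int]) -> List[Extent]:
    """Collapse a file's block numbers (in file order) into extents."""
    extents: List[Extent] = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            # This block directly follows the last extent, so grow it.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # Not contiguous with the previous block: start a new extent.
            extents.append((block, 1))
    return extents


# A 128MB file (32768 blocks of 4KB) laid out contiguously from block 1000.
contiguous = list(range(1000, 1000 + 32768))
print(blocks_to_extents(contiguous))   # [(1000, 32768)]

# A small fragmented file still needs one extent per contiguous run.
fragmented = [10, 11, 12, 50, 51, 90]
print(blocks_to_extents(fragmented))   # [(10, 3), (50, 2), (90, 1)]
```

In the contiguous case, one entry describes all 32,768 blocks. In the fragmented case the number of entries grows with the number of runs, which is why extents help most when the allocator manages to keep files contiguous.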

That’s it for now. In the next posts, I will write about Log Structured file systems, and Distributed file systems. Yesterday, I gave a little overview of some popular file systems to my colleagues at SICS. Here are the slides.

If you are interested in knowing more about file systems, BUY MY BOOK. Just kidding.