=====================================================
An Introduction to the Linux Kernel Block I/O Stack
=====================================================

:Conference: Chemnitzer Linux-Tage 2021
:Author: Benjamin Block
:Organization: IBM Deutschland Research & Development GmbH
:Date: 2021-01-07 15:48:22 +0100
:Version: 1

Extended Abstract / Short Paper
===============================

Mass storage hardware used in computers today comes in a wide variety of
forms: from small flash cards and drives, over SSDs and HDDs, to
network-attached storage servers; and all of it is connected via different
interfaces, e.g. NVMe, SATA, or SCSI. Usually this hardware presents the
available storage space to operating systems as a linear address space of
equal-sized blocks, where each block can typically be addressed
individually for I/O operations - referred to as "block I/O". Typical
block sizes range from 512 to 4096 bytes.

If this common access method were implemented individually, in a custom
storage stack for each of these devices and interfaces, the result would be
a lot of repeated code. Instead, Linux exploits this commonality and hides
the device- and interface-specifics from storage users behind a common
software layer: the Linux "block layer". The hardware is handled by a
diverse set of device drivers and further layers of abstraction, but users
- such as file systems, paging, or userspace applications - can program
against a common interface and need not care about the hardware.

This talk gives an introduction to this software stack and to how block
I/O works in Linux. This includes some of the big changes made in recent
years to adapt the block layer to new, faster storage devices. The focus
is on the block layer itself, and less on device- and interface-specifics
or on the virtual file system layer.
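The "linear address space of equal-sized blocks" model can be sketched from
userspace. The following is a minimal illustration, not part of the talk:
block *n* lives at byte offset ``n * block_size``, and on Linux the logical
block size of a block device can be queried with the ``BLKSSZGET`` ioctl
(the constant's value is taken from ``<linux/fs.h>``). The demo file name
``blkdemo.img`` is a hypothetical scratch file standing in for a disk, so
the sketch runs without root access.

```python
import fcntl
import os
import struct

# BLKSSZGET: Linux ioctl reporting the logical block size of a block
# device (value from <linux/fs.h>).
BLKSSZGET = 0x1268


def logical_block_size(fd, fallback=512):
    """Ask the kernel for the device's logical block size.

    Regular files do not support the ioctl, so fall back to
    `fallback` when demoing without a real block device.
    """
    try:
        buf = bytearray(4)
        fcntl.ioctl(fd, BLKSSZGET, buf)
        return struct.unpack("i", buf)[0]
    except OSError:
        return fallback


def read_block(fd, lba, block_size):
    """Block `lba` lives at byte offset lba * block_size."""
    return os.pread(fd, block_size, lba * block_size)


if __name__ == "__main__":
    # A regular file standing in for a small disk: eight 512-byte
    # blocks, where block i is filled with the byte value i.
    path = "blkdemo.img"  # hypothetical scratch file
    with open(path, "wb") as f:
        for i in range(8):
            f.write(bytes([i]) * 512)

    fd = os.open(path, os.O_RDONLY)
    bs = logical_block_size(fd)        # 512 for the regular file
    data = read_block(fd, 2, bs)
    print("block 2 starts with byte", data[0])  # -> 2
    os.close(fd)
    os.remove(path)
```

Run against an actual device node (with sufficient privileges), the same
``read_block`` addresses the device exactly as the abstract describes:
one flat array of fixed-size blocks, independent of the interface behind it.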
The first part of the talk gives an overview of the components involved in
and around the block layer, how they connect, and what their purpose is in
the stack. This covers some of the important data structures used to
represent devices, partitions, and queues, as well as those used to
represent in-flight I/O.

The second part explores how I/O requests traverse these components from
userspace to hardware: how a request is generated by an application, the
different paths by which it may enter the block layer, how it is handled
there, and how it is finally handed over to a device driver to be serviced
by a piece of hardware. Submitting I/O to the Linux kernel does not
necessarily mean it is processed directly by a device driver. Apart from
file systems and caching, I/O often also passes through intermediate/mapped
block devices that add logic on top of plain storage - such as RAID,
multipathing, or integrity verification.

To better illustrate this theory, the talk is accompanied by a non-trivial
real-world setup. On IBM Z, applications and operating systems primarily
access mass storage via a SAN, based on Fibre Channel technology. To access
a volume on a storage target, we use multiple independent paths to the same
FCP-attached SCSI disk, which are then composed in Linux into a single
block device via ``dm-multipath``. This allows failure-tolerant access to
remote storage, while behaving to the user like a local disk would. All
thanks to the block layer.

More Links
==========

- Linux Storage Stack Diagram (Linux Kernel 4.10)
- A block layer introduction part 1: the bio layer
- Block layer introduction part 2: the request layer
- Two new block I/O schedulers for 4.12
- "Linux Kernel Labs" on Block Device Drivers
- Matias Bjørling, Jens Axboe, David Nellans, and Philippe Bonnet. 2013.
  Linux block IO: introducing multi-queue SSD access on multi-core systems.
- Kiyoshi Ueda, Jun'ichi Nomura, Mike Christie. 2007.
  Request-based Device-mapper multipath and Dynamic load balancing.