Initrd and Initramfs
Initrd and Initramfs
The Linux® initial RAM disk (initrd) is a temporary root file system that is mounted during system boot to support the two-state boot process. The initrd contains various executables and drivers that permit the real root file system to be mounted, after which the initrd RAM disk is unmounted and its memory freed. In many embedded Linux systems, the initrd is the final root file system. This article explores the initial RAM disk for Linux 2.6, including its creation and use in the Linux kernel.
What’s an initial RAM disk?
The initial RAM disk (initrd) is an initial root file system that is mounted prior to when the real root file system is available. The initrd is bound to the kernel and loaded as part of the kernel boot procedure. The kernel then mounts this initrd as part of the two-stage boot process to load the modules to make the real file systems available and get at the real root file system.
The initrd contains a minimal set of directories and executables to achieve this, such as the
insmod tool to install kernel modules into the kernel.
In the case of desktop or server Linux systems, the initrd is a transient file system. Its lifetime is short, only serving as a bridge to the real root file system. In embedded systems with no mutable storage, the initrd is the permanent root file system. This article explores both of these contexts.
Anatomy of the initrd
The initrd image contains the necessary executables and system files to support the second-stage boot of a Linux system.
Depending on which version of Linux you’re running, the method for creating the initial RAM disk can vary. Prior to Fedora Core 3, the initrd is constructed using the loop device. The loop device is a device driver that allows you to mount a file as a block device and then interpret the file system it represents. The loop device may not be present in your kernel, but you can enable it through the kernel’s configuration tool (
make menuconfig) by selecting Device Drivers > Block Devices > Loopback Device Support. You can inspect the loop device as follows (your initrd file name will vary):
Listing 1. Inspecting the initrd (prior to FC3)
# mkdir temp ; cd temp
# cp /boot/initrd.img.gz .
# gunzip initrd.img.gz
# mount -t ext -o loop initrd.img /mnt/initrd
# ls -la /mnt/initrd
You can now inspect the /mnt/initrd subdirectory for the contents of the initrd. Note that even if your initrd image file does not end with the .gz suffix, it’s a compressed file, and you can add the .gz suffix to gunzip it.
Beginning with Fedora Core 3, the default initrd image is a compressed cpio archive file. Instead of mounting the file as a compressed image using the loop device, you can use a cpio archive. To inspect the contents of a cpio archive, use the following commands:
Listing 2. Inspecting the initrd (FC3 and later)
# mkdir temp ; cd temp
# cp /boot/initrd-126.96.36.199.img initrd-188.8.131.52.img.gz
# gunzip initrd-184.108.40.206.img.gz
# cpio -i --make-directories < initrd-220.127.116.11.img
The result is a small root file system, as shown in Listing 3. The small, but necessary, set of applications are present in the ./bin directory, including
nash (not a shell, a script interpreter),
insmod for loading kernel modules, and
lvm (logical volume manager tools).
Listing 3. Default Linux initrd directory structure
# ls -la
drwxr-xr-x 10 root root 4096 May 7 02:48 .
drwxr-x--- 15 root root 4096 May 7 00:54 ..
drwxr-xr-x 2 root root 4096 May 7 02:48 bin
drwxr-xr-x 2 root root 4096 May 7 02:48 dev
drwxr-xr-x 4 root root 4096 May 7 02:48 etc
-rwxr-xr-x 1 root root 812 May 7 02:48 init
-rw-r--r-- 1 root root 1723392 May 7 02:45 initrd-18.104.22.168.img
drwxr-xr-x 2 root root 4096 May 7 02:48 lib
drwxr-xr-x 2 root root 4096 May 7 02:48 loopfs
drwxr-xr-x 2 root root 4096 May 7 02:48 proc
lrwxrwxrwx 1 root root 3 May 7 02:48 sbin -> bin
drwxr-xr-x 2 root root 4096 May 7 02:48 sys
drwxr-xr-x 2 root root 4096 May 7 02:48 sysroot
Of interest in Listing 3 is the init file at the root. This file, like the traditional Linux boot process, is invoked when the initrd image is decompressed into the RAM disk. We’ll explore this later in the article.
Tools for creating an initrd
Let’s now go back to the beginning to formally understand how the initrd image is constructed in the first place. For a traditional Linux system, the initrd image is created during the Linux build process. Numerous tools, such as
mkinitrd, can be used to automatically build an initrd with the necessary libraries and modules for bridging to the real root file system. The
mkinitrd utility is actually a shell script, so you can see exactly how it achieves its result. There’s also the
YAIRD (Yet Another Mkinitrd) utility, which permits customization of every aspect of the initrd construction.
Manually building a custom initial RAM disk
To create an (initially empty) initrd use the following steps:
Note: Change RDSIZE to your required filesystem size. E.g. with RDSIZE =8192 you will get a 8MB ramdisk. BLKSIZE=1024
host > dd if=/dev/zero of=<filename> bs=<BLKSIZE> count=<RDSIZE>
host > mke2fs -vm0 <filename> < RDSIZE >
host > tune2fs -c 0 <filename>
host > dd if=<filename> b=<BLKSIZE> count=< RDSIZE > | gzip -v9 > ramdisk.gz (or)
host > dd if=/dev/zero of=<filename> bs=<BLKSIZE> count=<RDSIZE>
host > mke2fs -F -m 0 -b <BLKSIZE> <filename> < RDSIZE >| gzip -v9 > ramdisk.
Now, we have a (empty) gzipped ramdisk image with (extracted) size of < RDSIZE >.
To fill empty ramdisk created above with all files needed for ramdisk, mount the image and fill it. Content would be e.g. BusyBox and/or other applications and/or libraries.
host > mkdir mnt
host > gunzip ramdisk.gz
host > mount -o loop ramdisk mnt/
host > ... copy stuff you want to have in ramdisk to mnt...
host > umount mnt
host > gzip -v9 ramdisk
The resulting ramdisk.gz is now ready for usage. Note its size is smaller than <count> cause of compression.
Note: Don’t forget to create/copy some basic /dev/xxx nodes to ramdisk.
Note: If BusyBox or applications in ramdisk are linked dynamically, don’t forget to copy dynamic libraries (*.so) to ramdisk (to correct directory) as well.
Creating a utility to generate ramdisk
Because there is no hard drive in many embedded systems based on Linux, the initrd also serves as the permanent root file system. Listing 4 shows how to create an initrd image. I’m using a standard Linux desktop so you can follow along without an embedded target. Other than cross-compilation, the concepts (as they apply to initrd construction) are the same for an embedded target.
Listing 4. Utility (mkird) to create a custom initrd
rm -f /tmp/ramdisk.img
rm -f /tmp/ramdisk.img.gz
# Ramdisk Constants
# Create an empty ramdisk image
dd if=/dev/zero of=/tmp/ramdisk.img bs=$BLKSIZE count=$RDSIZE
# Make it an ext2 mountable file system
/sbin/mke2fs -F -m 0 -b $BLKSIZE /tmp/ramdisk.img $RDSIZE
# Mount it so that we can populate
mount /tmp/ramdisk.img /mnt/initrd -t ext2 -o loop=/dev/loop0
# Populate the filesystem (subdirectories)
# Grab busybox and create the symbolic links
cp /usr/local/src/busybox-1.1.1/busybox .
ln -s busybox ash
ln -s busybox mount
ln -s busybox echo
ln -s busybox ls
ln -s busybox cat
ln -s busybox ps
ln -s busybox dmesg
ln -s busybox sysctl
# Grab the necessary dev files
cp -a /dev/console /mnt/initrd/dev
cp -a /dev/ramdisk /mnt/initrd/dev
cp -a /dev/ram0 /mnt/initrd/dev
cp -a /dev/null /mnt/initrd/dev
cp -a /dev/tty1 /mnt/initrd/dev
cp -a /dev/tty2 /mnt/initrd/dev
# Equate sbin with bin
ln -s bin sbin
# Create the init file
cat >> /mnt/initrd/linuxrc << EOF
echo “Simple initrd is active”
mount -t proc /proc /proc
mount -t sysfs none /sys
chmod +x /mnt/initrd/linuxrc
# Finish up…
To create an initrd, begin by creating an empty file, using
/dev/zero (a stream of zeroes) as input writing to the ramdisk.img file. The resulting file is 4MB in size (4000 1K blocks). Then use the
mke2fs command to create an ext2 (second extended) file system using the empty file. Now that this file is an ext2 file system, mount the file to /mnt/initrd using the loop device. At the mount point, you now have a directory that represents an ext2 file system that you can populate for your initrd. Much of the rest of the script provides this functionality.
The next step is creating the necessary subdirectories that make up your root file system: /bin, /sys, /dev, and /proc. Only a handful are needed (for example, no libraries are present), but they contain quite a bit of functionality.
To make your root file system useful, use BusyBox. This utility is a single image that contains many individual utilities commonly found in Linux systems (such as ash, awk, sed, insmod, and so on). The advantage of BusyBox is that it packs many utilities into one while sharing their common elements, resulting in a much smaller image. This is ideal for embedded systems. Copy the BusyBox image from its source directory into your root in the /bin directory. A number of symbolic links are then created that all point to the BusyBox utility. BusyBox figures out which utility was invoked and performs that functionality. A small set of links are created in this directory to support your init script (with each command link pointing to BusyBox).
The next step is the creation of a small number of special device files. I copy these directly from my current /dev subdirectory, using the
-a option (archive) to preserve their attributes.
The penultimate step is to generate the linuxrc file. After the kernel mounts the RAM disk, it searches for an
init file to execute. If an
init file is not found, the kernel invokes the linuxrc file as its startup script. You do the basic setup of the environment in this file, such as mounting the /proc file system. In addition to /proc, I also mount the /sys file system and emit a message to the console. Finally, I invoke
ash (a Bourne Shell clone) so I can interact with the root file system. The linuxrc file is then made executable using
Finally, your root file system is complete. It’s unmounted and then compressed using
Using opensource utility to generate ramdisk
genext2fs -b <RDSIZE> -d src -f dev.txt flashdisk.img
This example builds a filesystem from all the files in src , then device nodes are created based on the contents of the device file dev.txt.
An example device file follows:
drwx /dev crw- 10,190 /dev/lcd brw- 1,0 /dev/ram0
This device list builds the /dev directory, a character device node /dev/lcd (major 10, minor 190) and a block device node /dev/ram0 (major 1, minor 0)
To make initrd work, you have to configure kernel properly:
# General setup
# UBI – Unsorted block images
Note: The ramdisk size e.g. 8192 above has to be configured for your individual setup.
To use initramfs a cpio archive is embedded directly into the kernel. I.e. you don’t create an additional (ramdisk) image. Instead, the initial file system is directly incorporated into the kernel. With this, the kernel size increases by the file system size. It’s like you embed above ramdisk directly into the kernel.
Cause initramfs is directly embedded in the the kernel, its creation is simpler. No dd & mount & gzip stuff like with ramdisk above. You simply have to fill a directory on your host with the target filesystem you like and then pass the path to this directory to the kernel build process.
Create target file system
host > mkdir target_fs
host > ... copy stuff you want to have in initramfs to target_fs...
Note: cpio system used for initramfs can’t handle hard links. If you e.g. created your BusyBox using hard links, you will get a quite large initramfs cause each command is taken with its size and not as hard link. In cpio initramfs use symbolic/soft links instead.
Note: To be able to detect initramfs by kernel properly, the top level directory has to contain a program called init. This can be done by e.g. using a soft link from top level init to /bin/busybox
/init -> /bin/busybox
if you use BusyBox in your initramfs.
The only difference from creating an initrd is to give the kernel the path to the target file system you like to embed:
# # General setup # ... CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="<path_to>/target_fs>"...
# # UBI - Unsorted block images # ... CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_COUNT=1 CONFIG_BLK_DEV_RAM_SIZE=8192 CONFIG_BLK_DEV_RAM_BLOCKSIZE=1024 ...
As we know, to run threads, we need to schedule them. In order to run them effectively, they need to be synchronized. Suppose one thread creates a brush and then creates several threads that share the brush and draw with it. The first thread must not destroy the brush until the other threads finish drawing. This requires a means of coordinating the sequence of actions in several threads.
One way is to create a global Boolean variable that one thread uses to signal another. The writing thread will set this parameter to TRUE and the reading thread might loop until it sees the flag change. This will definitely work, but the looping thread wastes a lot of processor time.
Instead, Win32 supports a set of synchronization objects such as mutexes, semaphores, events and critical sections. These are the system objects created by the object manager. All of them will work in a similar way. A thread that wants to perform some coordinated action waits for a response from one of these objects and proceeds only after receiving it. The scheduler removes the waiting objects from the dispatch queue so that they will not consume processor time
It is important to keep in mind that:
• A mutex object works like a narrow gate for one thread to pass at a time.
• A semaphore object work like a multi-lane gate that a limited number of threads can pass through together.
• An event object broadcasts a public signal for any listening thread to hear.
• A critical section object works like a mutex but only within a single process.
Mutexes, semaphores and events can coordinate threads in different processes, but critical sections are only visible to threads in a single process.
Mutexex: These are very much like critical sections except that they can be used to synchronize data across multiple processes. To do this, a thread in each process must have its own process-relative handle to a single mutex object.
Semaphores: These objects are used for resource counting. They offer a thread the ability to query the number of resources available; if one or more resources are available, the count of available resources is decremented. Thus semaphores perform the test and set operations automatically, that is, when you request a resource from a semaphore, the operating system checks whether the resource is available and decrements the count of the available resources without letting another thread interfere. Only after the resource count has been decremented does the system allow another thread to request a resource.
For example, let us say that a computer has three serial ports. No more than three threads can use the serial ports at any given time; each port can be assigned to one thread. This situation provides a perfect opportunity to use a semaphore. To monitor serial port usage, you can create a semaphore with a count of three – one for each port. A semaphore is signaled when its resource count is greater than zero and is non-signaled when the count is zero.
Because several threads can affect a semaphore’s resource count, a semaphore, unlike a critical section or mutex, is not considered to be owned by a thread. This means that it is possible to have one thread wait for the semaphore object and another thread release the object.
Events: Even objects are the most primitive form of synchronization objects and they are quite different from mutexes and semaphores. Mutexes and semaphores are usually used to control access to data, but events are used to signal that some operation has been completed.
There are two different types of event objects – manual reset events and auto reset events. A manual reset event is used to signal several threads simultaneously to say that an operation has finished, and an auto reset event is used to signal a single thread to say that an operation has been completed.
Events are most commonly used when one thread performs initialization work and, when it finishes, signals another thread to perform the remaining work. The initialization thread sets the event to the non-signaled state and begins to perform the initialization. Then, after the initialization has been completed, the thread sets the event to the signaled state. Waiting for the event, the worker thread wakes up and performs the rest of the work.
For example, a process might be running two threads. The first thread reads data from a file into a memory buffer. After the data has been read, the first thread signals the second thread that it can process the data. When the second thread finishes processing the data, it might need to signal the first thread again, so that the first thread can read the next block of data from the file.
Critical sections: A critical section is a small section of the code that required exclusive access to some shared data before the code can execute. Of all synchronization objects, critical sections are the simplest to use, but they can be used to synchronize threads only within a single process. Critical sections allow only one thread at a time to gain access to a region of data.
Mutex Vs semaphore
A semaphore can be a Mutex but a Mutex can never be semaphore. This simply means that a binary semaphore can be used as Mutex, but a Mutex can never exhibit the functionality of semaphore.Both semaphores and Mutex (at least the on latest kernel) are non-recursive in nature.No one owns semaphores, whereas Mutex are owned and the owner is held responsible for them. This is an important distinction from a debugging perspective.
In case the of Mutex, the thread that owns the Mutex is responsible for freeing it. However, in the case of semaphores, this condition is not required. Any other thread can signal to free the semaphore by using the sem_post() function.
A Mutex, by definition, is used to serialize access to a section of re-entrant code that cannot be executed concurrently by more than one thread. A semaphore, by definition, restricts the number of simultaneous users of a shared resource up to a maximum number
Another difference that would matter to developers is that semaphores are system-wide and remain in the form of files on the filesystem, unless otherwise cleaned up. Mutex are process-wide and get cleaned up automatically when a process exits.
The nature of semaphores makes it possible to use them in synchronizing related and unrelated process, as well as between threads. Mutex can be used only in synchronizing between threads and at most between related processes (the pthread implementation of the latest kernel comes with a feature that allows Mutex to be used between related process).
According to the kernel documentation, Mutex are lighter when compared to semaphores. What this means is that a program with semaphore usage has a higher memory footprint when compared to a program having Mutex.
From a usage perspective, Mutex has simpler semantics when compared to semaphores.
mutex is like a lock. A thread can lock it, and then any subsequent attempt to lock it, by the same thread or any other, will cause the attempting thread to block until the
mutex is unlocked. These are very handy for keeping data structures correct from all the threads’ points of view. For example, imagine a very large linked list. If one thread deletes a node at the same time that another thread is trying to walk the list, it is possible for the walking thread to fall off the list, so to speak, if the node is deleted or changed. Using a
mutex to “lock” the list keeps this from happening.
Computer Scientist people will tell you that
Mutex stands for Mutual Exclusion.
In Java, Mutex-like behaviour is accomplished using the
Technically speaking, only the thread that locks a
mutex can unlock it, but sometimes operating systems will allow any thread to unlock it. Doing this is, of course, a Bad Idea. If you need this kind of functionality, read on about the
semaphore in the next paragraph.
Similar to the
mutex is the
semaphore is like a
mutex that counts instead of locks. If it reaches zero, the next attempt to access the semaphore will block until someone else increases it. This is useful for resource management when there is more than one resource, or if two separate threads are using the same resource in coordination. Common terminology for using semaphores is “uping” and “downing”, where upping increases the count and downing decreases and blocks on zero.
Java provides a Class called
Semaphore which does the same thing, but uses
release() methods instead of uping and downing.
With a name as cool-sounding as
semaphore, even Computer Scientists couldn’t think up what this is short for.
semaphores are designed to allow multiple threads to up and down them all at once. If you create a
semaphore with a count of 1, it will act just like a
mutex, with the ability to allow other threads to unlock it.
Computer Scientists like to refer to the pieces of code protected by
semaphores as Critical Sections. In general, it’s a good idea to keep Critical Sections as short as possible to allow the application to be as paralle as possible.