Booting

The Unified Extensible Firmware Interface(UEFI) is a modern replacement for the traditional BIOS: it’s an interface between the system firmware and the OS, enhancing the boot process.

When the system powers on:

  • UEFI, written in C, modular and runs on various platforms quickly takes control to initialize system hardware and load firmware settings into RAM.
  • UEFI uses a dedicated FAT32 partition, known as the EFI System Partition (ESP), to store bootloader (startup) files for various operating systems.

UEFI uses GPT (GUID Partition Table) overcomes the size limitations of BIOS and allows more flexible partitioning​​ (supports disks larger than 2.2TB and can handle up to 9.4 zettabytes with 64-bit logical block addressing).

During machine boot-up, the firmware is loaded into main memory, performs necessary checks and launch the bootloader which mounts the filesystem and loads the kernel image.

During these system checks, it’s also validated the signature of the bootloader. This is called secure boot (which is a security feature of UEFI standard) and ensures that a device will only boot using software trusted by the (Original Equipment Manufacturer (OEM)). Its primary purpose is to protect the boot process from attacks such as rootkits and bootkits, by using a set of cryptographic keys to authenticate allowed software. These keys are stored in the TPM (hardware component).

In Linux, the init process is started at the end of the kernel boot. The init process has a PID of 1 and PPID of 0. It is the ancestor of all user-space processes and starts all enabled services during startup. The two common implementations of init are SysVinit and systemd. The last one is very popular:

  • It is responsible for starting/stopping services and managing their dependencies.
  • systemd provides a way to define system services using unit files.

Discoverability

When booting, the operating system (OS) needs to be aware of several things:

  1. The devices that are already present on the machine.
  2. How interrupts are managed.
  3. The number of processors available.

To address these needs, two standards have been developed to provide the kernel with all the necessary platform data:

  • Advanced Configuration and Power Interface (ACPI): This standard is primarily used on general-purpose platforms, particularly those utilizing Intel processors.
  • Device trees: This standard is mainly used on embedded platforms. Device trees aid in discoverability and provide information about the platform’s configuration.

ACPI: Advanced Configuration and Power Interface

ACPI helps with discoverability, power management, and thermal management. It was developed by Intel, Microsoft, and Toshiba in 1996. The key benefits of ACPI are:

  • It provides an open standard for operating systems to discover and configure computer hardware components.
  • It enables power management features such as putting unused hardware components to sleep.
  • It supports auto-configuration, including Plug and Play and hot swapping.
  • It facilitates status monitoring.

There is no need to include platform-specific code and a separate binary kernel for each platform.

Power managament using Device states

Power management refers to the various tools and techniques used to consume the minimum amount of power necessary based on the system state, configuration, and use case.

The power management in modern computing is a critical challenge linked to the Dennard scaling:

  • Named after Robert Dennard
  • Observed that smaller transistors keep power density constant
  • Enabled higher frequency without significant power increase

Breakdown of Dennard scaling:

  • Occurred around mid-2000s

  • Issues controlling leakage current and other non-ideal effects

  • Shrinking transistors no longer proportionately reduced power usage

  • Power management in modern computing:

    • Critical challenge linked to Dennard scaling
  • Dennard scaling:

    • Named after Robert Dennard
    • Observed that smaller transistors keep power density constant
    • Enabled higher frequency without significant power increase
  • Breakdown of Dennard scaling:

    • Occurred around mid-2000s
    • Issues controlling leakage current and other non-ideal effects
    • Shrinking transistors no longer proportionately reduced power usage

This led to the concept of the “power wall,” where the industry shifted towards multicore processors to improve performance. In this context, addressing the power management problem is challenging to do mathematically. The primary source of power consumption is related to the frequency and voltage of each system device. Key Questions:

  1. What voltage should be used for each device?
  2. What frequency should be used for each device?
  3. How does switching between different frequencies and voltages impact the system?

Power management is complex due to the interdependence of multiple variables. To simplify this, ACPI uses device states.

ACPI defines a hierarchical state machine to describe the power state of a system. There are five different power state types: G-type, S-type, C-type, P-type, and D-type.

  • G-type states group together system states (S-type) and provide a coarse-grained description of the system’s behavior. They can indicate whether the system is working, appearing off but able to wake up on specific events, or completely off.
  • S-type states, also known as system or sleep states, describe how the computer implements the corresponding G-type state. For example, the G1 state can be implemented in different ways, such as halting the CPU while keeping the RAM on, or completely powering off the CPU and RAM and moving memory content to disk.
  • C-type states, specific to the CPU (CPU State), refer to the power management capabilities of the CPU. These states allow the CPU to reduce its clock speed (C0) or even shut down completely when idle to save energy (C1-C3). C-states can only be entered in an S0 configuration (working state) and are usually invoked through a specific CPU instruction, such as MWAIT on Intel processors.
  • P-type states (Performance State) are a power management feature that allows the CPU to adjust its clock speed and voltage based on workload demands. P-states only make sense in the C0 configuration (working state).
  • D-type states (Device State) are the equivalent of C-states but for peripheral devices. They represent the power management capabilities of the devices.

Above all this system, at the end, the orchestra director (kernel) uses OSPM (Operating System-directed configuration and Power Management) to set the appropriate sleep state for a CPU when it’s idle, through ACPI.

An example: when the scheduler identifies that more performance is required, it communicates this desired change to the CPUFreq module in OSPM. CPUFreq then interacts with ACPI and the Performance Control Machine register is modified.

Also the user-space can interact with the kernel’s ACPI through an ACPI daemon, like acpid in Linux, which listens for ACPI events and executes predefined scripts in response.