Exokernel

Paper: Exokernel: an operating system architecture for application-level resource management [pdf][acm.org]

An exokernel is an approach to operating system design that lets applications manage hardware resources directly, as far as possible. It leaves traditional OS duties to the application; the kernel only multiplexes resources and checks ownership/capabilities (i.e. security). Applications, with or without the help of OS libraries that live in user space:

manage virtual memory
handle interrupts and exceptions
handle network and I/O
coordinate scheduling
do IPC using a simple yield primitive/syscall provided by the kernel

This offers a lot of flexibility to the application and reduces the amount of kernel code. Aegis is an exokernel, and ExOS is a collection of operating system libraries (a library OS); both follow the exokernel design principles.

A conventional OS gives applications only high-level core abstractions; an exokernel gives them a low-level interface to machine resources. With high-level abstractions, the OS is forced to make tradeoffs on behalf of every application. With an exokernel, the application decides how best to use the hardware resources for its purpose. As an example, think of database systems and how they could manage page faults, file caching and the like to best fit the needs of the database.

Responsibilities of the exokernel:

Tracking ownership of resources
Multiplexing resources
Performing access control
Revoking access to resources

Design principles:

The kernel shouldn't manage resources beyond what is required for protection.
Allow applications to allocate resources like memory, processor and devices.
Export privileged instructions to applications through syscalls that check permissions. This includes updating the TLB, interrupt handlers, …
Hardware resources should be named using physical names (e.g. physical address for memory, …). This removes name translation in the kernel and allows applications to optimize usage for their needs based on the hardware configuration.
The exokernel protects and guards applications from each other, but expects, for example, that an application does not allocate all of its quota of physical memory if it only needs a couple of pages, i.e. applications aren't assumed to be malicious.
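
A minimal Python sketch of the first few principles, assuming a toy kernel whose only state is an ownership table. The application names the physical page it wants, and a privileged operation is exported through a call that only checks permission. All names here (`ToyExokernel`, `alloc_page`, `map_page`) are hypothetical and are not taken from Aegis.

```python
class ToyExokernel:
    """Hypothetical toy kernel: tracks page ownership, holds no other policy."""

    def __init__(self, num_pages):
        # owner[p] is the application that owns physical page p, or None if free.
        self.owner = [None] * num_pages

    def alloc_page(self, app, page):
        # The application asks for a page by its physical name (its index).
        if self.owner[page] is not None:
            return False          # already owned by someone
        self.owner[page] = app
        return True

    def map_page(self, app, virtual_page, page):
        # Stand-in for a privileged operation exported through a checked call:
        # the kernel only verifies ownership, then would perform the mapping.
        if self.owner[page] != app:
            raise PermissionError(f"{app} does not own physical page {page}")
        return (virtual_page, page)


kernel = ToyExokernel(num_pages=4)
kernel.alloc_page("db", 2)                 # "db" picks physical page 2 itself
mapping = kernel.map_page("db", 7, 2)      # allowed: "db" owns page 2
```

The kernel never decides which page an application should get or how it is used; it only records and checks who owns what.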

Multiplexing:

Physical Memory: Privileged operations like TLB loads and DMA are checked by the exokernel. For speed, a larger per-application TLB, called the software TLB (STLB), is maintained in memory.

When a TLB miss happens:
- The kernel checks whether the entry is in the application's software TLB; if so, it fills the hardware TLB.
- Otherwise, the kernel forwards the TLB miss to the application, and the application sends a new TLB entry to the kernel.
- The kernel checks the proposed entry and adds it to the TLB.
This allows the application to manage its page table however it wants.
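
The three steps of the TLB miss path can be sketched in Python as follows. This is a toy model under stated assumptions: dictionaries stand in for the hardware TLB and the STLB, and the application's miss handler is a plain callback. The names (`ToyKernel`, `tlb_miss`, `app_handler`) are made up for illustration.

```python
class ToyKernel:
    """Hypothetical model of the TLB miss path; not the Aegis implementation."""

    def __init__(self, page_owner):
        self.page_owner = page_owner   # physical page -> owning application
        self.hw_tlb = {}               # (app, virtual page) -> physical page
        self.stlb = {}                 # app -> {virtual page -> physical page}

    def tlb_miss(self, app, vpage, app_handler):
        cache = self.stlb.setdefault(app, {})
        # Step 1: look in the application's software TLB first.
        if vpage in cache:
            self.hw_tlb[(app, vpage)] = cache[vpage]
            return "stlb-hit"
        # Step 2: forward the miss; the application proposes a mapping
        # from its own page table, whatever structure that has.
        ppage = app_handler(vpage)
        # Step 3: check that the application owns the page, then install it.
        if self.page_owner.get(ppage) != app:
            raise PermissionError(f"{app} does not own physical page {ppage}")
        cache[vpage] = ppage
        self.hw_tlb[(app, vpage)] = ppage
        return "app-refill"


kernel = ToyKernel(page_owner={10: "db", 11: "web"})
db_page_table = {0: 10}                    # application-managed page table
first = kernel.tlb_miss("db", 0, db_page_table.__getitem__)
kernel.hw_tlb.clear()                      # pretend the hardware entry was evicted
second = kernel.tlb_miss("db", 0, db_page_table.__getitem__)
```

Here `first` is served by the application's handler and `second` by the STLB, so the application is only consulted when the kernel's cache has no answer.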
Network:
- For incoming messages: the kernel uses protocol-specific knowledge to deliver each packet to the right application.
- For outgoing messages: transmission buffers can be shared and protected (like physical memory), and applications can use those buffers directly.
Processor/CPU Time Scheduling:

The CPU is exposed as a vector of time slices, which can be allocated much like memory. Processor scheduling is done round-robin by cycling through the vector of time slices. Applications request the slices they need (a scientific computing application may ask for contiguous slices, while an interactive application might ask for uniformly spaced short slices).

When a time slice belonging to a different application comes up (signalled by a timer interrupt), the kernel notifies the running application (as an exception) to save its context for the switch. For fairness, if the application doesn't respond, it can be forced out (through revocation).

This schedules the CPU among the library OSes; each library OS can schedule its own sub-processes or threads in any way it wants.
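
A small Python sketch of the time-slice vector, assuming a fixed-length list where each position is owned by at most one application. `SliceVector`, `allocate` and `next_owner` are hypothetical names chosen for illustration.

```python
class SliceVector:
    """Hypothetical model of the CPU as a vector of time slices."""

    def __init__(self, num_slices):
        self.slices = [None] * num_slices   # slices[i] = owning app, or None
        self.position = 0                   # slice that runs on the next tick

    def allocate(self, app, index):
        # Applications ask for specific positions, much like physical pages.
        if self.slices[index] is not None:
            return False
        self.slices[index] = app
        return True

    def next_owner(self):
        # Round robin: hand out the current slice, then advance and wrap.
        owner = self.slices[self.position]
        self.position = (self.position + 1) % len(self.slices)
        return owner


cpu = SliceVector(num_slices=8)
for i in (0, 1, 2):          # scientific job: contiguous slices
    cpu.allocate("sci", i)
for i in (3, 7):             # interactive job: evenly spaced short slices
    cpu.allocate("ui", i)

timeline = [cpu.next_owner() for _ in range(8)]
# timeline == ["sci", "sci", "sci", "ui", None, None, None, "ui"]
```

The kernel's policy is only the cycling order; which positions an application holds, and therefore its latency and throughput, is decided by the application.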

Revocation:

Revoking hardware resources is required to ensure fairness and to notify applications of the constraints of the system.
Applications are given a revocation signal, and it is their duty to relinquish the resource.
E.g. for a context switch, the application saves the registers it wants and gives back control.
Applications aren't killed when they fail to handle revocation; instead, the abort protocol is used.
Abort protocol:
- The exokernel takes the resource from the application.
- If possible, the kernel saves the state. For example, when some physical memory is revoked, the exokernel might write its contents to disk.
- It then informs the application, so that the application can update its mappings.
A small amount of physical memory (5-10 pages) is always guaranteed to the application so that core information (handlers, page tables) is not revoked.
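
The revocation flow above, including the abort protocol and the guaranteed pages, can be sketched in Python. This is a toy model with assumed data structures (dictionaries for memory and disk, a callback for the application's revocation handler); none of the names come from Aegis.

```python
class ToyKernel:
    """Hypothetical model of revocation with an abort protocol."""

    def __init__(self, owner, memory, guaranteed):
        self.owner = dict(owner)            # physical page -> application
        self.memory = dict(memory)          # physical page -> contents
        self.guaranteed = set(guaranteed)   # pages that are never revoked
        self.disk = {}                      # (app, page) -> saved contents
        self.notices = []                   # (app, page) records sent to apps

    def revoke(self, page, app_handler):
        if page in self.guaranteed:
            return "guaranteed"             # core state stays with the app
        app = self.owner[page]
        # First ask the application to relinquish the page itself.
        if app_handler(page):
            self.memory.pop(page, None)     # the app saved what it needed
            del self.owner[page]
            return "released"
        # Abort protocol: take the page, save its state, inform the app.
        del self.owner[page]
        self.disk[(app, page)] = self.memory.pop(page, None)
        self.notices.append((app, page))
        return "aborted"


kernel = ToyKernel(
    owner={1: "db", 2: "db", 3: "db"},
    memory={1: b"a", 2: b"b", 3: b"c"},
    guaranteed={3},
)
cooperative = kernel.revoke(1, lambda page: True)    # app gives the page up
forced = kernel.revoke(2, lambda page: False)        # app ignores the request
protected = kernel.revoke(3, lambda page: False)     # guaranteed page
```

In the forced case the application is not killed: its data ends up in `kernel.disk` and it receives a notice so it can fix its mappings later.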

Aegis (an exokernel implementation) features:

Allows synchronous and asynchronous control transfer from one program to another.
- Control transfer to applications is atomic.
- Aegis doesn't overwrite any application-visible register, allowing the processor's register set to be used as a temporary message buffer.
Exception handling:
- Three registers are saved at predefined locations. These registers then serve as scratch space for the kernel.
- The kernel decodes and checks the exception and jumps to the application-specific exception handler.
- The application doesn't have to return control to the kernel; it can proceed with its work.
- This works almost 2 orders of magnitude faster than Ultrix (a UNIX OS).

ExOS (an OS built upon Aegis) features:

Using the yield syscall, IPC mechanisms like pipes and RPC are implemented in user space. The IPC is faster than in UNIX because:
- the kernel code is small
- applications can manage communication as they want
- server and client processes can trust each other and exploit that fact to communicate efficiently
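
As a rough illustration of building a pipe in user space on top of a yield primitive, here is a Python sketch in which generators play the role of processes and the `yield` statement stands in for the yield syscall. The shared `deque` is the pipe buffer. The round-robin `run` loop is a simplification: it is an assumption of this sketch, not how ExOS hands the CPU to a peer.

```python
from collections import deque

EOF = object()   # hypothetical end-of-stream marker


def writer(pipe, items):
    for item in items:
        pipe.append(item)   # write into the shared user-space buffer
        yield               # give up the CPU so the reader can run
    pipe.append(EOF)
    yield


def reader(pipe, received):
    while True:
        while not pipe:
            yield           # nothing to read yet: give up the CPU
        item = pipe.popleft()
        if item is EOF:
            return
        received.append(item)


def run(tasks):
    # Toy stand-in for the kernel: resume each task in turn until all finish.
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
            queue.append(task)
        except StopIteration:
            pass


pipe, received = deque(), []
run([writer(pipe, [1, 2, 3]), reader(pipe, received)])
# received == [1, 2, 3]
```

The only thing the "kernel" contributes is switching between the two tasks; buffering, blocking and end-of-stream handling all live in the library code, which is why each application can tune them.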