A. Private Process Management

The first requirement for implementing PRIVEXEC is to enable the OS to support a private execution mode for processes. The OS must be able to launch an application as a private process upon request from the user, generate the PEK, store it in an easily accessible context associated with that process, mark the process and track it during its lifetime, and, finally, destroy the PEK when the private process terminates. Additionally, these new capabilities must not break the established kernel process management functionality. At the same time, the OS must expose a simple interface for user-level applications to request private execution without requiring modifications to existing application code.

The Linux kernel represents every process on the system using a process descriptor, defined as struct task_struct in include/linux/sched.h. The process descriptor contains all the information required to execute the process, including the state used for scheduling, virtual address space management, and accounting. A new process, or child, is created by copying an existing process, or parent, through the fork and clone system calls. clone is a Linux-specific system call that offers fine-grained control over which system resources the parent and child share through a set of clone flags passed as an argument, and is typically used for creating threads. fork, on the other hand, defines a static set of clone flags to create independent processes with the usual POSIX semantics. These two system calls, in turn, invoke the function do_fork implemented in kernel/fork.c, which allocates a new process descriptor for the child, initializes it, and prepares it for scheduling. When the process is terminated, for example by invoking the exit system call, the function do_exit, implemented in kernel/exit.c, deallocates resources associated with the process.
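To make the role of clone flags concrete, the following user-space sketch calls the glibc clone wrapper with and without CLONE_VM and reports whether a write made by the child is visible to the parent. It is an illustration only, assuming Linux with glibc and a downward-growing stack; the names marker, child_fn, and marker_after_child are hypothetical, and the 1 MiB stack size is arbitrary.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Written by the child; volatile so the parent always rereads it. */
static volatile int marker;

/* Entry point of the child: record that it ran, then exit with status 0. */
static int child_fn(void *arg)
{
    (void)arg;
    marker = 42;
    return 0;
}

/*
 * Create a child with the given clone flags, wait for it to exit, and
 * return the value of marker as seen by the parent (-1 on error).
 */
int marker_after_child(int clone_flags)
{
    const size_t stack_size = 1 << 20;   /* arbitrary 1 MiB child stack */
    assert(stack_size % 16 == 0);        /* keep the stack top aligned */

    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    marker = 0;

    /* Pass the top of the buffer because the stack grows downward.
     * SIGCHLD is the termination signal, so waitpid works as with fork. */
    pid_t pid = clone(child_fn, stack + stack_size,
                      clone_flags | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t reaped = waitpid(pid, &status, 0);
    free(stack);                         /* safe: the child has exited */
    if (reaped != pid || !WIFEXITED(status))
        return -1;
    return marker;
}
```

Under these assumptions, marker_after_child(0) returns 0, because the child writes to its own copy of the address space as it would after fork, whereas marker_after_child(CLONE_VM) returns 42, because parent and child share one address space as threads do.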
To implement our system, we first extended the process descriptor by defining a new process flag, PF_PRIVEXEC, that is set in the flags field of the process descriptor to indicate that it is a private process. We also defined a new clone flag, CLONE_PRIVEXEC, that is passed to clone whenever a private process is to be created. We introduced a field called privexec_key in the process descriptor to store the PEK. The final addition to the process descriptor was a pre-allocated cryptographic transform struct that is used for swap encryption. Here, we relied upon the Linux kernel’s cryptography framework (Crypto API); we defer details of its use to Section IV-C.

To handle private process creation, we modified do_fork to check for the presence of CLONE_PRIVEXEC. If the flag is present, we set PF_PRIVEXEC and generate a fresh PEK using a cryptographically secure PRNG. The PEK is stored inside the process descriptor, resides in the kernel virtual address space, and is never disclosed to the user.

For private process termination, we adapted do_exit to check for the presence of PF_PRIVEXEC in the flags bitset. If present, the process cryptographic transform is deallocated, and the PEK is securely wiped prior to freeing the process descriptor. Since the Linux kernel handles both processes and threads in the same functions, this approach also allows for creating and terminating private threads without any additional implementation effort.

Note that applications might spawn additional children for creating subprocesses or threads during the course of execution. This can lead to two critical issues with multi-process and multi-threaded applications running under PRIVEXEC. First, public children of a private process could cause privacy leaks. Second, public children cannot access the parent’s secure container, which could potentially break the application. In order to prevent these problems, our notion of a private execution should include the full set of application processes and threads, despite the fact that the Linux kernel represents them with separate process descriptors.
Therefore, we modified do_fork to ensure that all children of a private process inherit the parent’s private status and privacy context, including both the PEK and the secure storage container. Reference counting is used to ensure that resources are properly disposed of when the entire private process group exits.

Note also that our implementation exposes PRIVEXEC to user applications through a new clone flag that is passed to clone. As a result, when the private execution flag is not passed to the system call, the original semantics of fork and clone are preserved, maintaining full compatibility with existing applications. Likewise, applications that are not aware of the newly implemented PRIVEXEC interface to clone could be made private by simply wrapping their executables with a program that spawns them using the private execution flag. We explain how existing applications run under PRIVEXEC without modifications in Section IV-E.
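The private process lifecycle described in this section (creation with CLONE_PRIVEXEC, inheritance by children, and reference-counted teardown with a key wipe) can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel patch: struct task stands in for struct task_struct, model_fork and model_exit stand in for the modified do_fork and do_exit, and the bit values, the 32-byte key length, the shared struct priv_ctx layout, and the wiped_contexts counter are all hypothetical. Reading /dev/urandom is a stand-in for the kernel’s cryptographically secure PRNG.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PF_PRIVEXEC    0x1u   /* hypothetical bit value */
#define CLONE_PRIVEXEC 0x1u   /* hypothetical bit value */
#define PEK_LEN        32     /* assumed key length in bytes */

/* Privacy context shared by every task of one private process group. */
struct priv_ctx {
    uint8_t  key[PEK_LEN];    /* models the PEK */
    unsigned refcount;        /* live tasks that reference this context */
};

/* Minimal stand-in for the process descriptor. */
struct task {
    unsigned flags;
    struct priv_ctx *ctx;     /* NULL for public tasks */
};

static unsigned wiped_contexts;  /* instrumentation for this model only */

/* Zero a buffer through a volatile pointer so the stores are not elided. */
static void secure_wipe(void *p, size_t n)
{
    volatile uint8_t *b = p;
    while (n--)
        *b++ = 0;
}

/* Fill buf with random bytes; returns 0 on success, -1 on failure. */
static int fill_random(uint8_t *buf, size_t n)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(buf, 1, n, f);
    fclose(f);
    return got == n ? 0 : -1;
}

/* Model of process creation: parent may be NULL for an initial task. */
struct task *model_fork(const struct task *parent, unsigned clone_flags)
{
    struct task *child = calloc(1, sizeof *child);
    if (child == NULL)
        return NULL;

    if (parent != NULL && (parent->flags & PF_PRIVEXEC)) {
        /* Children of a private task inherit status and context. */
        child->flags |= PF_PRIVEXEC;
        child->ctx = parent->ctx;
        child->ctx->refcount++;
    } else if (clone_flags & CLONE_PRIVEXEC) {
        /* New private group: allocate a context with a fresh key. */
        struct priv_ctx *ctx = calloc(1, sizeof *ctx);
        if (ctx == NULL || fill_random(ctx->key, PEK_LEN) != 0) {
            free(ctx);
            free(child);
            return NULL;
        }
        ctx->refcount = 1;
        child->flags |= PF_PRIVEXEC;
        child->ctx = ctx;
    }
    return child;             /* public tasks keep flags == 0, ctx == NULL */
}

/* Model of process exit: the last private task wipes and frees the key. */
void model_exit(struct task *t)
{
    if (t->flags & PF_PRIVEXEC) {
        assert(t->ctx != NULL && t->ctx->refcount > 0);
        if (--t->ctx->refcount == 0) {
            secure_wipe(t->ctx->key, PEK_LEN);
            free(t->ctx);
            wiped_contexts++;
        }
    }
    free(t);
}
```

In this model, a task created with model_fork(NULL, CLONE_PRIVEXEC) owns a context with a reference count of one, a child created from it with model_fork(parent, 0) is private even though no flag was passed, and the key is wiped only when the last of the two calls model_exit. Keeping the key in a shared, reference-counted structure is one possible layout chosen for the sketch; the text above does not specify how the inherited context is represented.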