Kernel Samepage Merging (KSM) in the Linux Operating System

February 16, 2010

Kernel Samepage Merging (KSM) is a recent Linux kernel feature that combines identical memory pages from multiple processes into a single copy-on-write page. Because KVM guest virtual machines run as processes under Linux, KSM gives KVM the memory overcommit capability that hypervisors rely on to use memory more efficiently. The motivation for KSM comes from the observation that on a large machine running several virtual machines (potentially running the same OS), there are a number of duplicate pages, each consuming a page of host memory. If pages with exactly the same content don’t change often, then sharing them across their various users frees memory for other use in the system. Memory overcommit is the ability of the hypervisor to provide more memory than is really available (via virtual memory or other means). The scope for sharing text regions (compiled code and instructions) is enormous. In a traditional operating system, text sharing happens automatically, but separate virtual machines each hold their own copies of the same code.

An overview of the design.

KSM is currently designed to work only on anonymous pages. This works very well for a KVM-based virtualisation environment, since all of the memory used by the virtual machine and its OS is mapped into the qemu process. For each anonymous page scanned, the algorithm looks for a match in the stable tree. If a match is found, the page is merged with the page in the stable tree. If not, the page is compared against the unstable tree, which holds candidate pages seen earlier in the same pass.

Kernel thread: KSM creates a kernel thread that merges pages with identical content in the background. This might change in the future, and KSM might create several threads depending on the topology of the system and the total available memory.
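
On kernels built with KSM support, the background thread (named ksmd) is controlled and observed through files under /sys/kernel/mm/ksm, such as run, pages_shared and pages_sharing. The sketch below reads whatever numeric counters it finds there. The helper name read_ksm_stats is made up for this post, and the code assumes nothing about which files a particular kernel version exposes.

```python
import os

# Standard location of the KSM control files on Linux.
KSM_SYSFS_DIR = "/sys/kernel/mm/ksm"


def read_ksm_stats(base=KSM_SYSFS_DIR):
    """Return {file name: integer value} for the numeric files under base.

    Hypothetical helper for illustration. Returns an empty dict when the
    directory is missing, for example on a kernel built without KSM.
    """
    stats = {}
    if not os.path.isdir(base):
        return stats
    for name in sorted(os.listdir(base)):
        path = os.path.join(base, name)
        try:
            with open(path) as handle:
                text = handle.read().strip()
        except OSError:
            # Unreadable entry or subdirectory: skip it.
            continue
        if text.isdigit():
            stats[name] = int(text)
    return stats


if __name__ == "__main__":
    for name, value in read_ksm_stats().items():
        print(f"{name}: {value}")
```

A value of 1 in run means the thread is scanning. A pages_sharing count that is much larger than pages_shared suggests many duplicates are being folded into a few shared pages.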

Stability: KSM maintains two data structures: the stable tree and the unstable tree. KSM works in multiple passes, checking whether pages with identical content can be merged. At the end of each pass, the unstable tree is reinitialised. A pass is a full scan of the areas registered via the madvise(2) API.
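
The interplay between the two trees over a pass can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the kernel's code: plain dictionaries keyed by page content stand in for the trees, pages are short byte strings identified by an integer, and pages whose content changes between passes are ignored. The function name ksm_pass is hypothetical.

```python
def ksm_pass(pages, stable):
    """Simulate one scanning pass of a toy KSM model.

    pages:  dict mapping a page id to its content (bytes).
    stable: dict mapping content to the set of page ids sharing one
            merged copy. It persists across passes.
    Returns the number of pages newly attached to a shared copy.
    """
    unstable = {}  # rebuilt from scratch on every pass
    merged = 0
    for page_id, content in pages.items():
        if content in stable:
            # Match in the stable structure: share the existing copy.
            if page_id not in stable[content]:
                stable[content].add(page_id)
                merged += 1
            continue
        other = unstable.pop(content, None)
        if other is None:
            # First sighting in this pass: remember it as a candidate.
            unstable[content] = page_id
        else:
            # Second sighting: promote both pages to a shared copy.
            stable[content] = {other, page_id}
            merged += 2
    return merged


if __name__ == "__main__":
    pages = {0: b"A", 1: b"B", 2: b"A", 3: b"A"}
    stable = {}
    print(ksm_pass(pages, stable), stable)
```

In this model the stable dictionary survives from one pass to the next, while the unstable one is thrown away each time, mirroring the reinitialisation described above.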

User hints: The user provides hints about the address ranges that can be considered ‘mergeable’ using the madvise(2) system call. The KSM subsystem uses these hints to decide which parts of the virtual address space to scan for pages with duplicate content.
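
As a rough illustration of such a hint, the sketch below maps private anonymous memory and passes the MADV_MERGEABLE advice to madvise(2) through Python's mmap module (Python 3.8 or later on Linux). The function name make_mergeable_region is hypothetical. The code assumes that the hint can be refused, for instance on a kernel built without KSM, and reports that instead of failing.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE


def make_mergeable_region(num_pages):
    """Map private anonymous memory and hint that KSM may merge it.

    Hypothetical helper for illustration. Returns (region, hinted),
    where hinted is False if the hint is unavailable or was refused.
    """
    length = num_pages * PAGE_SIZE
    # KSM works on anonymous pages, so request a private anonymous mapping.
    region = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    advice = getattr(mmap, "MADV_MERGEABLE", None)
    if advice is None:
        # This platform or Python build does not expose the constant.
        return region, False
    try:
        region.madvise(advice)  # wraps madvise(2) over the whole mapping
    except OSError:
        # The kernel rejected the hint, e.g. KSM is not compiled in.
        return region, False
    return region, True


if __name__ == "__main__":
    region, hinted = make_mergeable_region(1024)
    # Fill every page with the same byte so the pages are identical.
    region[:] = b"\x42" * len(region)
    print("hint accepted" if hinted else "hint not accepted")
    region.close()
```

Accepting the hint only registers the range for scanning. Whether anything is actually merged depends on the KSM thread being enabled and on the pages staying unchanged long enough to be found.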

In conclusion, the primary user of KSM is expected to be virtualisation solutions, and KVM in particular. However, KSM is designed to be generic: it is suited to applications beyond virtualisation, especially where a large amount of memory content is likely to be identical.

