Professional Documents
Culture Documents
07 Apr 2010
Linux as a hypervisor includes a number of innovations, and one of the more
interesting changes in the 2.6.32 kernel is Kernel Shared Memory (KSM). KSM
allows the hypervisor to increase the number of concurrent virtual machines by
consolidating identical memory pages. Explore the ideas behind KSM (such as
storage de-duplication), its implementation, and how you manage it.
Connect with Tim
Tim is one of our most popular and prolific authors. Browse all of
Tim's articles on developerWorks. Check out Tim's profile and
connect with him, other authors, and fellow readers in My
developerWorks.
Server virtualization
Extreme server consolidation
Although enterprise servers can consolidate 10 or more servers
using virtualization, a single IBM System z server can support for
thousands of Linux guests on a single logical partition.
Trademarks
Page 1 of 10
developerWorks
ibm.com/developerWorks
Virtualization as a technology has been around since the 1960s, popularized by the
IBM System/360 mainframe. Five decades later, virtualization has exploded,
making it possible to share a single server with multiple operating systems and
applications. This particular use, called server virtualization, is transforming data
centers, as a single physical machine can be used to host typically 10 or more virtual
machines (VMs), as shown in Figure 1. This virtualization enables the infrastructure
to be much more dynamic, power efficient, and (therefore) cost-efficient. See
Resources for more information on server virtualization and its benefits.
Figure 1. Server consolidation through virtualization
With only 10-15% of a typical server in use at any given time, consolidating virtual
servers onto a single physical server makes perfect sense. But what about
resources? The major resources available include CPU, memory, and networking
bandwidth. CPU and networking are typically underutilized, so the real issue
becomes available memory. Each operating system makes its own unique demands
on available memory resources. But the question that arises is, how unique are
those demands?
Memory sharing
It turns out that if you virtualize many of the same operating system and application
sets, many memory pages are identical. This makes sense, given that operating
system and application code and constant data are identical among VMs. And when
pages are unique, they can be consolidated, which frees memory for use in other
applications. Figure 2 illustrates memory sharing and shows the benefit of more free
memory available when pages are shared among VMs whose contents are the
same.
Figure 2. Memory sharing across VMs
Trademarks
Page 2 of 10
ibm.com/developerWorks
developerWorks
Feature naming
The feature described here is very new; therefore, the name has
undergone some changes. You'll find this Linux kernel feature
called Kernel Shared Memory as well as Kernel Samepage
Merging.
What you'll soon discover is that although memory sharing in Linux is advantageous
in virtualized environments (KSM was originally designed for use with the
Kernel-based Virtual Machine [KVM]), it's also useful in non-virtualized
environments. In fact, KSM was found to be beneficial even in embedded Linux
systems, indicating the flexibility of the approach. Now, let's explore the Linux
approach to memory sharing and how you can use it to increase the memory density
of a server and therefore increase its ability to host additional applications or VMs.
Trademarks
Page 3 of 10
developerWorks
ibm.com/developerWorks
KSM operation
KSM exists as a daemon in the kernel (called ksmd) that periodically performs page
scans to identify duplicate pages and merges duplicates to free pages for other
uses. It does this in a way that's transparent to the user. For example, duplicate
pages are merged (and subsequently marked read only), but if one of the users of
the page changes it for any reason, that user will receive his or her own copy (in a
CoW fashion). You can find the full implementation of the KSM kernel module within
the kernel source at ./mm/ksm.c.
KSM relies on a higher-level application to provide instruction on which memory
regions are candidates for merging. Although it would be possible for KSM to simply
scan all anonymous pages in the system, it would be wasteful of CPU and memory
(given the space necessary to manage the page-merging process). Therefore,
applications can register the virtual areas that are likely to contain duplicate pages.
The KSM application programming interface (API) is implemented through the
madvise system call (see Listing 1) and a new advice parameter called
MADV_MERGEABLE (which indicates that the defined region is mergeable). A region
can be removed from the mergeable state through the MADV_UNMERGEABLE
parameter (which immediately un-merges any merged pages from that region). Note
that removing a region of pages through madvise can result in an EAGAIN error, as
the operation could run out of memory during the un-merge process, potentially
leading to even greater trouble (out-of-memory conditions).
Listing 1. The madvise system call
#include <sys/mman.h>
int madvise( void *start, size_t length, int advice );
Once a region is defined as mergeable, KSM adds this region to its list of working
memory. When KSM is enabled, it searches for identical pages, keeping one page in
a write-protected CoW fashion and freeing one for other uses.
KSM uses a approach that's different from the approach used in storage
de-duplication. In traditional de-duplication, objects are hashed, and the hash value
Trademarks
Page 4 of 10
ibm.com/developerWorks
developerWorks
is used as the initial check for similarity. When hashes are identical, the next step is
a comparison of the actual objects (in this case, a memory comparison) to formally
determine whether the objects are identical. KSM used this approach in its first
implementation, but a more straightforward approach was developed to simplify it.
In the current KSM, pages are managed by two red-black trees, one of which is
ephemeral. The first tree, call the unstable tree, is used to store new pages that are
not yet understood to be stable. In other words, pages that are candidates for
merging (unchanged for some period of time) are stored in the unstable tree. Pages
in the unstable tree are not write protected. The second tree, called the stable tree,
stores those pages that have been found to be stable and merged by KSM. To
identify whether a page is volatile or non-volatile, KSM uses a simple 32-bit
checksum. When a page is scanned, its checksum is calculated and stored with the
page. On a subsequent scan, if a newly computed checksum is different from the
previously generated checksum, the page is changing and is therefore not a good
candidate for merging.
The first step in handling a single page using KSM's process is to see whether the
page can be found in the stable tree. The process of searching the tree is interesting
in that each page is treated as a very large number (the contents of the page). A
memcmp (memory compare) operation is performed on the page and node's page of
consideration. If the memcmp returns 0, the pages are equal and a match is found.
Otherwise, memcmp can return -1 (which means that the candidate page is less than
the current node's page) or 1 (the candidate page is greater than the current node's
page). In each case, the operation leads the search down the red-black tree (to the
left if less than and to the right if greater than). Although comparing 4KB pages
seems fairly heavyweight, in most cases, the memcmp will end prematurely once a
difference is found. Therefore, the process itself is fast and efficient. See Figure 3 for
a visualization of this process.
Figure 3. Search process for a page in the tree
Trademarks
Page 5 of 10
developerWorks
ibm.com/developerWorks
If the candidate page is found in the stable tree, then it is merged and the candidate
page freed. This code can be found in ksm.c/stable_tree_search() (which is
called by ksm.c/cmp_and_merge_page()). Otherwise, if the candidate page is not
found, you move on to the unstable tree (see ksm.c/unstable_tree_search()).
The first step in the unstable tree search is to recalculate the checksum over the
page. If it's found to be different than the original checksum, the page is discarded
from further search in this scan (because if it changes, it's not worthwhile to track). If
the checksum has not changed, the unstable tree is searched for the candidate
page. The unstable tree processing is a bit different than the stable tree. First, if the
search does not find the page in the unstable tree, you add a new node in the
unstable tree for this page. But if the page is found in the unstable tree, you merge
the pages, and then migrate the node to the stable tree.
When the scan is complete (performed through ksm.c/ksm_do_scan()), the stable
tree is kept, but the unstable tree is removed and rebuilt at the time of the next scan.
This process simplifies things quite a bit, as the organization of the unstable tree can
change based upon changes to the pages (recall that pages in the unstable tree are
not write protected). Because all pages in the stable tree are write protected, a page
fault is generated when a page tries to be written, allowing the CoW process to
un-merge the page for the writer (see ksm.c/break_cow()). Orphaned pages in the
stable tree are subsequently removed (unless two or more users of the page exist,
indicating that the page is still shared).
Trademarks
Page 6 of 10
ibm.com/developerWorks
developerWorks
As discussed, KSM uses red-black trees for managing pages for fast lookup. Linux
actually includes red-black trees as a reusable data structure, and they can be used
generically. Red-black trees are also used by the Completely Fair Scheduler (CFS)
for storage of tasks in time order. You can find the implementation of red-black trees
in the kernel at ./lib/rbtree.c.
Trademarks
Page 7 of 10
developerWorks
ibm.com/developerWorks
Going further
Linux is not alone in using page sharing to improve memory efficiency, but it is
unique in its implementation as an operating system feature. VMware's ESX server
hypervisor provides this feature under the name Transparent Page Sharing (TPS),
while XEN calls it Memory CoW. But whatever the name or implementation, the
feature provides better memory utilization, allowing the operating system (or
hypervisor, in the case of KVM) to over-commit memory to support greater numbers
of applications or VMs. You can find KSMand many other interesting featuresin
the latest 2.6.32 Linux kernel.
Trademarks
Page 8 of 10
ibm.com/developerWorks
developerWorks
Resources
Learn
The authors of KSM at Red Hat wrote a great paper on KSM entitled
"Increasing memory density by using KSM" (PDF). This paper introduces KSM
and delves into the details of its implementation.
Virtualization is growing dramatically as a way to consolidate for power and cost
savings but also as a key element of cloud computing architectures. Learn
about virtualization and its role in green IT in "Growing green with virtualization:
Virtualization as a backbone of green IT" (developerWorks, August 2009).
This article demonstrates an advantage of using Linux as a hypervisor. You can
learn more about this and other topics in "Anatomy of a Linux Hypervisor"
(developerWorks, May 2009), "Discover the Linux Kernel Virtual Machine"
(developerWorks, April 2007), and "Virtual Linux" (developerWorks, December
2006). Finally, read about another kernel subsystem that uses red-black trees in
"Inside the 2.6 Completely Fair Scheduler" (developerWorks, December 2009).
You can read more about memory de-duplication from the 2.6.32 kernel release
notes. These notes provide useful materials on KSM (including a link to the
kernel documentation) in addition to the various other changes that made their
way into 2.6.32. In the KSM section, you'll find an interesting reference to
running 52 Windows XP VMs on a single server with 16GB of RAM (where
each was allocated its own 1GB RAM segment but shared with KSM).
Learn more about de-duplication of storage systems and the approaches in the
IT Analysis article, "De-dupe for big storage savings?".
Wikipedia provides a great introduction to de-duplication, including the various
implementation strategies (such as source and target).
Although KSM is an automatic mechanism for reducing memory footprint in
virtualized servers, application developers have had a similar manual capability
for some time. Linux dynamic (or shared) libraries permit a binary to use a
common library object instead of one statically compiled with the application.
Read more about shared libraries in "Anatomy of Linux dynamic libraries"
(developerWorks, August 2008).
A red-black tree (or symmetric binary B-tree) is a self-balancing binary tree
invented by Rudolf Bayer. It's a useful tree representation that has good
worst-case time for operations such as insert, search, and delete. You can find
red-black trees used in a variety of applications, including the construction of
associative arrays.
In the developerWorks Linux zone, find hundreds of how-to articles and
tutorials, as well as downloads, discussion forums, and a wealth other
resources for Linux developers and administrators.
Anatomy of Linux Kernel Shared Memory
Copyright IBM Corporation 2010. All rights reserved.
Trademarks
Page 9 of 10
developerWorks
ibm.com/developerWorks
Trademarks
Page 10 of 10