Fred Kuhns, Washington University
Page-level Allocation
The kernel maintains a list of free physical pages. Since kernel and user programs use virtual memory addresses, the physical location of a page is not important. Pages are allocated from the free list. Two principal clients:
the paging system and the kernel memory allocator (KMA)
[Figure: memory allocation overview. The page-level allocator hands physical pages to two clients: the paging system (process pages, buffer cache) and the kernel memory allocator (network buffers, data structures, temporary storage).]
Up to 1/3 of kernel time may be spent copying data between user and kernel space. User space cannot read or write kernel space directly. The KMA uses a preallocated pool of physical pages.
Requirements:
Minimize waste
The KMA must be fast, since it is used extensively
Simple API, similar to malloc and free
Properly aligned allocations: for example, 4-byte alignment
Support cyclical and bursty usage patterns
Interaction with the paging system: able to borrow pages from the paging system if running low
Desirable to free portions of allocated space; this differs from the typical user-space malloc/free interface
Example KMAs
Resource Map Allocator
Simple Power-of-Two Free Lists
The McKusick-Karels Allocator
The Buddy System
SVR4 Lazy Buddy Allocator
Mach-OSF/1 Zone Allocator
Solaris Slab Allocator
First fit: allocates from the first free region with sufficient space. Used in UNIX; fastest, but fragmentation is a concern.
Best fit: allocates from the smallest region that satisfies the request. May leave several regions that are too small to be useful.
Worst fit: allocates from the largest region unless a perfect fit is found. The goal is to leave behind larger regions after allocation.
CS523 Operating Systems
[Figure: resource map examples, with entries as <base, size> pairs. One map is shown after rmfree(128,128) with entries <128,256> and <576,448>; a fragmented map is shown with entries <128,32>, <288,64>, <544,128>, <832,32>.]
[Figure: resource map fragmentation: a sequence of entries <X0,Y0> through <X6,Y6>, with adjacent entries <X2,Y>, <X3,Y>, <X4,Y> illustrating regions that can be coalesced.]
Simple Power-of-Two Free Lists
malloc(X): size = roundup(X + sizeof(header)), where roundup(Y) = 2^n such that 2^(n-1) < Y <= 2^n
Advantages: familiar API; simple to share buffers between kernel modules, since freeing a buffer does not require knowing its size.
Disadvantages: no provision for coalescing free buffers, since buffer sizes are generally fixed; no provision for borrowing pages from the paging system (although some implementations do this); no provision for returning unused buffers to the page allocator.
McKusick-Karels Allocator
An improved power-of-two implementation. Managed memory must be contiguous pages, and all buffers within a page must be of equal size. Adds a page usage array, kmemsizes[], to manage pages, so buffer headers are not required to record the size. When freeing memory, free(buf) simply masks off the low-order bits to get the page address (actually the page offset, pg), which is used as an index into the kmemsizes array.
Fred Kuhns (11/28/2012) CS523 Operating Systems
McKusick-Karels Allocator
Used in several UNIX variants. Adds an allocated-page usage array (kmemsizes[]) where a page (pg) is in one of three states:
free: kmemsizes[pg] contains a pointer to the next free page
divided into buffers: kmemsizes[pg] == buffer size
part of a buffer spanning multiple pages: kmemsizes[first_pg] = buffer size, where first_pg is the first page of the buffer
McKusick-Karels Allocator
Disadvantages:
similar drawbacks to the simple power-of-two allocator
vulnerable to bursty usage patterns, since there is no provision for moving buffers between lists
Advantages:
eliminates space wastage in the common case where the allocation request is a power of two
optimizes the round-up computation, and eliminates it entirely if the size is known at compile time
McKusick-Karels vs. Power of Twos
[Figure: both allocators keep freelistarr[], an array of free lists indexed by size class (32, 64, 128, 256, 512, 1024). McKusick-Karels adds kmemsizes[], a map of the managed pages recording the size of blocks allocated from each page, or a pointer to the next free page.]
Annotations: no buffer header; vulnerable to bursty usage; memory not returned to the paging system.
McKusick-Karels Example Macros
#define NDX(size) \
    ((size) > 128 ? ((size) > 256 ? 4 : 3) \
                  : ((size) > 64 ? 2 : ((size) > 32 ? 1 : 0)))

#define MALLOC(space, cast, size, flags) { \
    register struct freelisthdr *flh; \
    if ((size) <= 512 && (flh = freelistarr[NDX(size)]) != NULL) { \
        space = (cast)flh->next; \
        flh->next = *(caddr_t *)space; \
    } else \
        space = (cast)malloc(size, flags); \
}
Fred Kuhns (11/28/2012) CS523 Operating Systems
Buddy System
An allocation scheme combining a power-of-two allocator with free-buffer coalescing.
The binary buddy system is the simplest and most popular form; other variants split buffers into four, eight, or more pieces.
Approach: create small buffers by repeatedly halving a large buffer (producing buddy pairs), and coalesce adjacent free buddies when possible. Requests are rounded up to a power of two.
[Figure: binary buddy example over a 1024-byte region (offsets 0-1023), showing allocated blocks B, D, and E and the corresponding allocation bitmap 11111111 00001100 11110000 00000000.]
Buddy System
Advantages:
does a good job of coalescing adjacent free buffers
easy exchange of memory with the paging system:
can allocate a new page and split it as necessary
when coalescing produces a complete page, it may be returned to the paging system
Disadvantage: performance
recursive coalescing is expensive, with poor worst-case performance
back-to-back allocate and release operations alternate between splitting and coalescing the same memory
Lazy Coalescing
The release operation has two steps:
place the buffer on the free list, making it locally free
coalesce with its buddy, making it globally free
Buffers are divided into classes. Assume N buffers in a given class, with N = A + L + G, where A = number of active buffers, L = number of locally free buffers, and G = number of globally free buffers. A slack value (slack = N - 2L - G) selects among three states:
lazy (slack >= 2): steady-state buffer use; coalescing is not necessary
reclaiming (slack == 1): borderline consumption; coalescing is needed
accelerated (slack == 0): non-steady-state consumption; must coalesce faster
Lazy Coalescing
An improvement over the basic buddy system: in the steady state, all lists are in the lazy state and no time is wasted splitting and coalescing. In the worst case, the algorithm limits coalescing to no more than two buffers (two coalescing delays). It has been shown to have an average latency 10% to 32% better than the simple buddy system, but with greater variance and poorer worst-case performance for the release routine.
Disadvantages:
no provision for releasing part of an object
garbage-collection efficiency is an issue:
it is slow and must complete before the rest of the system can run
the algorithm is complex and inefficient
Mach Zones
[Figure: the zone of zones. A master zone holds the struct zone descriptors themselves, linked from first zone to last zone; each zone in turn holds fixed-size objects of a single type (struct Obj X or struct Obj Y).]
Slab Allocator
HW Cache Utilization
Footprint: the portion of the hardware cache and TLB that is overwritten by the allocation itself.
A large footprint can displace useful data in the cache and TLB; a small footprint is desirable.
Due to typical buffer address distributions, cache contention may occur, resulting in poor cache utilization; the slab allocator addresses this.
Slab Organization
[Figure: slab layout. Each slab begins with a coloring area, followed by a mix of active and free buffers and unused space, and ends with a 32-byte kmem_slab struct; slabs are linked in a list.]
Summary
Limited space overhead (the kmem_slab struct and a free-list pointer). Services requests quickly (remove a preinitialized object from a free list). Coloring scheme gives better hardware cache utilization and a small footprint. Simple garbage collection, with its cost shared across multiple requests via an in-use count. Increased management overhead compared to simpler methods (e.g., power-of-two lists).