You are on page 1of 44

Memory Management

Fred Kuhns

WASHINGTON UNIVERSITY IN ST LOUIS

Washington

UNIX Memory Management


UNIX uses a demand paged (anticipatory paging) virtual memory architecture Memory is managed at the page level
page frame == physical page virtual page

Basic memory allocation responsibility of the page-level allocator

Fred Kuhns (11/28/2012)

CS523 Operating Systems

Page-level Allocation
Kernel maintains a list of free physical pages. Since kernel and user space programs use virtual memory address, the physical location of a page is not important Pages are allocated from the free list Two principal clients:
the paging system the kernel memory allocator

Fred Kuhns (11/28/2012)

CS523 Operating Systems

Memory allocation
physical page

Page-level allocator

Kernel memory Allocator

Paging system

Network buffers

Data structures

temp storage

process

Buffer cache
4

Fred Kuhns (11/28/2012)

CS523 Operating Systems

Kernel Memory management


Memory organization:
permanently mapped to same memory range of all processes separate address space

up to 1/3 of kernel time may be spent copying data between user and kernel user space can not write/read directly to kernel space KMA uses a preallocated pool of physical pages

Fred Kuhns (11/28/2012)

CS523 Operating Systems

Kernel Memory Allocation


Typical request is for less than 1 page Originally, kernel used statically allocated, fixed size tables - whats limiting about this? Kernel requires a general purpose allocator for both large and small chunks of memory. handles memory requests from kernel modules, not user level applications
pathname translation routine, STREAMS or I/O

buffers, zombie structures, table table entries (proc structure etc)

Fred Kuhns (11/28/2012)

CS523 Operating Systems

Requirements of the KMA

Minimize Waste

utilization factor = requested/required memory


Useful metric that factors in fragmentation. 50% considered good

KMA must be fast since extensively used Simple API similar to malloc and free.

Properly aligned allocations: for example 4 byte alignment Support cyclical and bursty usage patterns Interaction with paging system able to borrow pages from paging system if running low

desirable to free portions of allocated space, this is different from typical user space malloc and free interface

Fred Kuhns (11/28/2012)

CS523 Operating Systems

Example KMAs
Resource Map Allocator Simple Power-of-Two Free Lists The McKusick-Karels Allocator The Buddy System SVR4 Lazy Buddy Allocator Mach-OSF/1 Zone Allocator Solaris Slab Allocator

Fred Kuhns (11/28/2012)

CS523 Operating Systems

Resource Map Allocator


Resource map is a set of <base,size> pairs that monitor areas of free memory Initially, pool described by a single map entry = <pool_starting_address, pool_size> Allocations result in pool fragmenting with one map Entries sorted in order of increasing base address Requests satisfied using one of three policies:

entry for each contiguous free region

First fit Allocates from first free region with sufficient space. UNIX, fasted, fragmentation is concern Best fit Allocates from smallest that satisfies request. May leave several regions that are too small to be useful Worst fit - Allocates from largest region unless perfect fit is found. Goal is to leave behind larger regions after allocation
CS523 Operating Systems
9

Fred Kuhns (11/28/2012)

Resource Map Allocator Example


supports operations: offset_t rmalloc(size), void rmfree(base, size)
<0,1024>

after: rmalloc(256), rmalloc(320), rmfree(256,128)


<256,128>
<576,448>

after: rmfree(128,128)
<128,256>
<576,448>

<128,32>

<288,64>

<544,128>

<832,32>

Fred Kuhns (11/28/2012)

CS523 Operating Systems

10

Resource Map -Good/Bad


Advantages:
simple, easy to implement not restricted to memory allocation, any collection of objects that are sequentially ordered and require allocation and freeing in contiguous chunks. Can allocate exact size within any alignment restrictions. Thus no internal fragmentation. Client may release portion of allocated memory. adjacent free regions are coalesced

Fred Kuhns (11/28/2012)

CS523 Operating Systems

11

Resource Map -Good/Bad


Disadvantages:
Map may become highly fragmented resulting in low utilization. Poor for performing large requests. Resource map size increases with fragmentation static table will overflow dynamic table needs its own allocator Map must be sort for free region coalescing. Sorting operations are expensive. Requires linear search of map to find free region that matches allocation request. Difficult to return borrowed pages to paging system.

Fred Kuhns (11/28/2012)

CS523 Operating Systems

12

Resource map final word


Poor performance is why resource maps is considered unsuitable as a general-purpose kernel memory allocator.

<X3,Y>
<X0,Y0> <X1,Y1> <X2,Y> <X4,Y> <X5,Y5> <X6,Y6>

Fred Kuhns (11/28/2012)

CS523 Operating Systems

13

Simple Power of Twos


has been used to implement malloc() and free() in the user-level C library (libc). Uses a set of free lists with each list storing a particular size of buffer. Buffer sizes are a power of two. Each buffer has a one word header
when free, header stores pointer to next free list element when allocated, header stores pointer to associated free list (where it is returned to when freed). Alternatively, header may contain size of buffer

Fred Kuhns (11/28/2012)

CS523 Operating Systems

14

Simple Power of Two Free List


Example free list One word header per buffer (pointer) free(buf) must free entire buffer.
freelistarr[] 32 64 128 256 512 1024
Fred Kuhns (11/28/2012) CS523 Operating Systems
15

malloc(X): size = roundup(X + sizeof(header)) roundup(Y) = 2^n, where 2^(n-1) < Y <= 2^n

Simple Power of Twos - Advantages


Simple and reasonably fast eliminates linear searches and fragmentation.
Bounded time for allocations when buffers are available

familiar API simple to share buffers between kernel modules since freeing a buffer does not require knowing its size

Fred Kuhns (11/28/2012)

CS523 Operating Systems

16

Simple Power of Two - Disadvantages


Rounding requests to power of 2 results in wasted memory and poor utilization.
aggravated by requiring buffer headers since it is not unusual for memory requests to already be a power-of-two.

no provision for coalescing free buffers since buffer sizes are generally fixed. no provision for borrowing pages from paging system although some implementations do this. no provision for returning unused buffers to page allocator

Fred Kuhns (11/28/2012)

CS523 Operating Systems

17

Simple Power of Two


void *malloc (size) { int ndx = 0; /* free list index */ int bufsize = 1 << MINPOWER /* size of smallest buffer */ size += 4; /* Add for header */ assert (size <= MAXBUFSIZE); while (bufsize < size) { ndx++; bufsize <<= 1; } /* ndx is the index on the freelist array from which a buffer * will be allocated */ }
Fred Kuhns (11/28/2012) CS523 Operating Systems
18

McKusick-Karels Allocator
Improved power of twos implementation Managed Memory must be contiguous pages All buffers within a page must be of equal size Adds page usage array, kmemsizes[], to manage pages Does not require buffer headers to indicate page size. When freeing memory, free(buff) simply masks of the lower order bit to get the page address (actually the page offset = pg) which is used as an index into the kmemsizes array.
Fred Kuhns (11/28/2012) CS523 Operating Systems
19

McKusick-Karels Allocator
Used in several UNIX variants Add an allocated page usage array (kmemsizes[]) where a page (pg) is in one of three states:

free: kmemsizes[pg] contains pointer to next free page Divided into buffers: kmemsizes[pg] == buffer size Part of a buffer Spanning multiple pages: kmemsizes[first_pg] = buffer size, where first_page is the first page of the buffer

Fred Kuhns (11/28/2012)

CS523 Operating Systems

20

McKusick-Karels Allocator
Disadvantages:
similar drawbacks to simple power-of-twos allocator vulnerable to bursty usage patterns since no provision for moving buffers between lists eliminates space wastage in common case where allocation request is a power-of-two optimizes round-up computation and eliminates it if size is known at compile time

Advantages:

Fred Kuhns (11/28/2012)

CS523 Operating Systems

21

McKusick-Karels
freelistarr[] 32 128 64 512 F 32 F ... 32 64 128 256 512 1024

Power of Twos

kmemsizes[], map of managed pages Size of blocks allocated from pages, pointer to free pages
Fred Kuhns (11/28/2012)

No buffer header vulnerable to bursty usage memory not returned to paging system
22

CS523 Operating Systems

McKusick-Karels ExampleMacros
#define NDX(size) \ (size) > 128 ? (size) > 256 ? 4 : 3 \ : (size) > 64 ? 2 : (size) > 32 ? 1 : 0 #define MALLOC(space, cast, size, flags) \ {\ register struct freelisthdr *flh; \ if (size <= 512 && (flh = freelistarr [NDX(size)]) != NULL) { \ space = (cast)flh->next; \ flh->next = *(caddr_t *)space; \ } else \ space = (cast)malloc (size, flags); \ }
Fred Kuhns (11/28/2012) CS523 Operating Systems
23

Buddy System
Allocation scheme combining Power-of-Two allocator with free buffer coalescing
binary buddy system: simplest and most popular form. Other variants may be used by splitting buffers into four, eight or more pieces.

Approach: create small buffers by repeatedly halving a large buffer (buddy pairs) and coalescing adjacent free buffers when possible. Requests rounded up to a power of two

Fred Kuhns (11/28/2012)

CS523 Operating Systems

24

Buddy System, Example


Minimum allocation size = 32 Bytes Initial free memory size is 1024 use a bitmap to monitor 32 Byte chunks maintain freelist for each possible buffer size
power of 2 buffer sizes from 32 to 512 sizes = {32, 64, 128, 256, 512} bit set if chunk is used bit clear if chunk is free

Initial one block = entire buffer

Fred Kuhns (11/28/2012)

CS523 Operating Systems

25

Buddy System in Action


Free list

32 64 128 256 512


256 384 448 512 640 768

0
B

1023 E

D D

1 1 1 1 1 1 1 1 0 0 0 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0

Bitmap (32 B chunks)


In-use Free
allocate(256), allocate(128), allocate(64), allocate(128), release(C, 128), release (D, 64)
Fred Kuhns 1 (11/28/2012) CS523 Operating Systems
26

Buddy System
Advantages:
good job of coalesces adjacent free buffers easy exchange of memory with paging system
can allocate new page and split as necessary when coalescing results in a complete page, it may be returned to the paging system

Disadvantage:
performance
recursive coalescing is expensive with poor worst case performance back to back allocate and release result in alternate between splitting and coalescing the same memory

poor programming interface:


release needs both buffer and size. entire buffer must be released

Fred Kuhns (11/28/2012)

CS523 Operating Systems

27

SVR4 Lazy Buddy Algorithm


Addresses the main problem with the buddy system: poor performance due to repetitive coalescing and splitting of buffers. Under steady state conditions, the number of in-use buffers or each size remains relatively constant. Under these condition, coalescing offers no advantage. Coalescing is necessary only to deal with bursty conditions where there are large fluctuation in memory demand.

Fred Kuhns (11/28/2012)

CS523 Operating Systems

28

SVR4 Lazy Buddy Algorithm


Coalescing delay time taken to either coalesce a single buffer with its buddy or determine its buddy is not free Coalescing is recursive and doesnt stop until a buffer is found which can not be combined with its buddy. Each release operation results in at least one coalescing delay Solution:
defer coalescing until it is necessary results in poor worst-case performance lazy coalescing intermediate solution

Fred Kuhns (11/28/2012)

CS523 Operating Systems

29

Lazy Coalescing
Release operation has two setps Buffers divided into classes.
Assume N buffers in a given class. N = A + L + G, where
A = number of active buffers, L = number of locally free buffers and G = number of globally free buffers lazy buffer use in steady state, coalescing not necessary. slack >= 2 reclaiming borderline consumption, coalescing needed. slack == 1 acelerated non-steady state consumption, must coalesce faster. slack==0

place buffer on free list making it locally free coalesce with buddy making it globally free

Buffer class states defined by slack = N 2L - G

Fred Kuhns (11/28/2012)

CS523 Operating Systems

30

Lazy Coalescing
Improvement over basic buddy system steady state all lists in lazy state and no time is wasted splitting and coalescing worst case algorithm limits coalescing to no more than two buffers (two coalescing delays) shown to have an average latency 10% to 32% better than the simple buddy system greater variance and poorer worst case performance for the release routine
Fred Kuhns (11/28/2012) CS523 Operating Systems
31

MACH Zone Allocator


Used in Mach and DEC UNIX fast memory allocation and garbage collection in the background Each class of object (proc structure, message headers etc) assigned its own pool of free objects (aka a zone) set of power-of-two zones for anonymous objects memory allocated from page-allocator page can only hold objects in the same zone struct zone heads free list of objects for each zone zone structs are allocated from a zone of zones
Fred Kuhns (11/28/2012) CS523 Operating Systems
32

MACH Zone Allocator


zone initialization: zinit( size, max, alloc, name) size = object size in bytes max = max size of zone in bytes alloc = amount of memory to add when free list empty name = string describing objects in zone a zone struct is allocated where paramteres are stored active zone structures are kept on an linked list referenced by first_zone and last_zone. First element on list is the zone of zones. allocations are very fast since objexts sizes are fixed.

Fred Kuhns (11/28/2012)

CS523 Operating Systems

33

MACH Zone Allocator


Garbage Collection happens in the background to free up unused memory wihin zones. frees no-longer used pages expensive, linear search, locks keeps a zone page map refereing to all pages allocated to zones. Each map contains in_free_list = number of objects on free list. Calculated during garbage collection. alloc_count = total number o objects in page

Fred Kuhns (11/28/2012)

CS523 Operating Systems

34

Zone Garbage Collection


Zone garbage collector: zone_gc() is invoked by swapper each time it runs. It walks through zone list and for each zone it makes two passes through the free first pass increments in_free_list, if in_free_list equals alloc_count then page can be freed. second pass all objects in freed pages are removed from the zones free list. after pass two it calls kmem_free() to free the pages

Fred Kuhns (11/28/2012)

CS523 Operating Systems

35

MACH Zone Allocator


Advantages
fast and efficient with a simple API
obj = (void *)zalloc (struct zone* z); void zfree (struct zone *z, void *obj);

objects are exact size gc provides for memory reuse

Disadvantages
no provision for releasing part of an object gc efficiency is an issue:
slow and must complete before other system run complex and inefficient algorithm

since zone allocator is used by the memory manager


zone of zones statically allocated

Fred Kuhns (11/28/2012)

CS523 Operating Systems

36

Mach Zones

zone of zones
first zone last zone

struct zone

struct zone

struct zone

struct zone

Object X zone Object Y zone

struct Obj X
struct Obj Y

struct Obj X

struct Obj Y

struct Obj Y

Fred Kuhns (11/28/2012)

CS523 Operating Systems

37

Solaris Slab Allocator


Better performance and memory utulization than other approaches studied. Based on Machs Zone Allocator. Slab allocator is concerned with object state while the zone allocator is not Focus on
object reuse, HW cache utilization allocator footprint

Fred Kuhns (11/28/2012)

CS523 Operating Systems

38

Object Reuse (i.e. Object Caching!).


Allocator must perform 5 operations:
1) allocate memory, 2) initialize object, 3) use object, 4) destruct object, 5) free memory

Slab Allocator

HW Cache Utilization

Footprint portion of the hardware cache and TLB that is overwritten by the allocation itself.
large footprint can displace data in cache and TLB small footprint desirable

Due to typical buffer address distributions cache contention may occur resulting in its poor utilization slab allocator addresses this

Fred Kuhns (11/28/2012)

CS523 Operating Systems

39

Design of Slab Allocator


cachep = kmem_cache_create (name, size, align, ctor, dtor); page-level allocator back end vnode cache proc cache mbuf cache msgb cache

front end

vnode vnode vnode vnode

proc proc proc

mbuf mbuf

msgb msgb msgb msgb msgb

Objects in use by the kernel


Fred Kuhns (11/28/2012) CS523 Operating Systems
40

Slab Allocator Design


Organized as a collection of caches mechanisms for adding or removing memory from a cache Create object cache with object ctor and dtor. objp = kmem_cache_alloc (cachep, flags); kmem_cache_free (cachep, objp);
Cache grows by allocating a slab, contiguous pages managed as a unit. kmem_cache_reap (cachep) - garbage collection.
Fred Kuhns (11/28/2012) CS523 Operating Systems
41

Before releasing object kernel must initialize object before it is released.

Slab Organization
Coloring area free active Unused space free active active free slab struct Slab linked list

free list Free list pointers

32 Byte kmem_slab

Coloring area - vary starting offsets, optimize HW cache and bus

Fred Kuhns (11/28/2012)

CS523 Operating Systems

42

Slab Allocator Implementation


Slab = address % slab size Cache stores slabs in a partly sorted list: fully active, partially active, free slabs.
Why?

For large objects (>page size)


management structures on separate memory pool

Fred Kuhns (11/28/2012)

CS523 Operating Systems

43

Summary
Limited space overhead (kmem_slab, free ptr) service requests quickly (remove a preinitialized object) coloring scheme for better HW utilization small footprint simple garbage collection (cost shared across multiple requests - in-use count) Increased management overhead compared to simpler methods (power of twos)
Fred Kuhns (11/28/2012) CS523 Operating Systems
44

You might also like