Fred Kuhns, Washington University
Page-level Allocation
The kernel maintains a list of free physical pages. Since kernel and user programs use virtual memory addresses, the physical location of a page is not important. Pages are allocated from the free list. Two principal clients:
the paging system and the kernel memory allocator (KMA)
[Figure: memory allocation overview. The page-level allocator hands physical pages to two clients: the paging system (process pages, buffer cache) and the kernel memory allocator (network buffers, data structures, temporary storage).]
Up to 1/3 of kernel time may be spent copying data between user and kernel space. User space cannot read or write kernel space directly. The KMA uses a preallocated pool of physical pages.
Requirements:
Minimize waste
The KMA must be fast, since it is used extensively
Simple API, similar to malloc and free
Properly aligned allocations: for example, 4-byte alignment
Support cyclical and bursty usage patterns
Interaction with the paging system: able to borrow pages from the paging system if running low
Desirable to free portions of allocated space; this differs from the typical user-space malloc/free interface
Example KMAs
Resource Map Allocator
Simple Power-of-Two Free Lists
The McKusick-Karels Allocator
The Buddy System
SVR4 Lazy Buddy Allocator
Mach-OSF/1 Zone Allocator
Solaris Slab Allocator
First fit: allocates from the first free region with sufficient space. Used in UNIX; fastest, but fragmentation is a concern.
Best fit: allocates from the smallest region that satisfies the request. May leave several regions that are too small to be useful.
Worst fit: allocates from the largest region unless a perfect fit is found. The goal is to leave behind larger regions after allocation.
CS523 Operating Systems
[Figure: resource map examples, with entries as <base, size> pairs. One map is shown after rmfree(128,128) with entries <128,256> and <576,448>; a fragmented map is shown with entries <128,32>, <288,64>, <544,128>, <832,32>.]
[Figure: resource map fragmentation: a sequence of entries <X0,Y0> through <X6,Y6>, with adjacent entries <X2,Y>, <X3,Y>, <X4,Y> illustrating regions that can be coalesced.]
Simple Power-of-Two Free Lists
malloc(X): size = roundup(X + sizeof(header)), where roundup(Y) = 2^n such that 2^(n-1) < Y <= 2^n
Advantages: familiar API; simple to share buffers between kernel modules, since freeing a buffer does not require knowing its size.
Disadvantages: no provision for coalescing free buffers, since buffer sizes are generally fixed; no provision for borrowing pages from the paging system (although some implementations do this); no provision for returning unused buffers to the page allocator.
McKusick-Karels Allocator
An improved power-of-two implementation. Managed memory must be contiguous pages, and all buffers within a page must be of equal size. Adds a page usage array, kmemsizes[], to manage pages, so buffer headers are not required to record the size. When freeing memory, free(buf) simply masks off the low-order bits to get the page address (actually the page offset, pg), which is used as an index into the kmemsizes array.
Fred Kuhns (11/28/2012) CS523 Operating Systems
McKusick-Karels Allocator
Used in several UNIX variants. Adds an allocated-page usage array (kmemsizes[]) where a page (pg) is in one of three states:
free: kmemsizes[pg] contains a pointer to the next free page
divided into buffers: kmemsizes[pg] == buffer size
part of a buffer spanning multiple pages: kmemsizes[first_pg] = buffer size, where first_pg is the first page of the buffer
McKusick-Karels Allocator
Disadvantages:
similar drawbacks to the simple power-of-two allocator
vulnerable to bursty usage patterns, since there is no provision for moving buffers between lists
Advantages:
eliminates space wastage in the common case where the allocation request is a power of two
optimizes the round-up computation, and eliminates it entirely if the size is known at compile time
McKusick-Karels vs. Power of Twos
[Figure: both allocators keep freelistarr[], an array of free lists indexed by size class (32, 64, 128, 256, 512, 1024). McKusick-Karels adds kmemsizes[], a map of the managed pages recording the size of blocks allocated from each page, or a pointer to the next free page.]
Annotations: no buffer header; vulnerable to bursty usage; memory not returned to the paging system.
McKusick-Karels Example Macros
#define NDX(size) \
    ((size) > 128 ? ((size) > 256 ? 4 : 3) \
                  : ((size) > 64 ? 2 : ((size) > 32 ? 1 : 0)))

#define MALLOC(space, cast, size, flags) { \
    register struct freelisthdr *flh; \
    if ((size) <= 512 && (flh = freelistarr[NDX(size)]) != NULL) { \
        space = (cast)flh->next; \
        flh->next = *(caddr_t *)space; \
    } else \
        space = (cast)malloc(size, flags); \
}
Fred Kuhns (11/28/2012) CS523 Operating Systems
Buddy System
An allocation scheme combining a power-of-two allocator with free-buffer coalescing.
The binary buddy system is the simplest and most popular form; other variants split buffers into four, eight, or more pieces.
Approach: create small buffers by repeatedly halving a large buffer (producing buddy pairs), and coalesce adjacent free buddies when possible. Requests are rounded up to a power of two.
[Figure: binary buddy example over a 1024-byte region (offsets 0-1023), showing allocated blocks B, D, and E and the corresponding allocation bitmap 11111111 00001100 11110000 00000000.]
Buddy System
Advantages:
does a good job of coalescing adjacent free buffers
easy exchange of memory with the paging system:
can allocate a new page and split it as necessary
when coalescing produces a complete page, it may be returned to the paging system
Disadvantage: performance
recursive coalescing is expensive, with poor worst-case performance
back-to-back allocate and release operations alternate between splitting and coalescing the same memory
Lazy Coalescing
The release operation has two steps:
place the buffer on the free list, making it locally free
coalesce with its buddy, making it globally free
Buffers are divided into classes. Assume N buffers in a given class, with N = A + L + G, where A = number of active buffers, L = number of locally free buffers, and G = number of globally free buffers. A slack value (slack = N - 2L - G) selects among three states:
lazy (slack >= 2): steady-state buffer use; coalescing is not necessary
reclaiming (slack == 1): borderline consumption; coalescing is needed
accelerated (slack == 0): non-steady-state consumption; must coalesce faster
Lazy Coalescing
An improvement over the basic buddy system: in the steady state, all lists are in the lazy state and no time is wasted splitting and coalescing. In the worst case, the algorithm limits coalescing to no more than two buffers (two coalescing delays). It has been shown to have an average latency 10% to 32% better than the simple buddy system, but with greater variance and poorer worst-case performance for the release routine.
Disadvantages:
no provision for releasing part of an object
garbage-collection efficiency is an issue:
it is slow and must complete before the rest of the system can run
the algorithm is complex and inefficient
Mach Zones
[Figure: the zone of zones. A master zone holds the struct zone descriptors themselves, linked from first zone to last zone; each zone in turn holds fixed-size objects of a single type (struct Obj X or struct Obj Y).]
Slab Allocator
HW Cache Utilization
Footprint: the portion of the hardware cache and TLB that is overwritten by the allocation itself.
A large footprint can displace useful data in the cache and TLB; a small footprint is desirable.
Due to typical buffer address distributions, cache contention may occur, resulting in poor cache utilization; the slab allocator addresses this.
Slab Organization
[Figure: slab layout. Each slab begins with a coloring area, followed by a mix of active and free buffers and unused space, and ends with a 32-byte kmem_slab struct; slabs are linked in a list.]
Summary
Limited space overhead (the kmem_slab struct and a free-list pointer). Services requests quickly (remove a preinitialized object from a free list). Coloring scheme gives better hardware cache utilization and a small footprint. Simple garbage collection, with its cost shared across multiple requests via an in-use count. Increased management overhead compared to simpler methods (e.g., power-of-two lists).