
Mike Willbanks | Barnes & Noble

Varnish Cache

Housekeeping
Talk
Slides will be posted after the talk.

Me
Sr. Web Architect Manager at NOOK Developer
Prior MNPHP Organizer
Open Source Contributor
Where you can find me:
Twitter: mwillbanks
G+: Mike Willbanks
IRC (freenode): mwillbanks
Blog: http://blog.digitalstruct.com
GitHub: https://github.com/mwillbanks

Agenda
Varnish?
The Good: Getting Started
The Awesome: General Usage
The Crazy: Advanced Usage
Gotchas

Official Statement
What it does
General use case

WHAT IS VARNISH?

Official Statement
Varnish is a web application accelerator. You install it in front of your web application and it will speed it up significantly.

You can cache


Both dynamic and static content.

A Scenario
System Status Server
Mobile apps check the current status.
If the system is down, do we communicate?
If there are problems, do we communicate?
The apps and mobile site rely on an API.
Trouble in paradise? Few and far between.

The Graph - AWS
[Bar charts comparing the Small, X-Large, and Small Varnish instances on Req/s, Peak Load, Time, and Requests; the raw numbers follow below.]

The Raw Data


               Small     X-Large                   Small Varnish
Concurrency    10        150                       150
Requests       5000      55558                     75000
Time           438       347                       36
Req/s          11.42     58                        585
Peak Load      11.91     8.44                      0.35
Comments       -         19,442 failed requests    -

Traditional LAMP Stack
[Diagram: Load Balancer → HTTP Server Cluster → Database]

LAMP + Varnish
[Diagram: Load Balancer → Varnish Cache → cache hit? Yes: serve from cache; No: HTTP Server Cluster → Database]
* Varnish can act as a load balancer.

Installation
General Information
Default VCL

THE GOOD: JUMP START

Installation
rpm --nosignature -i http://repo.varnish-cache.org/redhat/varnish-3.0/el5/noarch/varnish-release-3.0-1.noarch.rpm
yum install varnish

curl http://repo.varnish-cache.org/debian/GPG-key.txt | sudo apt-key add -
echo "deb http://repo.varnish-cache.org/ubuntu/ lucid varnish-3.0" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install varnish

git clone git://git.varnish-cache.org/varnish-cache
cd varnish-cache
sh autogen.sh
./configure
make && make install

Varnish Daemon
varnishd
-a address[:port]   listen address for client requests
-b address[:port]   backend address
-T address[:port]   administration HTTP interface
-s type[,options]   storage type (malloc, file, persistent)
-P /path/to/file    PID file
Many others; these are generally the most important. Generally the defaults will do, with just a modification of the default VCL (more on that later).

General Configuration
varnishd -a :80 \
         -T localhost:6082 \
         -f /path/to/default.vcl \
         -s malloc,512mb

The web server is configured to listen on port 8080 (Varnish takes port 80).

Set up a backend!
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

So what's actually cached?


Any request that is:
GET or HEAD
TTL > 0

What causes a miss?


Cookies
Authentication headers
Vary: *
Cache-Control: private

[Request flow diagram: Request → vcl_recv → vcl_hash → vcl_hit / vcl_miss / vcl_pass / vcl_pipe → vcl_fetch → vcl_deliver → Response, with req, bereq, beresp, obj, and resp in scope at the relevant stages.]

HTTP Caching
RFC 2616 HTTP/1.1 headers

Expiration:
Cache-Control
Expires

Validation:
Last-Modified / If-Modified-Since
ETag / If-None-Match

TTL Priority
1. VCL: beresp.ttl
2. Header: Cache-Control: s-maxage
3. Header: Cache-Control: max-age
4. Header: Expires
5. Validation
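For example, a minimal sketch of forcing a TTL from VCL, which wins over any Cache-Control or Expires header (the 5 minute value is illustrative, not from the slides):

sub vcl_fetch {
    # Assumed example: give responses without caching headers a 5 minute TTL.
    if (beresp.ttl <= 0s) {
        set beresp.ttl = 5m;
    }
}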

Use WordPress?
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    if (!(req.url ~ "wp-(login|admin)")) {
        unset req.http.cookie;
    }
}

sub vcl_fetch {
    if (!(req.url ~ "wp-(login|admin)")) {
        unset beresp.http.set-cookie;
    }
}

VCL
Directors
Verifying VCL

THE AWESOME: VCL, DIRECTORS AND MORE

VCL State Engine

Varnish Configuration Language

Each request is processed separately & independently.
States are isolated but related.
Return statements exit one state and start another.
VCL defaults are ALWAYS appended below your own VCL.

VCL can be complex, but…

Two main subroutines: vcl_recv and vcl_fetch.
Common actions: pass, hit_for_pass, lookup, pipe, deliver.
Common variables: req, beresp, and obj.
More subroutines, functions, and complexity can arise depending on your conditions.

[Request flow diagram: Request → vcl_recv → vcl_hash → vcl_hit / vcl_miss / vcl_pass / vcl_pipe → vcl_fetch → vcl_deliver → Response, with req, bereq, beresp, obj, and resp in scope at the relevant stages.]

VCL - Process
vcl_init      Startup routine (VCL loaded, VMOD init)
vcl_recv      Beginning of the request; req is in scope
vcl_pipe      Client & backend data passed unaltered
vcl_pass      Request goes to the backend and is not cached
vcl_hash      Creates the cache hash; call hash_data for custom hashes
vcl_hit       Called when the hash is found in the cache
vcl_miss      Called when the hash is not found in the cache
vcl_fetch     Called to fetch data from the backend
vcl_deliver   Called prior to delivery of the response (excluding pipe)
vcl_error     Called when an error occurs
vcl_fini      Shutdown routine (VCL unload, VMOD cleanup)

VCL Variables
Always available: now (epoch time)
Backend declarations: .host (hostname / IP), .port (port number)
Request processing: client (IP & identity), server (IP & port), req (request information)
Backend: bereq (backend request), beresp (backend response)
Cached object: obj (cached object; only .ttl can be changed)
Response: resp (response information)

VCL - Functions
hash_data(string)               Adds a string to the hash input
regsub(string, regex, sub)      Substitution on the first occurrence
regsuball(string, regex, sub)   Substitution on all occurrences
ban(expression)                 Ban all items that match the expression
ban(regex)                      Ban all items that match the regular expression
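A small sketch combining these; stripping a "www." prefix from the Host header is an illustrative assumption, not something prescribed by the slides:

sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        # regsub example: hash "example.com" and "www.example.com" to the same object.
        hash_data(regsub(req.http.host, "^www\.", ""));
    } else {
        hash_data(server.ip);
    }
    return (hash);
}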

DEFAULT VCL
Walking through the noteworthy items.

[Request flow diagram: Request → vcl_recv → vcl_hash → vcl_hit / vcl_miss / vcl_pass / vcl_pipe → vcl_fetch → vcl_deliver → Response, with req, bereq, beresp, obj, and resp in scope at the relevant stages.]

vcl_recv
Handles the received request. Only GET & HEAD are cached by default.
The safest way to cache!

Will use the HTTP cache headers. Cookies or Authentication headers will bust out of the cache.
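A minimal sketch of what that default behaviour amounts to (simplified; the built-in vcl_recv that Varnish appends is the authoritative version):

sub vcl_recv {
    # Only GET and HEAD are looked up in the cache.
    if (req.request != "GET" && req.request != "HEAD") {
        return (pass);
    }
    # Cookies or Authorization headers bust out of the cache.
    if (req.http.Authorization || req.http.Cookie) {
        return (pass);
    }
    return (lookup);
}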

vcl_hash
The hash is what we look up in the cache. The default is URL + Host.
The server IP is used if the Host header was not set; in a load-balanced environment, make sure you set this header!

vcl_fetch
Fetch retrieves the response from the backend. No caching if:
the TTL is not set or is not greater than 0;
a Vary: * header exists.
Hit-for-pass means we cache the decision to pass.
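A simplified sketch of that logic (the built-in vcl_fetch is the authoritative version):

sub vcl_fetch {
    # Uncacheable response: remember the decision with hit-for-pass so later
    # requests for this object skip the cache instead of queueing behind it.
    if (beresp.ttl <= 0s || beresp.http.Set-Cookie || beresp.http.Vary == "*") {
        set beresp.ttl = 120s;
        return (hit_for_pass);
    }
    return (deliver);
}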

Common adjustments to make.

GENERAL ADJUSTMENTS

Cache Static Content


No reason that static content should not be cached.
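One way to do it, sketched with an assumed list of static extensions; adjust for your site:

sub vcl_recv {
    # Ignore client cookies for static assets so they can hit the cache.
    if (req.url ~ "\.(css|js|png|gif|jpe?g|ico|woff|svg)(\?.*)?$") {
        unset req.http.Cookie;
    }
}

sub vcl_fetch {
    # Cache static assets for a day (illustrative TTL) and drop backend cookies.
    if (req.url ~ "\.(css|js|png|gif|jpe?g|ico|woff|svg)(\?.*)?$") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 1d;
    }
}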

Remove GA Cookies
GA cookies will cause a miss; remove them prior to going to the backend.
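A common sketch for this, using regsuball to strip the __utm* cookies while leaving the rest of the Cookie header intact; the exact regex is an assumption:

sub vcl_recv {
    if (req.http.Cookie) {
        # Remove Google Analytics cookies (__utma, __utmb, ...) from the Cookie header.
        set req.http.Cookie = regsuball(req.http.Cookie, "(^|; ) *__utm[^=]+=[^;]+;? *", "\1");
        # If only GA cookies were present, drop the header so the request can be cached.
        if (req.http.Cookie ~ "^ *$") {
            unset req.http.Cookie;
        }
    }
}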

Allow Purging
Only allow from localhost or trusted server network.
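A sketch of HTTP purging restricted by an ACL (the trusted network below is a placeholder):

acl purge {
    "localhost";
    "192.168.0.0"/24;   # placeholder: your trusted server network
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged.";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged.";
    }
}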

Leveraging backend servers

DIRECTORS

Directors - The Types

Random        Picks based on random and weight.
Client        Picks based on client identity.
Hash          Picks based on hash value.
Round Robin   Goes in order and starts over.
DNS           Picks based on incoming DNS host, random OR round robin.
Fallback      Picks the first healthy server.

Director - Probing
Backend Probing Variables
.url  .request  .window  .threshold  .initial  .expected_response  .interval  .timeout
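A backend with a health probe might look like this sketch (address, URL, and thresholds are illustrative):

backend web1 {
    .host = "10.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/health";    # assumed health-check endpoint
        .interval = 5s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
    }
}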

Load Balancing
Implementing a simple Varnish load balancer (see the sketch below). Note that Varnish does not handle SSL termination.
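A minimal round-robin director over two backends (names and addresses are placeholders):

backend web1 { .host = "10.0.0.1"; .port = "8080"; }
backend web2 { .host = "10.0.0.2"; .port = "8080"; }

director www round-robin {
    { .backend = web1; }
    { .backend = web2; }
}

sub vcl_recv {
    # Send every request through the director instead of a single backend.
    set req.backend = www;
}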

Grace Mode
A request is already pending for an update: serve grace (stale) content.
The backend is unhealthy: serve grace content.
Probes, as seen earlier, must be implemented.
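A typical grace setup, sketched with illustrative durations:

sub vcl_recv {
    if (req.backend.healthy) {
        # Healthy backend: tolerate slightly stale objects while a refresh is in flight.
        set req.grace = 30s;
    } else {
        # Sick backend: keep serving stale content for up to an hour.
        set req.grace = 1h;
    }
}

sub vcl_fetch {
    # Keep objects an hour past their TTL so grace has something to serve.
    set beresp.grace = 1h;
}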

Saint Mode
A backend may be sick for a particular piece of content. Saint mode makes sure that Varnish will not request that object from the backend again for a specific period of time.
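In Varnish 3 this is set from vcl_fetch; a sketch with an example 20 second blacklist window:

sub vcl_fetch {
    if (beresp.status == 500) {
        # Don't ask this backend for this object again for 20 seconds; restart the request.
        set beresp.saintmode = 20s;
        return (restart);
    }
}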

Purging
The various ways of purging:
varnishadm command-line utility
Sockets (port 6082)
HTTP (now that is the sexiness)

Purging Examples
varnishadm -T 127.0.0.1:6082 purge req.url == "/foo/bar"

telnet localhost 6082
purge req.url == "/foo/bar"

telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
PURGE /foo/bar HTTP/1.0
Host: bacon.org

curl -X PURGE http://bacon.org/foo/bar

Distributed Purging
curl multi-request (in PHP)
Use a message queue
Use workers to do the legwork for you

You will need to store a list of servers somewhere.

Logging
Many times people want to log the requests to a file.
By default, Varnish only stores these in shared memory.
Apache-style logs:
varnishncsa -D -a -w log.txt

This will run as a daemon to log all of your requests on a separate thread.

Logging
Apache style logging using: varnishncsa -O -a -w log.txt

You likely want to ensure that your cache is:
1. Working properly
2. Caching effectively

VERIFY YOUR VCL

What is Varnish doing?


varnishtop will show you real-time information about your system.
Use -i to filter on specific tags.
Use -x to exclude specific tags.

Checking Statistics
varnishstat will give you the statistics you need to know how you're doing.

ESI - Edge Side Includes
Varnish Administration
VMOD

THE CRAZY

ESI - Edge Side Includes


ESI is a small markup language, much like SSI (server-side includes), for including fragments (or dynamic content for that matter).
Think of it as replacing regions inside a page as if you were using XHR (AJAX), but single-threaded.
Three statements can be utilized:
esi:include - include a page
esi:remove - remove content
<!-- esi --> - ESI disabled, execute normally

ESI Diagram
[Diagram: the page content and the <esi:include src="header.php" /> fragment are assembled by Varnish from the backend or from the cache.]
Varnish detects ESI and requests it from the backend OR checks the cached state.

Using ESI
In vcl_fetch, you must turn ESI on:
set beresp.do_esi = true;
Varnish refuses to parse content for ESI if it does not look like XML.
This is the default; so check varnishstat and varnishlog to ensure that it is functioning as expected.
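For example, a sketch that only enables ESI for responses likely to contain it (the Content-Type condition is an assumption, not from the slides):

sub vcl_fetch {
    # Only parse HTML responses for ESI tags.
    if (beresp.http.Content-Type ~ "text/html") {
        set beresp.do_esi = true;
    }
}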

ESI Usage
<html>
  <head><title>Rock it with ESI</title></head>
  <body>
    <header>
      <esi:include src="header.php" />
    </header>
    <section id="main">...</section>
    <footer></footer>
  </body>
</html>

Embedding C in VCL
Before getting into VMODs: did you know you can embed C into the VCL for Varnish? Want to do something crazy fast or leverage a C library for pre- or post-processing? I know you're thinking that's useless…
On to the example, and a good one from the Varnish wiki!

Embedded C for syslog


C{
    #include <syslog.h>
}C

sub vcl_something {
    C{
        syslog(LOG_INFO, "Something happened at VCL line XX.");
    }C
}

# Example using Varnish variables
C{
    syslog(LOG_ERR, "Spurious response from backend: xid %s request %s %s \"%s\" %d \"%s\" \"%s\"",
        VRT_r_req_xid(sp), VRT_r_req_request(sp),
        VRT_GetHdr(sp, HDR_REQ, "\005host:"), VRT_r_req_url(sp),
        VRT_r_obj_status(sp), VRT_r_obj_response(sp),
        VRT_GetHdr(sp, HDR_OBJ, "\011Location:"));
}C

Varnish Modules / Extensions


Takes VCL embedded C to the next level.
Allows you to extend Varnish and create new functions.
You can link to libraries to provide additional functionality.

VMOD - std
toupper, tolower, set_ip_tos, random, log, syslog, fileread, duration, integer, collect
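A small sketch using a couple of these through the std VMOD:

import std;

sub vcl_recv {
    # Normalize the Host header so hashing is case-insensitive.
    set req.http.host = std.tolower(req.http.host);
    # Write to the shared memory log (visible in varnishlog).
    std.log("Client request for " + req.url);
}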

Management Console
Cache Warm-up

ADMINISTERING VARNISH

Management Console
varnishadm -T localhost:6082
vcl.list      see all loaded configurations
vcl.load      load a new configuration
vcl.use       select the configuration to use
vcl.discard   remove a configuration

Cache Warmup
Need to warm up your cache before putting a server in the queue, or load test an environment?
varnishreplay -r log.txt

Having Keep-Alive off
No SSL termination
No persistent cache
ESI with multiple fragments
Cookies*

GOTCHAS

These slides will be posted to SlideShare & SpeakerDeck.


SpeakerDeck: http://speakerdeck.com/u/mwillbanks
SlideShare: http://www.slideshare.net/mwillbanks
Twitter: mwillbanks
G+: Mike Willbanks
IRC (freenode): mwillbanks
Blog: http://blog.digitalstruct.com
GitHub: https://github.com/mwillbanks

QUESTIONS?
