
SPONSORED BY

Since 1994: The Original Magazine of the Linux Community NOVEMBER 2014 | ISSUE 247 | www.linuxjournal.com

BACK UP LARGE VOLUMES OF DATA WITH ZBACKUP

DEPLOY A STORAGE SOLUTION WITH ZERO DOWNTIME

SHARE ADMIN ACCESS FOR MANY HOSTS SECURELY

+ A Look at the vtop System Monitor

SYSTEM ADMINISTRATION

INTRODUCING The DevOps Mindset

WATCH: ISSUE OVERVIEW
LJ247-Nov2014bu.indd 1 10/28/14 3:30 PM


NEW!
Linux Journal
eBook Series
GEEK GUIDES: FREE DOWNLOAD NOW!

Slow Down to Speed Up:
Continuous Quality Assurance in a DevOps Environment
By Bill Childers
DevOps is one of the newest and largest movements in Information
Technology in the past few years. The name DevOps is a portmanteau
of “Development” and “Operations” and is meant to denote a fusion of
these two functions in a company. Whether or not your business actually
does combine the two functions, the lessons and tools learned from the
DevOps movement and attitude can be applied throughout the entire
Information Technology space. This eBook focuses on one of the key
attributes of the DevOps movement: Quality Assurance. At any point,
you should be able to release your product, code or configuration—so
long as you continue keeping your deliverables in a deployable state. This is done by “slowing
down” to include a Quality Assurance step at each point in your workflow. The sooner you catch
an error or trouble condition and fix it, the faster you can get back on track. This will lower the
amount of rework required and keep your team’s momentum going in a forward direction,
enabling your group to move on to new projects and challenges.

Build a Private Cloud for Less Than $10,000!


By Mike Diehl
This eBook presents a compelling argument as to why you should
consider re-architecting your enterprise toward a private cloud. It
outlines some of the design considerations that you need to be
aware of before implementing your own private cloud, and it
describes using the DevCloud installer in order to install OpenStack
on an Ubuntu 14 server. Finally, this eBook will familiarize you with
the features and day-to-day operations of an OpenStack-based
private cloud architecture, all for less than $10K!

DOWNLOAD NOW AT: http://linuxjournal.com/geekguides



Healthy servers make
for a healthy app.

New Relic Servers™ lets you view and analyze critical system metrics so that you can make sure your application is always in tip-top shape.

Get server monitoring from the app perspective:
www.newrelic.com/servers

Get visibility into:


• Server and disk capacity
• CPU, memory, and disk I/O utilization
• Processes prioritized by memory or CPU consumption
• Cloud, physical, or hybrid environments

Companies using New Relic

©2008-14 New Relic, Inc. All rights reserved.



CONTENTS NOVEMBER 2014
ISSUE 247
SYSTEM ADMINISTRATION
FEATURES
64 Ideal Backups with zbackup
Snapshot backups are a snap with zbackup.
David Barton

76 High-Availability Storage with HA-LVM
Enable fault tolerance without the high prices.
Petros Koutoupis

84 Sharing Root Access without Sharing Passwords
Leverage the key-caching agent to distribute login authority to shared admin accounts on remote hosts without having to grant individuals an account or a copy of the key itself.
J.D. Baldwin

ON THE COVER
• Back Up Large Volumes of Data with zbackup
• Deploy a Storage Solution with Zero Downtime
• Share Admin Access for Many Hosts Securely
• Introducing the DevOps Mindset
• A Look at the vtop System Monitor
Cover Image: © Can Stock Photo Inc. / patrimonio

4 / NOVEMBER 2014 / WWW.LINUXJOURNAL.COM



INDEPTH
96 Rethinking the
System Monitor
The command line isn’t going
away—our tools need some
visual love so we can better
diagnose problems.
James Hall

COLUMNS

34 Reuven M. Lerner’s
At the Forge
PostgreSQL, the NoSQL Database

42 Dave Taylor’s Work the Shell


Mad Libs for Dreams, Part II

48 Kyle Rankin’s Hack and /


Localhost DNS Cache

52 Shawn Powers’
The Open-Source Classroom
DevOps: Better Than the Sum
of Its Parts

108 Doc Searls’ EOF


Big Bad Data

IN EVERY ISSUE
8 Current_Issue.tar.gz
10 Letters
18 UPFRONT
32 Editors’ Choice
60 New Products
115 Advertisers Index

LINUX JOURNAL (ISSN 1075-3583) is published monthly by Belltown Media, Inc., 2121 Sage Road, Ste. 395, Houston, TX 77056 USA. Subscription rate is $29.50/year. Subscriptions start with the next issue.




Executive Editor Jill Franklin
jill@linuxjournal.com
Senior Editor Doc Searls
doc@linuxjournal.com
Associate Editor Shawn Powers
shawn@linuxjournal.com
Art Director Garrick Antikajian
garrick@linuxjournal.com
Products Editor James Gray
newproducts@linuxjournal.com
Editor Emeritus Don Marti
dmarti@linuxjournal.com
Technical Editor Michael Baxter
mab@cruzio.com
Senior Columnist Reuven Lerner
reuven@lerner.co.il
Security Editor Mick Bauer
mick@visi.com
Hack Editor Kyle Rankin
lj@greenfly.net
Virtual Editor Bill Childers
bill.childers@linuxjournal.com

Contributing Editors
Ibrahim Haddad • Robert Love • Zack Brown • Dave Phillips • Marco Fioretti • Ludovic Marcotte
Paul Barry • Paul McKenney • Dave Taylor • Dirk Elmendorf • Justin Ryan • Adam Monsen

President Carlie Fairchild


publisher@linuxjournal.com

Publisher Mark Irgang


mark@linuxjournal.com

Associate Publisher John Grogan


john@linuxjournal.com

Director of Digital Experience Katherine Druckman


webmistress@linuxjournal.com

Accountant Candy Beauchamp


acct@linuxjournal.com

Linux Journal is published by, and is a registered trade name of,


Belltown Media, Inc.
PO Box 980985, Houston, TX 77098 USA

Editorial Advisory Panel

Brad Abram Baillio • Nick Baronian • Hari Boukis • Steve Case
Kalyana Krishna Chadalavada • Brian Conner • Caleb S. Cullen
Keir Davis • Michael Eager • Nick Faltys • Dennis Franklin Frey
Victor Gregorio • Philip Jacob • Jay Kruizenga • David A. Lane
Steve Marquez • Dave McAllister • Carson McDonald • Craig Oda
Jeffrey D. Parent • Charnell Pugsley • Thomas Quinlan • Mike Roberts
Kristin Shoemaker • Chris D. Stark • Patrick Swartz • James Walker

Advertising
E-MAIL: ads@linuxjournal.com
URL: www.linuxjournal.com/advertising
PHONE: +1 713-344-1956 ext. 2

Subscriptions
E-MAIL: subs@linuxjournal.com
URL: www.linuxjournal.com/subscribe
MAIL: PO Box 980985, Houston, TX 77098 USA

LINUX is a registered trademark of Linus Torvalds.



Are you tired of dealing with proprietary storage?

zStax StorCore from Silicon Mechanics: ZFS Unified Storage

From modest data storage needs to a multi-tiered production storage environment, zStax StorCore

zStax StorCore 64
The zStax StorCore 64 utilizes the latest in dual-processor Intel® Xeon® platforms and fast SAS SSDs for caching. The zStax StorCore 64 platform is perfect for:
• small/medium office file servers
• streaming video hosts
• small data archives

zStax StorCore 104
The zStax StorCore 104 is the flagship of the zStax product line. With its highly available configurations and scalable architecture, the zStax StorCore 104 platform is ideal for:
• back-end storage for virtualized environments
• mission-critical database applications
• always-available active archives

Talk with an expert today: 866-352-1173 - http://www.siliconmechanics.com/zstax



Current_Issue.tar.gz

Folger's Crystals

SHAWN POWERS

VIDEO: Shawn Powers runs through the latest issue.

Every time I write a Bash script or schedule a cron job, I worry about the day I'll star in my very own IT version of a Folger's commercial. Instead of "secretly replacing coffee with Folger's Instant Crystals", however, I worry I'll be replaced by an automation framework and a few crafty FOR loops. If you've ever had nightmares like that, you're in the right place. The truth is, the need for system administrators isn't going down—it's just that our job function is shifting a little. If you stay current, and resolve to be a lifelong learner, system administration is as incredible as it's always been. (And far better than instant coffee! Yuck!) This month, we focus on system administration. It keeps us all relevant, all informed and most important, we should all learn a little something along the way.

Reuven M. Lerner starts off the issue discussing the power of NoSQL databases using PostgreSQL. If that seems like a contradiction in terms, you'll want to read his column for more details. Dave Taylor follows up with part two in his series on a script-based dream interpreter. You also will learn a few handy scripting tips along the way that will be useful regardless of the project you're creating. When the series is done, perhaps the dream interpreter can help me figure out my recurring nightmare of being Winnie the Pooh being hunted by angry bees. Or maybe I should just lay off the honey for my tea.

Kyle Rankin discusses DNS this month, but instead of setting up DNSSEC, he describes how to set up DNS caches to make your networks more efficient. While Kyle doesn't normally care for dnsmasq as a DNS/DHCP dæmon, in this article, he turns to it for its caching abilities. If you see your internal DNS servers getting hammered, a caching situation might be just what you need. I follow Kyle with an article on DevOps. Although it makes sense to start a series on DevOps tools like Chef, Puppet, Ansible and so on, the first thing to do is understand what DevOps is all about. If you ask six people to define DevOps,


you'll get a dozen different answers. This month, I try to clear up any confusion or misconceptions, so that you can benefit from the DevOps idea rather than be scared or confused by it.

No system administration issue would be complete without addressing the most important issue facing the sysadmin. No, it's not uptime—it's backups. David Barton walks through using zbackup, which is a deduplicating system for backing up massive amounts of data as efficiently as possible. When a single desktop hard drive can hold 4TB of data or more, the task of backing up becomes monumental. David makes it a little more manageable.

Petros Koutoupis follows David with that other topic sysadmins are concerned with: uptime. Migrating data from one system to another often is expensive and time-consuming, and it usually means proprietary SAN storage. With the advent of High-Availability Logical Volume Management (HA-LVM), that same flexibility comes to folks using commodity hardware and open-source tools. Petros explains the concept and process for creating and maintaining highly available LVM solutions for the data center.

System administrators know that although central authentication is a key to a successful network infrastructure, there also are local accounts on servers and devices that must be kept local, and yet still used. J.D. Baldwin shows an elegant way to manage those local SSH accounts by leveraging ssh-agent to store authentication keys. If you've ever had to change the password on 100 servers because an employee left the company, you understand the problem. J.D. has an awesome solution.

Finally, James Hall describes the process he used when developing the incredibly cool vtop program. A graphical activity monitor for the command line alone is enough to warrant an article, but James covers his entire process from planning to creating to improving. Even if you have no interest in using a CLI GUI-based program, it's a great way to learn about the open-source journey. If you want to see how an idea becomes a package, James' article is perfect.

We've also got a ton of tips, tricks, hints and pointers throughout the issue. If you want to hear about the latest product announcements, or just discover our favorite app of the month, this issue aims to please. Whether you're a system administrator, a developer or even a new Linux user getting your open-source feet wet, we hope you enjoy this issue as much as we enjoyed putting it together.

Shawn Powers is the Associate Editor for Linux Journal. He's also the Gadget Guy for LinuxJournal.com, and he has an interesting collection of vintage Garfield coffee mugs. Don't let his silly hairdo fool you, he's a pretty ordinary guy and can be reached via e-mail at shawn@linuxjournal.com. Or, swing by the #linuxjournal IRC channel on Freenode.net.


letters

JuiceSSH Instead of ConnectBot
I have been a Linux user for almost two decades, but it was a good idea to subscribe to LJ more than a year ago. I keep finding lots of good articles in every issue.

In your July 2014 issue in the "Remote System Administration with Android" article, Federico Kereki suggests ConnectBot for SSH on Android, although as far as I know, it is no longer maintained. There is a better alternative called JuiceSSH.

Although it's not completely free (but neither do we work for free, right?), it is well worth the money for the features it supports—for example, Mosh (https://mosh.mit.edu), which comes in handy over slow mobile connections, dynamic resize of the screen area and so on.

I also have a foldable full-sized QWERTY keyboard and a micro USB to USB female OTG adapter cable so I do not have to use the onscreen keyboard if I know I will have to type a lot, but rather a proper keyboard.
—Balazs

Federico Kereki replies: Mr Kinszler is almost right as to ConnectBot. Although there were no updates for a long time, work on it has seemingly restarted, a pre-release version is available at https://github.com/connectbot/connectbot/releases, and a new release may be available soon. As to his suggestion regarding JuiceSSH, I haven't tried it out, but I will; it seems to be a valid alternative. Finally, I also agree with him that an external keyboard is best. Personally, I own a rollable Bluetooth keyboard that does the job well.

Extremist Interpretations
Regarding Doc Searls' "Stuff That Matters" article in the September 2014 issue: isn't it possible that the word


"extremist(s)" in the NSA's surveillance system XKEYSCORE is ambiguous? Isn't it possible, or even likely, that the word "some" is implied in the phrase "advocated by extremists", as in "advocated by some extremists?" To me, it seems far more likely that the phrase is an example of the logical argument "All bad-guys are extremists, but not all extremists are bad-guys", than it is to leap to the nefarious conclusion that Cory Doctorow describes in his article as "nothing short of bizarre".

Kyle Rankin's own article affirms that being flagged as "extremist" targets you for further surveillance. This implies that the system is in a narrowing down phase and hasn't yet concluded you're a bad guy. For example, the September 11th hijackers were identified from a set of all airline passengers. Indeed, all the September 11th hijackers were airline passengers, but not all airline passengers were hijackers.

I believe this article is premature in its conclusions and is out of character for the intellectually honest work I've come to expect from Doc. So much so, that I believe this conclusion was probably reached to generate letters to the editor. I do freely admit however, that mine is probably an extremist opinion.
—Jon Redinger

Finding Leap Year
For Dave Taylor, here's yet another approach. Just implement the basic rule: a leap year is any year divisible by four, except that divisible by 100 is not, except that divisible by 400 is:

$ cat leapyear
yr=$1
if [ "$yr" -le 0 ]
then echo "usage: leapyear year"
     exit 1
fi
if [ $yr -eq $(expr $yr / 4 '*' 4) ]
then if [ $yr -eq $(expr $yr / 100 '*' 100) ]
     then if [ $yr -eq $(expr $yr / 400 '*' 400) ]
          then echo $yr is a leap year
          else echo $yr is not a leap year
          fi
     else echo $yr is a leap year
     fi
else echo $yr is not a leap year
fi
—Bill Rausch

Dave Taylor replies: Nice, simple straightforward code. Thanks for sending that in, Bill!

Calculating Days
Calculating days between dates is more complicated than you would expect at first sight. First, you must


ask yourself the question: "Where am I? Europe, Asia, America...."

Second, "which country?" Because all of this has an influence on the date.

Third, "which calendar am I using?"

The most accurate calendar is the Mayan calendar, not the Gregorian or western-known types of time tracking. So I suggest you start from this calendar and first convert the date to the Mayan calendar, and then calculate these days between dates, and then convert to normal dates. This will avoid a lot of mistakes due to "modern calendars".

For example, the Thai calendar is, for us, in the future. In Europe, we have known a lot of calendars in different periods and places.
—Patrick Op de Beeck

Windows XP IE 8 and kernel.org
I am not sure if anyone at LJ or any of your readers can answer this. When I try to go to the http://www.kernel.org Web site using Win XP and IE 8, it says the page can't be displayed. I tried to find something in XP that would block the site, but had no luck. I can access the site from all other systems on my network, so I know it is not a connection problem. Have any of your readers reported this before? Does anyone know how to fix this? Is Microsoft deliberately blocking the kernel.org site because it is a competing OS?
—Mike

I suspect it's not an intentional blocking of the kernel.org Web site, but probably some outdated implementation of something (DNS? SSL cert?) that makes the sites inaccessible. Since it's XP, and it's Microsoft, I would be shocked if it was something that ever gets fixed. I don't have XP to test it, but it wouldn't surprise me if it failed on every XP install. If you're stuck using that computer, maybe try a different browser? Or a proxy? Otherwise, I do know a pretty awesome operating system you could try.—Shawn Powers

NSA's Extremist Forum
Do not believe everything you read in the papers—ref: "you could be fingerprinted". I would almost guarantee there are NSA employees (*nix sysadmins, security managers, etc.) who subscribe to this journal because it is a quality source of information. They are not "extremist", and I can only assume that they are not forbidden from reading this


journal. Perhaps the NSA may contain some people like you and me.
—Jon

Oh, I'm sure there are some fellow Linux nerds that work for the NSA, and most likely, some read Linux Journal. Being flagged as possible extremists was likely either automated based on our encryption articles, or manually due to the same sort of thing. Although I find it frustrating to be considered a potential threat, I'm trying to take it as a compliment. Open-source ideals are radical, and we are a passionate bunch. Does that make us suspect? Even if it does, I have no desire to change!—Shawn Powers

DNSMasq
Thanks for Shawn Powers' excellent article on DNSMasq in the September 2014 issue. I have a couple small networks where bind has always been an overkill. DNSMasq is great, as is loading data directly from /etc/hosts.
—Richard Ruth

Thanks Richard! I was surprised just how powerful it can be. I don't know why I never considered it outside of an embedded router, but I've been incredibly impressed with it. I'm glad you find it useful as well.—Shawn Powers

Linux Digital Microscope
I would like to connect a USB digital microscope to my Linux laptop to show my children all the tiny life that is in pond water.

Do you have any ideas?
—Pat

I think many USB microscopes function like a Webcam, so if you can figure out whether it's UVC-compatible, you should be able to tell if it will work with Linux. I think model numbers and Google will be your best friends in this endeavor.—Shawn Powers

Dave Taylor's September 2014 Article
I have two items to mention regarding Dave Taylor's article on calculating days between dates.

1) There is a bug in the code found within the "Days Left in Year" section. The output of the date command pads the results with zeros by default. In bash, a numeric value that begins with a zero is evaluated as an octal (base 8) number. This is demonstrated by the script output:

Calculating 366 - 035
There were 337 days left in the starting year

366 – 35 = 331—not 337.
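The octal behavior the letter describes is easy to reproduce at a bash prompt. A minimal sketch, using bash's standard `10#` radix prefix to force base-10 evaluation (an alternative to stripping the padding, which the letter covers next):

```shell
#!/bin/bash
# A leading zero makes shell arithmetic treat a value as octal (base 8):
echo $((035))        # 3*8 + 5 = 29

# The 10# prefix forces base-10 evaluation despite the leading zero:
echo $((10#035))     # 35

# With a zero-padded day-of-year from date +%j, the subtraction then comes out right:
days=035
echo $((366 - 10#$days))   # 331
```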


If 035 is treated as octal, 035 = 29 (base 10), 366 – 29 = 337 (as reported by the script output quoted above).

The leading zeros can be removed by using a "-" in the output specification for the date command. For instance:

$ date -d '01/01/2010' +%-j
1

2) I would suggest an alternate approach to the entire script. Please ignore this if you have considered and rejected this approach for some reason. I saw this article alone and no prior articles in the series.

Use the seconds-since-epoch output format. For instance:

#!/bin/bash
# daysbetween.bash

dateEnd=${1}
dateStart=${2}
echo ${dateEnd} through ${dateStart}

epochSecondsEnd=$( date -d "${dateEnd}" +%s )
epochSecondsStart=$( date -d "${dateStart}" +%s )
let elapsedSeconds=${epochSecondsEnd}-${epochSecondsStart}
echo elapsed seconds: ${elapsedSeconds}
let elapsedDays=${elapsedSeconds}/60/60/24
echo elapsed days: ${elapsedDays}
let doubleCheck=${elapsedSeconds}-${elapsedDays}*60*60*24
if [ "${doubleCheck}x" != "0x" ] ; then
echo "double check failed: ${doubleCheck}"
fi

The +%s format gives a positive value for dates after Jan 1, 1970, and a negative value for dates prior to Jan 1, 1970.

Using this approach, there is no need to calculate any "from beginning" or "to end" of year blocks nor worry about leap years.

However, there is an issue. In a normal day, there are 24*60*60 seconds. The date command reports that March 9, 2014, has only 23 hours. From my shell:

$ ./daysbetween.bash 2014-03-10 2014-03-09
2014-03-10 through 2014-03-09
elapsed seconds: 82800
elapsed days: 0
double check failed: 82800
$ date --version
date (GNU coreutils) 8.23
...

A 24-hour day contains: 24*60*60 = 86400 seconds. A 23-hour day contains: 23*60*60 = 82800 seconds.

As you can see from the output, the


date command reports that March 9 had 82800 seconds, or 23 hours.

I do not know if this is a bug in the date command, or if it's intentional because of some date-adjusting convention (such as an hour equivalent of a leap year), or something else.

All other tests I performed raised no errors.

Addendum to the seconds-since-epoch approach:

The 23-hour day corresponds with daylight-savings time. "Spring forward" gives a 23-hour day, and "fall back" gives a 25-hour day.

This problem can be avoided by switching to GMT. A modified script is presented below:

#!/bin/bash
# daysbetween.bash

dateEnd=${1}
dateStart=${2}
echo ${dateStart} through ${dateEnd}

epochSecondsEnd=$( date -d "TZ=\"GMT\" ${dateEnd}" +%s )
echo "${epochSecondsEnd}"

read epochSecondsStart startTZ < <(date -d "TZ=\"GMT\" ${dateStart}" +%s)
echo "${epochSecondsStart}"

let elapsedSeconds=${epochSecondsEnd}-${epochSecondsStart}
echo elapsed seconds: ${elapsedSeconds}

let elapsedDays=${elapsedSeconds}/60/60/24
echo elapsed days: ${elapsedDays}

let doubleCheck=${elapsedSeconds}-${elapsedDays}*24*60*60
if [ "${doubleCheck}x" != "0x" ] ; then
echo "double check failed: ${doubleCheck}" >&2
fi
—Chris

Doc Searls' "Stuff That Matters"
Doc Searls' article in the September 2014 issue about privacy on the Net and targeting by government's security agencies is interesting, but it seems to be a bit naïve, especially compared to what the author actually writes about, giving the examples of Israel and London's effective security systems.

The author doesn't seem to understand that such privacy protector systems like TOR can be used not only to protect the privacy of normal and perfectly honest citizens but also by terrorist and criminal organizations around the world, so I guess that's


perfectly understandable that a security agency would be interested in monitoring and tracking them and their users. At least, if someone has nothing to hide, he's not in danger; security operators are experienced enough to understand whether they are tracking an employee who protects his privacy or a terrorist who wants to blow up a building.
—Walter

Doc Searls replies: I think you missed my point, which is that we are in the earliest days of personal privacy technology development in the on-line world. To get to that point, I borrowed interest in actual attacks going on in the real world at the time, including rockets pointed at my head in Israel and the NSA flagging Linux Journal readers as terrorism suspects.

If we all had "nothing to hide", we wouldn't wear clothing. And really, how many of us trust the world's "security operators" to protect our privacy? The ones at the NSA sure failed in our own case.

WRITE LJ A LETTER
We love hearing from our readers. Please send us your comments and feedback via http://www.linuxjournal.com/contact.

PHOTO OF THE MONTH
Remember, send your Linux-related photos to ljeditor@linuxjournal.com!

At Your Service

SUBSCRIPTIONS: Linux Journal is available in a variety of digital formats, including PDF, .epub, .mobi and an on-line digital edition, as well as apps for iOS and Android devices. Renewing your subscription, changing your e-mail address for issue delivery, paying your invoice, viewing your account details or other subscription inquiries can be done instantly on-line: http://www.linuxjournal.com/subs. E-mail us at subs@linuxjournal.com or reach us via postal mail at Linux Journal, PO Box 980985, Houston, TX 77098 USA. Please remember to include your complete name and address when contacting us.

ACCESSING THE DIGITAL ARCHIVE: Your monthly download notifications will have links to the various formats and to the digital archive. To access the digital archive at any time, log in at http://www.linuxjournal.com/digital.

LETTERS TO THE EDITOR: We welcome your letters and encourage you to submit them at http://www.linuxjournal.com/contact or mail them to Linux Journal, PO Box 980985, Houston, TX 77098 USA. Letters may be edited for space and clarity.

WRITING FOR US: We always are looking for contributed articles, tutorials and real-world stories for the magazine. An author's guide, a list of topics and due dates can be found on-line: http://www.linuxjournal.com/author.

FREE e-NEWSLETTERS: Linux Journal editors publish newsletters on both a weekly and monthly basis. Receive late-breaking news, technical tips and tricks, an inside look at upcoming issues and links to in-depth stories featured on http://www.linuxjournal.com. Subscribe for free today: http://www.linuxjournal.com/enewsletters.

ADVERTISING: Linux Journal is a great resource for readers and advertisers alike. Request a media kit, view our current editorial calendar and advertising due dates, or learn more about other advertising and marketing opportunities by visiting us on-line: http://www.linuxjournal.com/advertising. Contact us directly for further information: ads@linuxjournal.com or +1 713-344-1956 ext. 2.


Bye-bye bottlenecks.
Hello happy users.

New Relic APM™ provides deep visibility into your app's performance, from the end-user experience down to the line of code.

Make your app faster today:
www.newrelic.com/apm
Companies using New Relic

©2008-14 New Relic, Inc. All rights reserved.



UPFRONT NEWS + FUN

diff -u
WHAT’S NEW IN KERNEL DEVELOPMENT
Hardware errors are tough to code for. In some cases, they're impossible to code for. A particular brand of hardware error is the Machine-Check Exception (MCE), which means a CPU has a problem. On Windows systems, it's one of the causes of the Blue Screen of Death.

Everyone wants to handle hardware errors well, because it can mean the difference between getting a little indication of what actually went wrong and getting no information at all.

Andy Lutomirski recently suggested some code to clean up non-maskable interrupts (NMIs), which also typically indicate some sort of hardware failure. But over the course of discussion, folks raised questions about how to handle various cases—for example, when an MCE came immediately after an NMI. Typically NMIs are not interruptable by any other code, but should an exception be made for MCEs? If the OS detects a CPU error while already processing another hardware error, should it defer to the more pressing CPU issue or not?

There was a bit of debate, but ultimately Linus Torvalds said that an MCE meant that the system was dead. Any attempt to handle that in software, he said, was just in order to crash as gracefully as possible. But he felt that the kernel should not make any complicated effort in that case, since the end result would just be the same crash. Deadlocks, race conditions and other issues that normally would be important, simply weren't in this case. Make a best effort to log the event, he said, and forget the rest.

Elsewhere, he elaborated more vociferously, saying, "MCE is frankly misdesigned. It's a piece of shit, and any of the hardware designers that claim that what they do is for system stability are out to lunch. This is a prime example of what not to do, and how you can actually spread what was potentially a localized and recoverable error, and make it global and unrecoverable." And he added:


[ UPFRONT ]

Synchronous MCEs are fine for synchronous errors, but then trying to turn them “synchronous” for other CPUs (where they weren’t synchronous errors) is a major mistake. External errors punching through irq context is wrong, punching through NMI is just inexcusable.

If the OS then decides to take down the whole machine, the OS—not the hardware—can choose to do something that will punch through other CPUs’ NMI blocking (notably, init/reset), but the hardware doing this on its own is just broken if true.

Tony Luck pointed out that Intel actually was planning to fix this in future machines, although he acknowledged that turn-around time for chips was likely to be very long. However, as Borislav Petkov pointed out, even after the fix went in, Linux still would need to support the bad hardware.

The tightrope-walk of container security has some controversy. One group believes that containers should be able to do whatever an independent system could do. Another group believes that certain abilities render the container inherently insecure. The first group says that without these features, the container isn’t truly offering a complete environment. The second group says that’s how the cookie crumbles.

Seth Forshee recently posted some patches to allow containerized systems to see hot-plugged devices, just the way a non-containerized system could. But this, apparently, was a bridge too far. Greg Kroah-Hartman said he had long since expressed a clear policy against adding namespaces to devices. And, that was exactly how Seth’s code made the hot-plugged devices visible to the containerized system.

It turns out that there are valid use-cases for wanting a containerized system to be able to see hot-plugged devices. Michael H. Warfield described one such. And, Seth described his own—he needed hot-plug support in order to implement loopback devices within the container.

Greg said loopback support in a container was a very bad idea, since it provided all sorts of opportunities


to leak data out of the container and into the host system—a security violation. He said this was not a “normal” use-case for containers. To which Serge Hallyn replied that any feature used by a non-containerized system was a “normal” use case for containerized systems.

Serge argued that these features inevitably would go into containers. There was no way to keep them out. As long as containers excluded features that were included in non-containerized systems, there would be people with an incentive to bridge the gap. Why not bridge it now and fix the bugs as they showed up?

But Richard said, “There are so many things that can hurt you badly. With user namespaces, we expose a really big attack surface to regular users. [...] I agree that user namespaces are the way to go, all the papering with LSM over security issues is much worse. But we have to make sure that we don’t add too many features too fast.”

And, Greg added that Seth’s code was too hacky, implementing just what Seth needed, rather than addressing the overarching issue of how to handle namespaces properly within a container.

Greg also said he supported loopback devices within containers, but he and James Bottomley said that the security issues were real, and the implementation had to take account of them. It wasn’t enough simply to implement the feature and then fix bugs. The feature needed a proper design that addressed these concerns. —ZACK BROWN

They Said It

Everything that is really great and inspiring is created by the individual who can labor in freedom.
—Albert Einstein

Any government is potentially the worst client in the world you could ever possibly want to have.
—Thomas Heatherwick

When you give each other everything, it becomes an even trade. Each wins all.
—Lois McMaster Bujold

Enjoyment is not a goal, it is a feeling that accompanies important ongoing activity.
—Paul Goodman

I must create a system, or be enslaved by another man’s.
—William Blake


Android Candy: Party Like It’s 1994!
I really stink at video games. I write
about gaming occasionally, but the
truth of the matter is, I’m just not
very good. If we play Quake, you’ll
frag me just about as often as I
respawn. I don’t have great reflexes,
and my coordination is horrible. But
if you give me an RPG, a 12-pack of
Coke, and a three-day weekend, I’ll
be a level 96 blood elf by dawn of
the second day.

Yes, in my youth I was a bit of a nerd. I stayed home weekends playing Chrono Trigger, The Secret of Mana, Zelda, Dragon Warrior and, of course, Final Fantasy. I was happy to discover the other day that those same Final Fantasy games I loved as a youngster are available in all their remade glory on the Android platform! They are unfortunately a little pricey, with each installment weighing in at $15.99, but they’ve been re-created specifically for the touch screen, and they are really fun!

If you wonder which game to buy (and you don’t plan to buy them all, like some of us did), I highly recommend Final Fantasy VI. It was the best game on the Super Nintendo, and I think it’s the best game on Android as well. Of course, if you’re okay with slightly more awkward gameplay, the old titles are easy to find in ROM format in the questionable corners of the Internet. There are several really good SNES emulators for Android that will allow you to play those original ROM files completely free. Honestly, however, if you can afford the $15.99, the remakes are far more enjoyable to play.

Check them out on the Google Play Store: https://play.google.com/store/apps/developer?id=SQUARE%20ENIX%20Co.%2CLtd.&hl=en.
—SHAWN POWERS


Non-Linux FOSS
One of my career iterations put me in charge of a Windows server that had Apache and PHP installed on it to serve as a Web server for the corporate intranet. Although I was happy to see Apache used as the Web server dæmon, the installation on the Windows server was the most confusing and horrifying mess I’ve ever seen. To this day, I’m not sure which of the three Apache instances was actually serving files, and there were at least six PHP folders in various places on the hard drive, each with a different version number.

If you’re in a situation where you’re required to use Windows, but don’t want to worry about the nightmare of installing Apache and PHP (much less MySQL) on your machine, I urge you to check out XAMPP. It’s not a new program, but that’s one of its greatest features. It’s basically just a single installer for Windows, OS X or Linux that installs Apache with PHP and MySQL. Its maturity means that even on a Windows system, it should install and work like you’d expect open-source software to work.

Although XAMPP can be used to serve files to the actual Internet, it was designed for individuals to install on their own workstations to test their code. And in that situation, it works really well. If you have a server connected to the Internet, I still recommend using a Linux server with a proper Apache/PHP installation, but if you’re stuck using a Windows workstation, XAMPP can give you a stable, open-source Web server platform that you can rely on. Grab a copy at http://www.apachefriends.org.
—SHAWN POWERS


Drafting on Linux
One common scientific task is designing new hardware to help make measurements. A powerful tool to help with this design work is a Computer Aided Design system, or CAD software. Several tools are available on Linux for doing CAD work. In this article, I take a look at LibreCAD (http://www.librecad.org).

LibreCAD started as an extension of QCad. For a short while, it was called CADuntu, before finally being named LibreCAD. It should be available as a package in most distributions. In Debian-based distributions, you can install it with the command:

sudo apt-get install librecad

And, you always can install it from source if you want the latest and greatest features.

Once LibreCAD is installed, you can launch it from the application launcher for your desktop, or you can run the librecad command

Figure 1. When you start LibreCAD the first time, you need to set some initial options.


from a terminal. The first time you start LibreCAD, you will be greeted with a welcome window (Figure 1). Here, you will be presented with the ability to set options for the default unit, the GUI language and the command language. Once you set those options, you will see a blank canvas where you can start your new project (Figure 2). The main window is the actual drawing canvas where you can set up your design. On the left-hand side, you should see a palette of drawing tools. On the right-hand side, you will see two smaller windows containing the layer list and the block list for your design.

If you already have done some design work, you can import that work into LibreCAD. You can insert an image to your design by clicking the menu item File→Import→Insert Image. LibreCAD can handle most common file formats. If you had been working with another CAD program and have a DXF file of that work, you can import it by clicking on the menu item File→Import→Block (Figure 3). This option also handles CXF files, in case

Figure 2. LibreCAD starts up with a blank canvas, ready for your new project.


Figure 3. You can import DXF files from lots of places.

you were using those.

You may have a text file with raw point data for the object you are trying to draw. If so, you can click on the menu item File→Import→Read ascii points. This will pop up an option window where you can define what the points represent and how to treat them. You even can import GIS data from a shape file with the menu item File→Import→shape file.

Now you should be ready to start designing your project. Clicking the icons in the palette on the left-hand side opens a new palette with

Figure 4. You can set options for new layers added to your project.


multiple options for each of the categories. For example, if you click on the circle icon, you will see a new palette giving you the option to draw circles with either two points on the circumference, a point at the center and one at the circumference or a circle that fits within a triangle, among many other options.

The other icons in the drawing palette also provide tools for drawing many other components, such as lines, arcs and splines. All of these items are drawn on the default layer that you get with a new project. You can add a new layer by clicking the plus icon in the top pane on the right-hand side. This will pop up a new option window where you can set things like the layer name and the drawing color for the new layer (Figure 4).

Figure 5. You can set several options when you add a multi-line text object.


You can toggle visibility of the various layers by clicking the eye icon to the right in the layer list. When you have a layer set the way you want it, you can make it uneditable by clicking on the lock icon for that layer. That way, you won’t accidentally change it while you work on other layers.

If you need to add labels explaining parts of your design, you can click on the multi-line text option in the tool palette. This will pop up a window where you can enter the text and set options like font and color (Figure 5).

Once you have the basic objects drawn on your project, you can use the many modification tools available to fine-tune your drawing or to generate more complex objects based on some modification of one of the basic types. These modifications are available under the Modify menu item. You can do things like scaling, mirroring,

Figure 6. You can set several options for scaling an element of your drawing.


rotating and moving.

Using these tools, however, isn’t very intuitive. Say you want to scale an element of your drawing. The first thing you will want to do is to click on the Modify→Scale menu item. You next will notice that the command-line box at the bottom of the window has changed, asking you to “Select to scale”. You then need to click on the element you want to scale, say a line element, and press the Enter key. The command-line window then will change to saying “Specify reference point”. LibreCAD scales based on a reference point to act as a point of perspective, so go ahead and click on a point. This will pop up an option window where you can set things like the scaling factor and whether it is isotropic (Figure 6). When you click OK, LibreCAD will apply the modification and refresh the drawing.

You also can change the properties of the different elements, which may be needed to clarify parts of special interest. To do this, you need to click on the Modify→Properties menu item, and then click on the element in question. This will pop up a dialog box where you can edit properties that

Figure 7. You can change both the display properties of your circle as well as the
physical properties.


apply for that element type (Figure 7).

When you have finished your design, you will want to share it with others. The default file format is the Drawing Exchange Format (.dxf). LibreCAD supports versions 2007, 2004, 2000, R14 and R12. If you need to, you also can save it as an LFF Font file (.lff), a QCad file (.cxf) or a Jww Drawing file (.jww). If you just want a simplified output, you can click on the File→Export menu item and save it in one of a large number of image file formats. With these options, you should be able to share your design with most people.

Hopefully, this article has shown you enough to help you decide whether LibreCAD might be a good fit for your next design project. If so, you can find even more information on the LibreCAD Wiki and forum. A great many examples are available on the Internet that will show you just what is possible with a good CAD system. And, because these examples are available in DXF files, you can load them in LibreCAD and play with the possibilities.
—JOEY BERNARD


The Awesome Program


You Never Should Use

I’ve been hesitating for a couple months about whether to mention sshpass. Conceptually, it’s a horrible, horrible program. It basically allows you to enter an SSH user name and password on the command line, so you can create a connection without any interaction. A far better way to accomplish that is with public/private keypairs. But it’s still something I find useful from time to time, and I’d rather mention it with all the warnings in the world than to pretend it doesn’t exist.

So, sshpass—it’s a simple tool, but in a pinch, it can be incredibly helpful. You use it with the user name and password as command-line arguments (with some variations, see the help screen in the screenshot), and it injects them into your ssh (or scp) command.

Again, this is a horribly insecure method for entering passwords. However, I find it particularly useful for setting up new machines, especially computers or devices in a closed environment. I’ve also used it to send files via scp to hundreds of machines in my local network that I’ll never need to connect to again. It’s a dangerous tool, but can be a lifesaver if you need it. Search your distribution’s repositories, as it’s available for most systems. And remember, don’t ever use it!
—SHAWN POWERS
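To make the warnings concrete, here is a sketch of the kind of invocations described above. The hostnames, user and password are hypothetical; sshpass’s -p (password as an argument) and -e (password from the SSHPASS environment variable) options are real:

```shell
# Hypothetical helper: copy a file to one host, supplying the password
# non-interactively. Note that -p puts the password on the command line,
# where anyone running `ps` can see it -- hence all the warnings.
push_file() {
    host=$1 password=$2 file=$3
    sshpass -p "$password" scp "$file" "admin@$host:/tmp/"
}

# Slightly less bad: -e reads the password from $SSHPASS instead of
# the argument list, keeping it out of `ps` output.
blast_to_lab() {
    export SSHPASS=$1
    for host in lab01 lab02 lab03; do   # hypothetical hostnames
        sshpass -e scp setup.tar.gz "admin@$host:/tmp/"
    done
}

# Example invocation (commented out -- it would try to reach a real host):
# push_file 192.168.1.50 'S3cret!' config.tar.gz
```

Even then, prefer keypairs for anything long-lived; helpers like these only make sense for throwaway setups such as the closed-network case described above.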



[ EDITORS' CHOICE ]

Tomahawk, the World Is Your Music Collection
EDITORS’ CHOICE ★
I don’t listen to music very often, but when I do, my tastes tend to be across the board. That’s one of the reasons I really like Pandora, because the music selection is incredible (in fact, I can’t recommend the Pithos client for Pandora enough—I’ve written about it in past issues). Unfortunately, with Pandora, you don’t get to pick specific songs. That’s usually okay for me, but sometimes I want to hear a particular song by a particular artist. Even worse, sometimes



I want to hear a particular version of a song. I’ve purchased 3–4 different versions of a song, only to discover none of them were what I wanted.

Enter Tomahawk. It behaves much like a traditional music application, and it will play music from your hard drive or network shares. Its real strength, however, is its ability to connect to on-line resources to find songs. When it finds those songs, it treats them just like a local file. You can create playlists with a mix of local and remote media, and search across an entire array of on-line services. Tomahawk will connect to Spotify, last.fm, Jamendo, Beets, Subsonic and tons of other sources. Of particular note, I love that there is a YouTube plugin that will search YouTube for songs! (The YouTube plugin isn’t included by default, but it’s free to install.)

Due to its ability to blur the lines between local and streaming media, while functioning as a traditional desktop music app, Tomahawk earns this month’s Editors’ Choice award. If you have fickle music tastes, or just want to listen to your various music collections in a central place, I urge you to give Tomahawk a try: http://www.tomahawk-player.org.
—SHAWN POWERS



COLUMNS
AT THE FORGE

PostgreSQL, the NoSQL Database
REUVEN M. LERNER

Thinking NoSQL? Believe it or not, PostgreSQL might be a great choice.

One of the most interesting trends in the computer world during the past few years has been the rapid growth of NoSQL databases. The term may be accurate, in that NoSQL databases don’t use SQL in order to store and retrieve data, but that’s about where the commonalities end. NoSQL databases range from key-value stores to columnar databases to document databases to graph databases.

On the face of it, nothing sounds more natural or reasonable than a NoSQL database. The “impedance mismatch” between programming languages and databases, as it often is described, means that we generally must work in two different languages, and in two different paradigms. In our programs, we think and work with objects, which we carefully construct. And then we deconstruct those objects, turning them into two-dimensional tables in our database. The idea that I can manipulate objects in my database in the same way as I can in my program is attractive at many levels.

In some ways, this is the holy grail of databases: we want something that is rock-solid reliable, scalable to the large proportions that modern Web applications require and also convenient to us as programmers. One popular solution is an ORM (object-relational mapper), which allows us to write our programs using objects. The ORM then translates those objects and method calls into the appropriate SQL, which it passes along to the database. ORMs certainly make it more convenient to work with a relational database, at least when it comes to simple queries. And to no


But ORMs have their problems as well, in no


small part because they can shield us from the
inner workings of our database.

small degree, they also improve the readability of our code, in that we can stick with our objects, without having to use a combination of languages and paradigms.

But ORMs have their problems as well, in no small part because they can shield us from the inner workings of our database. NoSQL advocates say that their databases have solved these problems, allowing them to stay within a single language. Actually, this isn’t entirely true. MongoDB has its own SQL-like query language, and CouchDB uses JavaScript. But there are adapters that do similar ORM-like translations for many NoSQL databases, allowing developers to stay within a single language and paradigm when developing.

The ultimate question, however, is whether the benefits of NoSQL databases outweigh their issues. I have largely come to the conclusion that, with the exception of key-value stores, the answer is “no”—that a relational database often is going to be a better solution. And by “better”, I mean that relational databases are more reliable, and even more scalable, than many of their NoSQL cousins. Sure, you might need to work hard in order to get the scaling to work correctly, but there is no magic solution. In the past few months alone, I’ve gained several new clients who decided to move from NoSQL solutions to relational databases, and needed help with the architecture, development or optimization.

The thing is, even the most die-hard relational database fan will admit there are times when NoSQL data stores are convenient. With the growth of JSON in Web APIs, it would be nice to be able to store the result sets in a storage type that understands that format and allows me to search and retrieve from it. And even though key-value stores, such as Redis, are powerful and fast, there are sometimes cases when I’d like to have the key-value pairs connected to data in other relations (tables) in my database.

If this describes your dilemma, I have good news for you. As I write this, PostgreSQL, an amazing database


and open-source project, is set to release version 9.4. This new version, like all other PostgreSQL versions, contains a number of optimizations, improvements and usability features. But two of the most intriguing features to me are HStore and JSONB, features that actually turn PostgreSQL into a NoSQL database.

Fine, perhaps I’m exaggerating a bit here. PostgreSQL was and always will be relational and transactional, and adding these new data types hasn’t changed that. But having a key-value store within PostgreSQL opens many new possibilities for developers. JSONB, a binary version of JSON storage that supports indexing and a large number of operators, turns PostgreSQL into a document database, albeit one with a few other features in it besides.

In this article, I introduce these NoSQL features that are included in PostgreSQL 9.4, which likely will be released before this issue of Linux Journal gets to you. Although not every application needs these features, they can be useful—and with this latest release of PostgreSQL, the performance also is significantly improved.

HStore
One of the most interesting new developments in PostgreSQL is that of HStore, which provides a key-value store within the PostgreSQL environment. Contrary to what I originally thought, this doesn’t mean that PostgreSQL treats a particular table as a key-value store. Rather, HStore is a data type, akin to INTEGER, TEXT and XML. Thus, any column—or set of columns—within a table may be defined to be of type HSTORE. For example:

CREATE TABLE People (
    id SERIAL,
    info HSTORE,
    PRIMARY KEY(id)
);

Once I have done that, I can ask PostgreSQL to show me the definition of the table:

\d people
                        Table "public.people"
 Column |  Type   |                     Modifiers
--------+---------+-----------------------------------------------------
 id     | integer | not null default nextval('people_id_seq'::regclass)
 info   | hstore  |
Indexes:
    "people_pkey" PRIMARY KEY, btree (id)


As you can see, the type of my “info” column is hstore. What I have effectively created is a (database) table of hash tables. Each row in the “people” table will have its own hash table, with any keys and values. It’s typical in such a situation for every row to have the same key names, or at least some minimum number of overlapping key names, but you can, of course, use any keys and values you like.

Both the keys and the values in an HStore column are text strings. You can assign a hash table to an HStore column with the following syntax:

INSERT INTO people(info) VALUES ('foo=>1, bar=>abc, baz=>stuff');

Notice that although this example inserts three key-value pairs into the HStore column, they are stored together, converted automatically into an HStore, splitting the pairs where there is a comma, and each pair where there is a => sign.

So far, you won’t see any difference between an HStore and a TEXT column, other than (perhaps) the fact that you cannot use text functions and operators on that column. For example, you cannot use the || operator, which normally concatenates text strings, on the HStore:

UPDATE People SET info = info || 'abc';
ERROR: XX000: Unexpected end of string
LINE 1: UPDATE People SET info = info || 'abc';
                                         ^

PostgreSQL tries to apply the || operator to the HStore on the left, but cannot find a key-value pair in the string on the right, producing an error message. However, you can add a pair, which will work:

UPDATE People SET info = info || 'abc=>def';

As with all hash tables, HStore is designed for you to use the keys to retrieve the values. That is, each key exists only once in each HStore value, although values may be repeated. The only way to retrieve a value is via the key. You do this with the following syntax:

SELECT info->'bar' FROM People;
 ?column?
----------
 abc
(1 row)

Notice several things here. First, the name of the column remains without any quotes, just as you do when you’re retrieving the full


The big news in version 9.4 is that GIN and GiST indexes now support HStore columns, and that they do so with great efficiency and speed.

contents of the column. Second, you put the name of the key after the -> arrow, which is different from the => (“hashrocket”) arrow used to delineate key-value pairs within the HStore. Finally, the returned value always will be of type TEXT. This means if you say:

SELECT info->'foo' || 'a' FROM People;
 ?column?
----------
 1a
(1 row)

Notice that ||, which works on text values, has done its job here. However, this also means that if you try to multiply your value, you will get an error message:

SELECT info->'foo' * 5 FROM People;
ERROR:  operator does not exist: text * integer
LINE 1: SELECT info->'foo' * 5 FROM People;
                           ^
Time: 5.041 ms

If you want to retrieve info->'foo' as an integer, you must cast that value:

SELECT (info->'foo')::integer * 5 from people;
 ?column?
----------
        5
(1 row)

Now, why is HStore so exciting? In particular, if you’re a database person who values normalization, you might be wondering why someone even would want this sort of data store, rather than a nicely normalized table or set of tables.

The answer, of course, is that there are many different uses for a database, and some of them can be more appropriate for an HStore. I never would suggest storing serious data in such a thing, but perhaps you want to keep track of user session information, without keeping it inside of a binary object.

Now, HStore is not new to PostgreSQL. The big news in version 9.4 is that GIN and GiST indexes now


support HStore columns, and that they do so with great efficiency and speed.

Where do I plan to use HStore? To be honest, I’m not sure yet. I feel like this is a data type that I likely will want to use at some point, but for now, it’s simply an extra useful, efficient tool that I can put in my programming toolbox. The fact that it is now extremely efficient, and its operators can take advantage of improved indexes, means that HStore is not only convenient, but speedy, as well.

JSON and JSONB
It has long been possible to store JSON inside PostgreSQL. After all, JSON is just a textual representation of JavaScript objects (“JavaScript Object Notation”), which means that they are effectively strings. But of course, when you store data in PostgreSQL, you would like a bit more than that. You want to ensure that stored data is valid, as well as use PostgreSQL’s operators to retrieve and work on that data.

PostgreSQL has had a JSON data type for several years. The data type started as a simple textual representation of JSON, which would check for valid contents, but not much more than that. The 9.3 release of PostgreSQL allowed you to use a larger number of operators on your JSON columns, making it possible to retrieve particular parts of the data with relative ease.

However, the storage and retrieval of JSON data was never that efficient, and the JSON-related operators were particularly bad on this front. So yes, you could look for a particular name or value within a JSON column, but it might take a while.

That has changed with 9.4, with the introduction of the JSONB data type, which stores JSON data in binary form, such that it is both more compact and more efficient than the textual form. Moreover, the same GIN and GiST indexes that now are able to work so well with HStore data also are able to work well, and quickly, with JSONB data. So you can search for and retrieve text from JSONB documents as easily (or more) as would have been the case with a document database, such as MongoDB.

I already have started to use JSONB in some of my work. For example, one of the projects I’m working on contacts a remote server via an API. The server returns its response in JSON, containing a large number of name-value pairs, some of them nested. (I should note that using a beta version of PostgreSQL, or any other infrastructural technology, is only a good idea if you first get the client’s approval, and explain the risks and benefits.)
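As a sketch of what this looks like in practice (the table and index names here are hypothetical; the JSONB type, GIN indexes and the @> containment operator are all part of 9.4):

```sql
CREATE TABLE API_Responses (
    id       SERIAL PRIMARY KEY,
    response JSONB
);

-- A GIN index on the JSONB column lets containment queries use the
-- index instead of scanning every row:
CREATE INDEX api_responses_response_idx
    ON API_Responses USING GIN (response);

-- Find every stored response whose document contains this pair:
SELECT id
  FROM API_Responses
 WHERE response @> '{"status": "ok"}';
```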


Now, I’m a big fan of normalized data. And I’m not


a huge fan of storing JSON in the database.

Now, I’m a big fan of normalized data. And I’m not a huge fan of storing JSON in the database. But rather than start to guess what data I will and won’t need in the future, I decided to store everything in a JSONB column for now. If and when I know precisely what I’ll need, I will normalize the data to a greater degree.

Actually, that’s not entirely true. I knew from the start that I would need two different values from the response I was receiving. But because I was storing the data in JSONB, I figured it would make sense for me simply to retrieve the data from the JSONB column.

Having stored the data there, I then could retrieve data from the JSON column:

SELECT id, email,
       personal_data->>'surname' AS surname,
       personal_data->>'forename' AS given_name
FROM ID_Checks
WHERE personal_data->>'surname' ilike '%lerner%';

Using the double-arrow operator (->>), I was able to retrieve the value of a JSON object by using its key. Note that if you use a single arrow (->), you’ll get an object back, which is quite possibly not what you want. I’ve found that the text portion is really what interests me most of the time.
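One way to see the difference between the two operators is to ask PostgreSQL directly, using the built-in pg_typeof function (sketched here against the same ID_Checks table):

```sql
SELECT pg_typeof(personal_data->'surname')  AS single_arrow,  -- jsonb
       pg_typeof(personal_data->>'surname') AS double_arrow   -- text
  FROM ID_Checks
 LIMIT 1;
```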

Resources

Blog postings about improvements to PostgreSQL’s GIN and GiST indexes, which affect the JSON and HStore types:

- http://obartunov.livejournal.com/172503.html
- http://obartunov.livejournal.com/174887.html
- http://obartunov.livejournal.com/175235.html

PostgreSQL documentation is at http://postgresql.org/docs, and it includes several sections for each of HStore and JSONB.


Conclusion
People use NoSQL databases for several reasons. One is the impedance mismatch between objects and tables. But two other common reasons are performance and convenience. It turns out that modern versions of PostgreSQL offer excellent performance, thanks to improved data types and indexes. But they also offer a great deal of convenience, letting you set, retrieve and delete JSON and key-value data easily, efficiently and naturally.

I’m not going to dismiss the entire NoSQL movement out of hand. But I will say that the next time you’re thinking of using a NoSQL database, consider using one that can already fulfill all of your needs, and which you might well be using already—PostgreSQL.

Reuven M. Lerner is a Web developer, consultant and trainer. He recently completed his PhD in Learning Sciences from Northwestern University. You can read his blog, Twitter feed and newsletter at http://lerner.co.il. Reuven lives with his wife and three children in Modi’in, Israel.

Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com.



COLUMNS
WORK THE SHELL

Mad Libs for Dreams, Part II
DAVE TAYLOR

Dream Interpreter—Dave mucks about with some free association and word substitution to create a dream interpretation script as suggested by a reader. Along the way, he also re-examines the problem of calculating leap years and shows off a handy text formatting trick too.

I’m in the middle of writing what I’ll call a Mad Libs for dream interpretation script, as detailed in my article in the October 2014 issue, but before I get back to it, I have to say that more people have written to me about the leap year function presented many months ago than any other topic in the history of this column.

I never realized people were so passionate about their leap years—and to consider that it’s to compensate for the fact that our 365-day calendar is shorter than a solar year by almost six hours per year, starting way back in 1592, an extra day was added every four years (approximately).

The variety of solutions sent in were quite impressive, including some that were presented in FORTRAN and other classic scientific programming languages. Yes, FORTRAN.

The simplest solution proved to be letting Linux itself do the heavy lifting and just check to see how many days were in a given calendar year by using GNU date for a given year:

date -d 12/31/YEAR +%j

If it’s 366, it’s a leap year. If it’s 365, it isn’t—easy. But the winner is reader Norbert Zacharias, who sent in this link: http://aa.usno.navy.mil/faq/docs/JD_Formula.php. You can go there and enjoy the delightful complexity of this US Navy solution!
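That GNU date trick wraps neatly into a function; here’s a minimal sketch (it assumes GNU date, as found on Linux—the BSD/OS X date doesn’t support -d this way):

```shell
#!/bin/sh
# A year is a leap year exactly when the day-of-year (%j) of its
# December 31 is 366. Requires GNU date for the -d option.
is_leap()
{
  [ "$(date -d "12/31/$1" +%j)" = "366" ]
}

is_leap 2016 && echo "2016 is a leap year"
is_leap 1900 || echo "1900 is not (century years need /400)"
```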


Now, back to dreams—a perfect segue! In my last article, I started working on a reader-suggested script that would let people type in a few sentences describing a dream, then extract all the nouns and prompt the user for a free association synonym (or, I suppose, antonym), then echo back the original description with all the substitutions.

With the addition of a noun list and a simple technique for deconstructing what has been given to identify the nouns, most of the code actually is written. Even better, the noun → free association phrase mapping is a one-way translation, so we don’t even really need to save it. This means that a sed sequence like:

s/old/new/g

will work just fine, and because that can be appended to multiple substitutions, it just might prove super easy.

Here’s the code stub that prompts users for a new word for each existing noun they’ve entered:

for word in $nouns
do
  echo "What comes to mind when I say $word?"
done

To expand it as needed is easy:

echo "What comes to mind when I say $word?"
read newword
sedstring="$sedstring;s/$word/$newword/g"

That’s it. Let’s put that in place and see what happens when we create a half-dozen noun substitutions. I’ll skip some of the I/O and just tell you that the phrase I entered was “The rain in spain falls mainly on the plain” and that the script then dutifully identified “rain”, “spain” and “plain” as nouns. The result:

What comes to mind when I say rain?
storm
What comes to mind when I say spain?
soccer
What comes to mind when I say plain?
jane
build sed string
;s/rain/storm/g;s/spain/soccer/g;s/plain/jane/g
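A quick standalone check of that chained substitution string (all-lowercase input here, since sed matches case-sensitively—a capitalized “Spain” would sail through untouched):

```shell
#!/bin/sh
# Build the same kind of chained sed expression the script assembles,
# then apply all three substitutions in a single sed invocation.
sedstring="s/rain/storm/g;s/spain/soccer/g;s/plain/jane/g"
echo "The rain in spain falls mainly on the plain" | sed "$sedstring"
# prints: The storm in soccer falls mainly on the jane
```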


Great. We’re close to being done with the script—really close. In fact, all that’s left is:

cat $dream | sed $sedstring

Let’s try it:

$ dreamer.sh
Welcome to Dreamer. To start, please describe in a few sentences
the dream you'd like to explore. End with DONE in all caps on its
own line. The rain in Spain falls mainly on the plain.
DONE
Hmm.... okay. I have identified the following words as nouns:
rain spain plain
Are you ready to do some free association? Let's begin...
What comes to mind when I say rain?
storm
What comes to mind when I say spain?
soccer
What comes to mind when I say plain?
jane
The result:
  The storm in Spain falls mainly on the jane.

By George, I think we have it! Here’s the final code:

#!/bin/sh

# dreamer - script to help interpret dreams. does this by
# asking users to describe their most recent dream,
# then prompts them to free associate words
# for each of the nouns in their original description.

nounlist="nounlist.txt"
dream="/tmp/dreamer.$$"

input=""; nouns=""

trap "/bin/rm -f $dream" 0 # no tempfile left behind

echo "Welcome to Dreamer. To start, please describe in a few sentences"
echo "the dream you'd like to explore. End with "DONE" in all caps on "
echo "its own line."

until [ "$input" = "DONE" -o "$input" = "done" ]
do
  echo "$input" >> $dream
  read input # let's read another line from the user...
done

for word in $( sed 's/[[:punct:]]//g' $dream | tr '[A-Z]' '[a-z]' | tr ' ' '\n')
do
  # is the word a noun? Let's look!
  if [ ! -z "$(grep -E "^${word}$" $nounlist)" ] ; then
    nouns="$nouns $word"
  fi
done

echo "Hmm.... okay. I have identified the following words as nouns:"
echo "$nouns"

echo "Are you ready to do some free association? Let's begin..."

for word in $nouns
do


  echo "What comes to mind when I say $word?"
  read newword
  sedstring="$sedstring;s/$word/$newword/g"
done

echo "The result:"
cat $dream | sed "$sedstring" | fmt | sed 's/^/  /'
echo ""
exit 0

To be fair, this is a bit of an odd script to write, but the basic concept of breaking input down into individual words, processing those words and reassembling the output is something that does have wider applicability. For example, you might use common acronyms but need to have them spelled out for a final report, or language substitution or redacting specific names.

There’s also another trick worth noting on the last output line. Let’s look at the statement again:

cat $dream | sed "$sedstring" | fmt | sed 's/^/  /'

The first two sections of this pipe do the word substitution. No rocket science there (well, unless your rocket happens to run Bourne Shell, but that’s a somewhat anxiety-provoking concept). What’s interesting are the last two elements. The fmt command wraps overly long or short lines to make them all fill in to be around 80 characters long, and then the final sed statement prefaces every line with a double space. I actually use this frequently because I like my scripts to be able to output arbitrary length text that’s nice and neat.

Let’s grab that great journal from Ishmael and use it as an example:

$ cat moby.txt
Call me Ishmael.
Some years ago - never mind how long precisely - having little or no
money in my purse, and nothing particular to interest me on shore, I
thought I would sail about a little and see the watery part
of the world.
It is a way I have of driving off the spleen and regulating the
circulation. Whenever I find myself growing grim about the mouth;


whenever it is a damp, drizzly November in my soul; whenever I find
myself involuntarily pausing
before coffin
warehouses, and bringing up the rear
of every funeral I meet; and especially whenever my hypos get such an
upper hand of me, that it requires a strong moral principle to prevent
me from deliberately stepping into the street, and methodically
knocking people's hats off - then, I account it high time to get to
sea as soon as I can.

Run that output through the fmt command, however, and it all cleans up perfectly:

$ cat moby.txt | fmt
Call me Ishmael. Some years ago - never mind how long
precisely - having little or no money in my purse, and nothing
particular to interest me on shore, I thought I would sail
about a little and see the watery part of the world. It is
a way I have of driving off the spleen and regulating the
circulation. Whenever I find myself growing grim about the
mouth; whenever it is a damp, drizzly November in my soul;
whenever I find myself involuntarily pausing before coffin
warehouses, and bringing up the rear of every funeral I meet;
and especially whenever my hypos get such an upper hand of me,
that it requires a strong moral principle to prevent me from
deliberately stepping into the street, and methodically knocking
people's hats off - then, I account it high time to get to sea
as soon as I can.

Now let’s indent each line by those two spaces:

$ cat moby.txt | fmt | sed 's/^/  /'
  Call me Ishmael. Some years ago - never mind how long
  precisely - having little or no money in my purse, and
  nothing particular to interest me on shore, I thought I
  would sail about a little and see the watery part of the
  world. It is a way I have of driving off the spleen and
  regulating the circulation. Whenever I find myself growing
  grim about the mouth; whenever it is a damp, drizzly November
  in my soul; whenever I find myself involuntarily pausing
  before coffin warehouses, and bringing up the rear of every
  funeral I meet; and especially whenever my hypos get such an
  upper hand of me, that it requires a strong moral principle to
  prevent me from deliberately stepping into the street, and
  methodically knocking people's hats off - then, I account it
  high time to get to sea as soon as I can.

See how that works? You also can preface each line with “>” or any other sequence you’d like. Easy enough!

Well, that’s it for this month. Next month, we’ll dig into, um, I don’t know. What should we explore next month, dear reader?

Dave Taylor has been hacking shell scripts for more than 30 years—really. He’s the author of the popular Wicked Cool Shell Scripts (and just completed a 10th anniversary revision to the book, coming very soon from O’Reilly and NoStarch Press). You can find him on Twitter as @DaveTaylor and more generally at his tech site http://www.AskDaveTaylor.com.

Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com.





COLUMNS
HACK AND /

Localhost DNS Cache
KYLE RANKIN

This month, Kyle covers one of his favorite topics—no, it’s not mutt—it’s DNS.

Is it weird to say that DNS is my favorite protocol? Because DNS is my favorite protocol. There’s something about the simplicity of UDP packets combined with the power of a service that the entire Internet relies on that grabs my interest. Through the years, I’ve been impressed with just how few resources you need to run a modest DNS infrastructure for an internal network.

Recently, as one of my environments started to grow, I noticed that even though the DNS servers were keeping up with the load, the query logs were full of queries for the same hosts over and over within seconds of each other. You see, often a default Linux installation does not come with any sort of local DNS caching. That means that every time a hostname needs to be resolved to an IP, the external DNS server is hit no matter what TTL you set for that record.

This article explains how simple it is to set up a lightweight local DNS cache that does nothing more than forward DNS requests to your normal resolvers and honor the TTL of the records it gets back.

There are a number of different ways to implement DNS caching. In the past, I’ve used systems like nscd that intercept DNS queries before they would go to name servers in /etc/resolv.conf and see if they already are present in the cache. Although it works, I always found nscd more difficult to troubleshoot than DNS when something went wrong. What I really wanted was just a local DNS server that honored TTL but would forward all requests to my real name servers. That way, I would get the speed and load benefits of a local cache, while also being able to troubleshoot any errors with standard DNS tools.

The solution I found was dnsmasq. Normally I am not a big advocate for dnsmasq, because it’s often touted


as an easy-to-configure full DNS and DHCP server solution, and I prefer going with standalone services for that. Dnsmasq often will be configured to read /etc/resolv.conf for a list of upstream name servers to forward to and use /etc/hosts for zone configuration. I wanted something completely different. I had full-featured DNS servers already in place, and if I liked relying on /etc/hosts instead of DNS for hostname resolution, I’d hop in my DeLorean and go back to the early 1980s. Instead, the bulk of my dnsmasq configuration will be focused on disabling a lot of the default features.

The first step is to install dnsmasq. This software is widely available for most distributions, so just use your standard package manager to install the dnsmasq package. In my case, I’m installing this on Debian, so there are a few Debianisms to deal with that you might not have to consider if you use a different distribution. First is the fact that there are some rather important settings placed in /etc/default/dnsmasq. The file is fully commented, so I won’t paste it here. Instead, I list two variables I made sure to set:

ENABLED=1
IGNORE_RESOLVCONF=yes

The first variable makes sure the service starts, and the second will tell dnsmasq to ignore any input from the resolvconf service (if it’s installed) when determining what name servers to use. I will be specifying those manually anyway.

The next step is to configure dnsmasq itself. The default configuration file can be found at /etc/dnsmasq.conf, and you can edit it directly if you want, but in my case, Debian automatically sets up an /etc/dnsmasq.d directory and will load the configuration from any file you find in there. As a heavy user of configuration management systems, I prefer the servicename.d configuration model, as it makes it easy to push different configurations


for different uses. If your distribution doesn’t set up this directory for you, you can just edit /etc/dnsmasq.conf directly or look into adding an option like this to dnsmasq.conf:

conf-dir=/etc/dnsmasq.d

In my case, I created a new file called /etc/dnsmasq.d/dnscache.conf with the following settings:

no-hosts
no-resolv
listen-address=127.0.0.1
bind-interfaces
server=/dev.example.com/10.0.0.5
server=/10.in-addr.arpa/10.0.0.5
server=/dev.example.com/10.0.0.6
server=/10.in-addr.arpa/10.0.0.6
server=/dev.example.com/10.0.0.7
server=/10.in-addr.arpa/10.0.0.7

Let’s go over each setting. The first, no-hosts, tells dnsmasq to ignore /etc/hosts and not use it as a source of DNS records. You want dnsmasq to use your upstream name servers only. The no-resolv setting tells dnsmasq not to use /etc/resolv.conf for the list of name servers to use. This is important, as later on, you will add dnsmasq’s own IP to the top of /etc/resolv.conf, and you don’t want it to end up in some loop. The next two settings, listen-address and bind-interfaces, ensure that dnsmasq binds to and listens on only the localhost interface (127.0.0.1). You don’t want to risk outsiders using your service as an open DNS relay.

The server configuration lines are where you add the upstream name servers you want dnsmasq to use. In my case, I added three different upstream name servers in my preferred order. The syntax for this line is server=/domain_to_use/nameserver_ip. So in the above example, it would use those name servers for dev.example.com resolution. In my case, I also wanted dnsmasq to use those name servers for IP-to-name resolution (PTR records), so since all the internal IPs are in the 10.x.x.x network, I added 10.in-addr.arpa as the domain.

Once this configuration file is in place, restart dnsmasq so the settings take effect. Then you can use dig pointed to localhost to test whether dnsmasq works:

$ dig ns1.dev.example.com @localhost

; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> ns1.dev.example.com @localhost
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4208
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0


;; QUESTION SECTION:
;ns1.dev.example.com. IN A

;; ANSWER SECTION:
ns1.dev.example.com. 265 IN A 10.0.0.5

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Sep 18 00:59:18 2014
;; MSG SIZE rcvd: 56

Here, I tested ns1.dev.example.com and saw that it correctly resolved to 10.0.0.5. If you inspect the dig output, you can see near the bottom of the output that SERVER: 127.0.0.1#53(127.0.0.1) confirms that I was indeed talking to 127.0.0.1 to get my answer. If you run this command again shortly afterward, you should notice that the TTL setting in the output (in the above example it was set to 265) will decrement. Dnsmasq is caching the response, and once the TTL gets to 0, dnsmasq will query a remote name server again.

After you have validated that dnsmasq functions, the final step is to edit /etc/resolv.conf and make sure that you have nameserver 127.0.0.1 listed above all other nameserver lines. Note that you can leave all of the existing name servers in place. In fact, that provides a means of safety in case dnsmasq ever were to crash. If you use DHCP to get an IP or otherwise have these values set from a different file (such as is the case when resolvconf is installed), you’ll need to track down what files to modify instead; otherwise, the next time you get a DHCP lease, it will overwrite this with your new settings.

I deployed this simple change to around 100 servers in a particular environment, and it was amazing to see the dramatic drop in DNS traffic, load and log entries on my internal name servers. What’s more, with this in place, the environment is even more tolerant in the case there ever were a real problem with downstream DNS servers—existing cached entries still would resolve for the host until TTL expired. So if you find your internal name servers are getting hammered with traffic, an internal DNS cache is something you definitely should consider.

Kyle Rankin is a Sr. Systems Administrator in the San Francisco Bay Area and the author of a number of books, including The Official Ubuntu Server Book, Knoppix Hacks and Ubuntu Hacks. He is currently the president of the North Bay Linux Users’ Group.

Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com.



COLUMNS
THE OPEN-SOURCE CLASSROOM

DevOps: Better Than the Sum of Its Parts
SHAWN POWERS

Chef, a garden rake for the DevOps farm.

Most of us longtime system administrators get a little nervous when people start talking about DevOps. It’s an IT topic surrounded by a lot of mystery and confusion, much like the term “Cloud Computing” was a few years back. Thankfully, DevOps isn’t something sysadmins need to fear. It’s not software that allows developers to do the job of the traditional system administrator, but rather it’s just a concept making both development and system administration better. Tools like Chef and Puppet (and Salt Stack, Ansible, New Relic and so on) aren’t “DevOps”, they’re just tools that allow IT professionals to adopt a DevOps mindset. Let’s start there.

What Is DevOps?

Ask ten people to define DevOps, and you’ll likely get 11 different answers. (Those numbers work in binary too, although I suggest a larger sample size.) The problem is that many folks confuse DevOps with DevOps tools. These days, when people ask me, “What is DevOps?”, I generally respond: “DevOps isn’t a thing, it’s a way of doing a thing.”

The worlds of system administration and development historically have been very separate. As a sysadmin, I tend to think very differently about computing from how a developer does. For me, things like scalability and redundancy are critical, and my success often is gauged by uptime. If things are running, I’m successful. Developers have a different way of approaching their jobs, and need to consider things like efficiency, stability, security and features. Their success often is measured by usability.

Hopefully, you’re thinking the


traits I listed are important for both development and system administration. In fact, it’s that mindset from which DevOps was born. If we took the best practices from the world of development, and infused them into the processes of operations, it would make system administration more efficient, more reliable and ultimately better. The same is true for developers. If they can begin to “code” their own hardware as part of the development process, they can produce and deploy code more quickly and more efficiently. It’s basically the Reese’s Peanut Butter Cup of IT. Combining the strengths of both departments creates a result that is better than the sum of its parts.

Once you understand what DevOps really is, it’s easy to see how people confuse the tools (Chef, Puppet, New Relic and so on) for DevOps itself. Those tools make it so easy for people to adopt the DevOps mindset, that they become almost synonymous with the concept itself. But don’t be seduced by the toys—an organization can shift to a very successful DevOps way of doing things simply by focusing on communication and cross-discipline learning. The tools make it easier, but just like owning a rake doesn’t make someone a farmer, wedging DevOps tools into your organization doesn’t create a DevOps team for you. That said, just like any farmer appreciates a good rake, any DevOps team will benefit from using the plethora of tools in the DevOps world.

The System Administrator’s New Rake

In this article, I want to talk about using DevOps tools as a system administrator. If you’re a sysadmin who isn’t using a configuration management tool to keep track of your servers, I urge you to check one out. I’m going to talk about Chef, because for my day job, I recently taught a course on how to use it. Since you’re basically learning the concepts behind DevOps tools, it doesn’t matter that you’re focusing on Chef. Kyle Rankin is a big fan of Puppet, and conceptually, it’s just another type of rake. If you have a favorite application that isn’t Chef, awesome.

If I’m completely honest, I have to admit I was hesitant to learn Chef, because it sounded scary and didn’t seem to do anything I wasn’t already doing with Bash scripts and cron jobs. Plus, Chef uses the Ruby programming language for its configuration files, and my


programming skills peaked with:

10 PRINT "Hello!"
20 GOTO 10

Nevertheless, I had to learn about it so I could teach the class. I can tell you with confidence, it was worth it. Chef requires basically zero programming knowledge. In fact, if no one mentioned that its configuration files were Ruby, I’d just have assumed the syntax for the conf files was specific and unique. Weird config files are nothing new, and honestly, Chef’s config files are easy to figure out.

Chef: Its Endless Potential

DevOps is a powerful concept, and as such, Chef can do amazing things. Truly. Using creative “recipes”, it’s possible to spin up hundreds of servers in the cloud, deploy apps, automatically scale based on need and treat every aspect of computing as if it were just a function to call from simple code. You can run Chef on a local server. You can use the cloud-based service from the Chef company instead of hosting a server. You even can use Chef completely server-less, deploying the code on a single computer in solo mode.

Once it’s set up, Chef supports multiple environments of similar infrastructures. You can have a development environment that is completely separate from production, and have the distinction made completely by the version numbers of your configuration files. You can have your configurations function completely platform agnostically, so a recipe to spin up an Apache server will work whether you’re using CentOS, Ubuntu, Windows or OS X. Basically, Chef can be the central resource for organizing your entire infrastructure, including hardware, software, networking and even user management.

Thankfully, it doesn’t have to do all that. If using Chef meant turning your entire organization on its head, no one would ever adopt it. Chef can be installed small, and if you desire, it can grow to handle more and more in your company. To continue with my farmer analogy, Chef can be a simple garden rake, or it can be a giant diesel combine tractor. And sometimes, you just need a garden rake. That’s what you’re going to learn today. A simple introduction to the Chef way of doing things, allowing you to build or not build onto it later.

The Bits and Pieces

Initially, this was going to be a multipart article on the specifics


Figure 1. This is the basic Chef setup, showing how data flows.

of setting up Chef for your environment. I still might do a series like that for Chef or another DevOps configuration automation package, but here I want everyone to understand not only DevOps itself, but what the DevOps tools do. And again, my example will be Chef.

At its heart, Chef functions as a central repository for all your configuration files. Those configuration files also include the ability to carry out functions on servers. If you’re a sysadmin, think of it as a central, dynamic /etc directory along with a place all your


Bash and Perl scripts are held. See Figure 1 for a visual on how Chef’s information flows.

The Admin Workstation is the computer at which configuration files and scripts are created. In the world of Chef, those are called cookbooks and recipes, but basically, it’s the place all the human-work is done. Generally, the local Chef files are kept in a revision control system like Git, so that configurations can be rolled back in the case of a failure. This was my first clue that DevOps might make things better for system administrators, because in the past all my configuration revision control was done by making a copy of a configuration file before editing it, and tacking a .date at the end of the filename. Compared to the code revision tools in the developer’s world, that method (or at least my method) is crude at best.

The cookbooks and recipes created on the administrator workstation describe things like what files should be installed on the server nodes, what configurations should look like, what applications should be installed and stuff like that. Chef does an amazing job of being platform-neutral, so if your cookbook installs Apache, it generally can install Apache without you needing to specify what type of system it’s installing on. If you’ve ever been frustrated by Red Hat variants calling Apache “httpd”, and Debian variants calling it “apache2”, you’ll love Chef.

Once you have created the cookbooks and recipes you need to configure your servers, you upload them to the Chef server. You can connect to the Chef server via its Web interface, but very little actual work is done via the Web interface. Most of the configuration is done on the command line of the Admin Workstation. Honestly, that is something a little confusing about Chef that gets a little better with every update. Some things can be modified via the Web page interface, but many things can’t. A few things can only be modified on the Web page, but it’s not always clear which or why.

With the code, configs and files uploaded to the Chef Server, the attention is turned to the nodes. Before a node is part of the Chef environment, it must be “bootstrapped”. The process isn’t difficult, but it is required in order to use Chef. The client software is installed on each new node, and then configuration files and commands are pulled from the Chef server. In fact, in order for Chef to function, the


nodes must be configured to poll the server periodically for any changes. There is no “push” methodology to send changes or updates to the node, so regular client updates are important. (These are generally performed via cron.)

At this point, it might seem a little silly to have all those extra steps when a simple FOR loop with some SSH commands could accomplish the same tasks from the workstation, and have the advantage of no Chef client installation or periodic polling. And I confess, that was my thought at first too. When programs like Chef really prove their worth, however, is when the number of nodes begins to scale up. Once the admittedly complex setup is created, spinning up a new server is literally a single one-liner to bootstrap a node. Using something like Amazon Web Services, or Vagrant, even the creation of the computers themselves can be part of the Chef process.

To Host or Not to Host

The folks at Chef have made the process of getting a Chef Server instance as simple as signing up for a free account on their cloud infrastructure. They maintain a “Chef Server” that allows you to upload all your code and configs to their server, so you need to worry only about your nodes. They even allow you to connect five of your server nodes for free. If you have a small environment, or if you don’t have the resources to host your own Chef Server, it’s tempting just to use their pre-configured cloud service. Be warned, however, that it’s free only because they hope you’ll start to depend on the service and eventually pay for connecting more than those initial five free nodes. They have an enterprise-based self-hosted solution that moves the Chef Server into your environment like Figure 1 shows. But it’s important to realize that Chef is open source, so there is a completely free, and fully functional open-source version of the server you can download and install into your environment as well. You do lose their support, but if you’re just starting out with Chef or just playing with it, having the open-source version is a smart way to go.

How to Begin?

The best news about Chef is that incredible resources exist for learning how to use it. On the http://getchef.com Web site, there is a video series outlining a basic setup for installing Apache on your server nodes as an example of the process. Plus,


there’s great documentation that to keep their code organized and


describes the installation process of efficient, while at the same time we
the open-source Chef Server, if that’s can hand off some of the tasks we
the path you want to try. hate (spinning up test servers for
Once you’re familiar with how example) to the developers, so they
Chef works (really, go through the can do their jobs better, and we can
training videos, or find other Chef focus on more important sysadmin
fundamentals training somewhere), things. Tearing down that wall
the next step is to check out the between development and operations
vibrant Chef community. There truly makes everyone’s job easier,
are cookbooks and recipes for but it requires communication, trust
just about any situation you can and a few good rakes in order to be
imagine. The cookbooks are just successful. Check out a tool like Chef,
open-source code and configuration and see if DevOps can make your job
files, so you can tweak them to fit easier and more awesome. Q
your particular needs, but like any
downloaded code, it’s nice to start Shawn Powers is the Associate Editor for Linux Journal.
with something and tweak it instead He’s also the Gadget Guy for LinuxJournal.com, and he has an
of starting from scratch. interesting collection of vintage Garfield coffee mugs. Don’t let
DevOps is not a scary new his silly hairdo fool you, he’s a pretty ordinary guy and can be
trend invented by developers in reached via e-mail at shawn@linuxjournal.com. Or, swing by
order to get rid of pesky system the #linuxjournal IRC channel on Freenode.net.
administrators. We’re not being
replaced by code, and our skills aren’t
becoming useless. What a DevOps Send comments or feedback via
mindset means is that we get to steal http://www.linuxjournal.com/contact
the awesome tools developers use or to ljeditor@linuxjournal.com.

Resources

Chef Fundamentals Video Series: https://learn.getchef.com/fundamentals-series

Chef Documentation: https://docs.getchef.com

Community Cookbooks/Tools: https://supermarket.getchef.com
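The periodic polling described in this column is usually just a scheduled chef-client run on each node. A minimal sketch of such a cron entry follows; the 30-minute interval, log path and /etc/cron.d location are illustrative assumptions, not something prescribed by the column:

```shell
# Sketch of cron-based polling: each node periodically runs chef-client
# to pull any changes from the Chef Server. Written to /tmp for review
# first; interval and log path are assumptions.
cat > /tmp/chef-client.cron <<'EOF'
# /etc/cron.d/chef-client
*/30 * * * * root /usr/bin/chef-client --once >> /var/log/chef-client.log 2>&1
EOF

# Review the entry, then install it as root:
#   install -m 644 /tmp/chef-client.cron /etc/cron.d/chef-client
cat /tmp/chef-client.cron
```

Staging the file in /tmp keeps the sketch safe to run as a non-root user; installing it into /etc/cron.d is the step that actually schedules the polling.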





NEW PRODUCTS

Wibu-Systems’ CodeMeter
Embedded Driver
Embedded systems developers seeking to protect their IPs
are the target customers for Wibu-Systems’ CodeMeter
Embedded Driver, a comprehensive security solution that
secures embedded software against reverse-engineering by encrypting and signing the binary
code. CodeMeter protects embedded systems, programmable logic controllers and industrial
PCs. The new CodeMeter Embedded Driver 1.7—a rebranded version of a product called
CodeMeter Compact Driver 1.6—offers new features and functionality that are applicable
specifically to embedded systems. New features include an option to use the HID mode on
dongles for communication with the device without displaying drive status, protection of
the secure boot process, support for the file I/O interface for Linux and Android, and support
for the Secure Disk standard for reading and writing API-based data without enumeration
by the operating system. The driver is available for VxWorks 7.0, Linux Embedded, Windows
Embedded, Android and QNX, as well as for different ARM, x86 and PowerPC platforms.
http://www.wibu.com/us

Linutop OS
Intruders beware, because the new Linutop OS
14.04 is here—the easiest way to set up an ultra-secure PC, says its maker Linutop. Linutop OS 14.04
is a customized version of Ubuntu 14.04 LTS that
comes loaded with the light XFCE classic graphic
environment, as well as an array of ready-to-use
Linux applications, such as Firefox 28, LibreOffice 4, VLC 2 and Linutop Kiosk. Version
14.04 offers three core enhancements, namely a Linutop Kiosk for a secured Internet
access point, Digital Signage functionality for display of any media type and enhanced
security and privacy. Linutop’s system can be locked in read-only mode, preventing
alterations by viruses or other mishaps. Linutop requires only a minimal HD space
(850MB) and requires minimal processing power: PIII 800MHz and 512MB of RAM.
Linutop OS can be installed quickly on a hard drive, USB key or Flash memory.
http://www.linutop.com


Logic Supply’s
ML400 Series
Industrial PCs
For Logic Supply, the new
ML400 Series of industrial PCs
is more than just the next step
in the evolution of its product line. Rather, says Logic, it’s a distinct break from
the “black box” paradigm that has ruled the industrial hardware market. Logic
Supply’s new ML400 Series is a line of high-performance, boldly styled, rugged
Mini-ITX systems for commercial applications where reliability is paramount. These
fanless, ventless PCs are the company’s smallest to date and are engineered for use
in harsh environments. The models available at launch for the ML400 series offer
a versatile range of I/O and Intel processing capabilities, advanced EMI protection
and next-generation storage in order to maintain an ultra-compact footprint.
http://www.logicsupply.com

Silicon Mechanics, Inc.’s


Rack-Mount Servers with
Intel Xeon E5-2600 v3
Hardware-maker Silicon Mechanics, Inc., is leveraging the latest Intel Xeon processor
E5-2600 v3 product family to create a line of new servers that “will thrill customers
looking to save on operating expenses”. Thanks in large part to the new processor
features—more cores, more cache, faster memory and an updated chipset—the
Silicon Mechanics rack-mount servers feature a well rounded balance of cost,
performance and energy use. These five of the company’s most popular models
sport efficient DDR4 memory, processors with new power-management features
and extensive performance improvements. Finally, the new servers offer customers
a great deal of flexibility regarding memory, storage and power management,
making it easy to find a configuration with the ideal features for nearly any
application and budget, says the company.
http://www.siliconmechanics.com


Red Hat Software Collections


In order to keep up with developers' needs while maintaining production stability, Red Hat
keeps Red Hat Software Collections on a more frequent release schedule than RHEL. The
Collections, recently upgraded to v1.2, is a package of essential
Web development tools, dynamic languages, open-source databases, C and C++ compilers,
the Eclipse IDE, and a variety of development and performance management tools.
These updated components can be installed alongside versions included in base Red Hat
Enterprise Linux. Highlights of the upgrade are the Red Hat Developer Toolset 3.0, included
in the Collections for the first time and also bringing the Eclipse IDE to RHEL 7 for the first
time; DevAssistant 0.9.1, a tool for setting up development environments and publishing
code; Maven 3.0, a build automation tool for Java projects; Git 1.9.4, which previously was
only part of the Red Hat Developer Toolset; Nginx 1.6 Web server and Web proxy; and the
latest stable versions of popular dynamic languages and open-source databases. Red Hat
Software Collections 1.2 is available to eligible users of Red Hat Enterprise Linux 6 and 7.
http://www.redhat.com

Sven Vermeulen’s SELinux


Cookbook (Packt Publishing)
If you are a Linux system or service administrator and want to
(wisely) burnish your SELinux skills, then Packt Publishing and
tech author Sven Vermeulen have a book for you. It’s called
SELinux Cookbook, and it carries a breathless subtitle that sums
it up better than any bumbling Linux journalist could: “Over 100
hands-on recipes to develop fully functional policies to confine
your applications and users using SELinux”. These policies can be custom to users’ own
needs, and users can build readable policy rules from them. Readers can learn further about
the wide range of security controls that SELinux offers by customizing Web application
confinement. Finally, readers will understand how some applications interact with the
SELinux subsystem internally, ensuring that they can confront any challenge they face.
Author Sven Vermeulen is the project lead of Gentoo Hardened’s SELinux integration project
and wrote Packt’s SELinux System Administration book as well.
http://www.packtpub.com


Proxmox Server Solutions GmbH’s


Proxmox Virtual Environment (VE)
Proxmox Virtual Environment (VE) is a Debian GNU/Linux-
based open-source virtualization management solution for
servers. Proxmox VE supports KVM-based guests, container
virtualization with OpenVZ, and includes strong high-availability support based on Red Hat
Cluster and Corosync. Maker Proxmox Server Solutions recently announced a security-focused
version 3.3, whose key additions include an HTML5 console, Proxmox VE Firewall, two-factor
authentication, a ZFS storage plugin and Proxmox VE Mobile. Proxmox is proudest of the
distributed Proxmox VE Firewall, which is designed to protect the whole IT infrastructure. It
allows users to set up firewall rules for all hosts, the cluster, virtual machines and containers.
The company notes that Proxmox VE is used by 62,000 hosts in 140 countries, its GUI is
available in 17 languages, and the active community counts more than 24,000 forum members.
http://www.proxmox.com

Opera Software ASA’s


Opera TV Ads SDK
Over time, Opera has become much more than a
browser maker. Opera’s latest development is part
of the company’s somewhat new niche in the media
convergence space: Opera TV Ads SDK. The new solution
is targeted at app publishers, Smart TV device manufacturers and pay-TV operators seeking
to better monetize their content by serving video advertising on any platform. Opera TV Ads
SDK previously was available exclusively to apps distributed via the Opera TV Store application
platform and developed through the Opera TV Snap technology. With this new release, the
solution is available as a standalone feature for any HTML5 app or Smart TV device, whether on
the Opera TV Store or other application platforms. Opera says that Opera TV Ads SDK offers a
one-stop solution for placement of video advertising anywhere inside the device user interface,
including targeting users across apps and interactive advertising via linear broadcast.
http://www.opera.com/tv

Please send information about releases of Linux-related products to newproducts@linuxjournal.com or


New Products c/o Linux Journal, PO Box 980985, Houston, TX 77098. Submissions are edited for length and content.



FEATURE Ideal Backups with zbackup

Ideal
Backups
with
zbackup
Do you need to back up large volumes of
data spread over many machines with
“Time Machine”-like snapshots? Read on!
DAVID BARTON



Data is growing both in volume and importance. As time goes on, the amount of data that we need to store is growing, and the data itself is becoming more and more critical for organizations. It is becoming increasingly important to be able to back up and restore this information quickly and reliably. Using cloud-based systems spreads out the data over many servers and locations.

Where I work, data has grown from less than 1GB on a single server to more than 500GB spread out on more than 30 servers in multiple data centers. Catastrophes like the events at Distribute IT and Code Spaces demonstrate that ineffective backup practices can destroy a thriving business. Enterprise-level backup solutions typically cost a prohibitive amount, but the tools we need to create a backup solution exist within the Open Source community.

zbackup to the Rescue
After switching between many different backup strategies, I have found what is close to an ideal backup solution for our particular use case. That involves regularly backing up many machines with huge numbers of files as well as very large files and being able to restore any backup previously made.

The solution combines zbackup, rsync and LVM snapshots. zbackup works by deduplicating a stream—for example, a tar or database backup—and storing the blocks into a storage pool. If the same block ever is encountered again, the previous one is reused.

Combining these three elements gives us a solution that provides:

- Multiple versions: we can store complete snapshots of our system every hour, and deduplication means the incremental storage cost for each new backup is negligible.

- Storing very large files: database backups can be very large but differ in small ways that are not block-aligned (imagine inserting one byte at the beginning of a file). Byte-level deduplication means we store only the changes between the versions, similar to doing a diff.

- Storing many small files: backing up millions of files gives a much smaller number of deduplicated blocks that can be managed more easily.

- Easily replicating between disks and over a WAN: the files in the storage


pool are immutable; new blocks are stored as new files. This makes rsyncing them to other drives or machines very fast and efficient. It also means we can synchronize them to virtually any kind of machine or file storage.

- Compression: compressing files gives significant size reductions, but using it often stops rsync or deduplication from working. zbackup compresses the blocks after deduplication, so rsyncing is still efficient. As mentioned previously, only new blocks need to be rsynced.

- Fast backups: backups after the first one are done at close to the disk-read speed. More important, by running zbackup on each server, the majority of the CPU and I/O load is decentralized. This means there is minimal CPU or I/O required on the central server and only deduplicated blocks are transferred, providing scalability.

- Highly redundant: by synchronizing to external drives and other servers, even corruption or destruction of the backups means we can recover our information.

Comparing Alternatives
There are many alternatives to using zbackup. I compare some of the options below:

- tape: has a relatively high cost, and takes a long time to read and write as the entire backup is written. This is a good option for archival storage, but it is unsuitable for frequent snapshots because you can't write a 500GB tape every hour.

- rsnapshot: does not handle small changes in large files in any reasonable way, as a new copy is kept for each new version. Taking snapshots of large numbers of files causes a huge I/O load on the central backup server when they are copied and when they are deleted. It is also very slow to synchronize the hard links to another device or machine.

- Tarsnap: this is an excellent product and very reasonably priced. Slow restores and being dependent on a third party for storage make this a good fallback option but possibly unsuitable as your only method of backup.

- Git: doesn't handle large files



efficiently (or in some cases fails completely). It also doesn't easily handle anything with Git control files in it, so it makes backing up your Git repositories a real challenge. As Git is so poor at large files, tarring directories and using the tar file is not feasible.

- ZFS/Btrfs filesystem snapshots: are very fast and work well for small files. Even the smallest change in a file requires the file to be re-copied (this is not strictly true for ZFS if deduplicating is enabled; however, this has a significant memory load, and it works only if the file is unchanged for most of its blocks, like an Mbox file or database backing store).

- Duplicity: this seems similar to zbackup and has many of the same benefits, except deduplicating between files with different names. Although it has been in beta for a long time, it seems to have many features for supporting remote back ends, whereas zbackup is simply a deduplicating block store.

Summary of Approach
The key part of this approach is using zbackup in step 1. The backups produced by zbackup have remarkable properties compared to the other backup formats, as discussed previously, so that the remaining steps can be tailored depending on the level of availability and durability you need.

1. Each virtual server uses zbackup to back up to a local deduplicated block store. This means every snapshot is available locally if needed.

2. The zbackup store then is replicated to a central backup server where it can be recovered if needed.

3. The zbackup stores on the central server are replicated out to other servers.

4. The backups also are synchronized to external storage—for example, a USB drive. We rotate between drives so that drives are kept off-site and in case of disaster or backup corruption.

5. Finally, we snapshot the filesystem where the zbackup stores are located.

Using zbackup
zbackup fits right into the UNIX philosophy. It does two seemingly simple things that make it behave


almost like a file. The first is taking a stream of data passed to stdin and writing it to a block store. A handle to the data is stored in a small backup file, stored next to the block store. The second is taking that backup file and writing the original data to stdout.

During the process, zbackup will identify blocks of data that it has seen before and deduplicate them, then compress any new data before writing it out to disk. When deduplicating data, zbackup uses a sliding window that moves a byte at a time, so that if you insert a single byte into a file, it still can identify the repeated blocks. This is in contrast to block-level deduplication like that found in ZFS.

To start using zbackup, you must install it from source. This is very easy to do; just follow the instructions on the http://zbackup.org Web site.

Assuming you have installed zbackup, and that /usr/local/bin is in your path, start by initializing a block store (in these examples, I am running as root, but that is not a requirement):

# zbackup init --non-encrypted /tmp/zbackup/

Hopefully you don't use /tmp for your real backups! You can list out the block store as below—the Web site has great information on what goes where. The main one to keep in mind is backups; this is where your backup files go:

# ls /tmp/zbackup
backups bundles index info

Let's back up a database backup file—this takes a while the first time (Listing 1).

To check where that went, look at Listing 2. As you can see, the backup file is only 135 bytes. Most of the data is stored in /bundles, and it is less than one tenth the size of the

Listing 1. Backing Up One File


# ls -l /tmp/database.sql
-rw-r--r-- 1 root root 406623470 Sep 14 17:41 /tmp/database.sql
# cat /tmp/database.sql | zbackup backup
´/tmp/zbackup/backups/database.sql
Loading index...
Index loaded.
Using up to 8 thread(s) for compression



Listing 2. Check the Backup

# ls -l /tmp/zbackup/backups/database.sql
-rw------- 1 root root 135 Sep 14 17:43
´/tmp/zbackup/backups/database.sql
# du --max-depth=1 /tmp/zbackup/
8       /tmp/zbackup/backups
208     /tmp/zbackup/index
29440   /tmp/zbackup/bundles

original database.

Now, make a small change to the backup file to simulate some use and then back it up again (see Listing 3). This example illustrates an important point: zbackup will not change any file in the data store. You can rename the files in the /backups directory if you choose. You also can have subdirectories under /backups, as shown in Listing 4, where the backup finally works.

This should complete much more quickly, both because the file is cached and because most of the blocks already have been deduplicated:

# du --max-depth=0 /tmp/zbackup/
29768   /tmp/zbackup/

In this example, the changes I made to the file have only slightly increased the size of the backup.

Let's now restore the second backup. Simply pass the backup handle to zbackup restore, and the file is written to stdout:

# zbackup restore /tmp/zbackup/backups/1/2/3/database.sql >
´/tmp/database.sql.restored

Now you can check the file you restored to prove it is the same as

Listing 3. Backing Up a File Again


# cat /tmp/database.sql | zbackup --silent backup
´/tmp/zbackup/backups/database.sql
Won't overwrite existing file /tmp/zbackup/backups/database.sql

Listing 4. Backing Up a File, Part 2


# mkdir -p /tmp/zbackup/backups/1/2/3/
# cat /tmp/database.sql | zbackup --silent backup
´/tmp/zbackup/backups/1/2/3/database.sql


Listing 5. Checking the Restored File


# ls -l /tmp/database.sql*
-rw-r--r-- 1 root root 406622180 Sep 14 17:47 /tmp/database.sql
-rw-r--r-- 1 root root 406622180 Sep 14 17:53
´/tmp/database.sql.restored
# md5sum /tmp/database.sql*
179a33abbc3e8cd2058703b96dff8eb4 /tmp/database.sql
179a33abbc3e8cd2058703b96dff8eb4 /tmp/database.sql.restored
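The manual check in Listing 5 can be scripted so backups are test-restored routinely. A minimal sketch using standard tools follows; the paths are the article's example paths, and the script assumes the original file is still on hand to compare against:

```shell
#!/bin/sh
# Compare a restored backup against the original file, as in Listing 5.
# Paths follow the article's examples; adjust for your environment.
ORIG=/tmp/database.sql
RESTORED=/tmp/database.sql.restored

if [ -f "$ORIG" ] && [ -f "$RESTORED" ] && cmp -s "$ORIG" "$RESTORED"; then
    echo "verified: restore matches original"
else
    echo "verification failed or files missing" >&2
fi
```

Using cmp avoids recomputing full MD5 sums by hand, and the script can be dropped into the same cron schedule that drives the backups.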

Listing 6. tar and Back Up a Directory


# tar -c /tmp/files | zbackup
´--silent backup /tmp/zbackup/backups/files.tar
# du --max-depth=0 /tmp/zbackup
97128 /tmp/zbackup

the file you originally backed up (Listing 5).

Of course, in most cases, you aren't backing up a single file. This is where the UNIX philosophy works well—because tar can read from stdin and write to stdout, you simply can chain zbackup to tar. Listing 6 shows an example of backing up a large directory structure in /tmp/files/ using tar piped to zbackup.

Now there are two backups of the database file and a tarred backup of /tmp/files in the one zbackup store. There is nothing stopping you from calling your backup file files.tar.gz or anything else; however, this is going to be very confusing later on. If you name your backup file based on the name of the file to which it restores, it makes it much easier to work out what each backup is.

Now you can restore this backup using the example in Listing 7. Most of the example is creating the directory to restore to and comparing the restored backup to the original.

If you are backing up frequently, it makes sense to organize your backups in directories by date. The example in Listing 8 has a directory for each month, then a subdirectory for each day and, finally, a subdirectory for each time of



Listing 7. Restoring from zbackup
# mkdir /tmp/files.restore
# cd /tmp/files.restore/
# zbackup --silent restore /tmp/zbackup/backups/files.tar | tar -x
# diff -rq /tmp/files.restore/tmp/files/ /tmp/files/

Listing 8. Organize Your Backups


# export DATEDIR=`date "+%Y-%m/%d/%H:%M"`
# mkdir -p /tmp/zbackup/backups/$DATEDIR
# tar -c /tmp/files | zbackup --silent backup
´/tmp/zbackup/backups/$DATEDIR/files.tar
# cat /tmp/database.sql | zbackup backup
´/tmp/zbackup/backups/$DATEDIR/database.sql

day—for example, 2014-09/12/08:30/—and all the backups for that time go in this directory.

Run this on a daily or hourly basis, and you can restore any backup you have made, going back to the beginning of time. For the files I am backing up, the zbackup data for an entire year is less than storing a single uncompressed backup.

The zbackup directory has the extremely nice property that the files in it never change once they have been written. This makes it very fast to rsync (since only new files in the backup need to be read) and very fast to copy to other media like USB disks. It also makes it an ideal candidate for things like filesystem snapshots using LVM or ZFS.

Once you have your backups in zbackup, you can ship it to a central server and drop it to USB or tape, or upload it to Amazon S3 or even Dropbox.

Benchmarks/Results
All this is good in theory, but the critical question is "How does it perform?" To give you an idea, I have run some benchmarks on a server that has multiple similar versions of the same application—for example, training, development, UAT. There are roughly 5GB of databases and 800MB of Web site files. The


Table 1. Multiple Web Sites

            SPACE  TIME  FILES
tar         743M   25s   1
tar & gzip  382M   44s   1
zbackup     105M   38s   203
zbackup 2   4K     30s   206
zbackup 3   632k   30s   209

Table 2. Single Web Site

            SPACE  TIME  FILES
tar         280M   8s    1
tar & gzip  74M    9s    1
zbackup     66M    17s   131

server has eight cores and plenty of memory, although all buffers were flushed prior to each benchmark.

All Web Sites: this is a collection of 30,000 files taking roughly 800MB of space. Table 1 illustrates the results. zbackup delivers a backup that is roughly a quarter of the size of the gzipped tar file. Each new backup adds three files—by design, zbackup never modifies files but only adds them.

The first time zbackup runs and backs up the entire directory, it takes longer, as there is no deduplicated data in the pool. On the first run, all eight cores were fully used. On slower machines, throughput is less due to the high CPU usage.

The second time, zbackup was run over an identical file structure; only 4K of additional storage was used. The backup also runs faster because most of the data already is present.

The third time, four files of exactly 100,000 random bytes were placed in the filesystem.

Single Web Site: the compression performance of zbackup in the first test is in large part because there are multiple similar copies of the same Web site. This test backs up only one of the Web sites to provide another type of comparison. The results are shown in Table 2. The compression results are not much better than gzip, which demonstrates how effective the deduplication is when doing multiple Web sites.

Database Files: this is a backup of a database dump file, text format uncompressed. The results are



Table 3. Database File

            SPACE  TIME  FILES
tar         377M   2s    1
tar & gzip  43M    10s   1
zbackup     29M    32s   192
zbackup 2   4M     3s    200
zbackup 3   164K   3s    210

shown in Table 3.

The first run is zbackup backing up a testing database of 377M. The deduplication and compression give significant gains over tar and gzip, although it runs much slower.

The second zbackup was a training database that is similar to the testing database, but it has an additional 10MB of data, and some of the other data also is different. In this case, zbackup very effectively removes the duplicates, with very little extra storage cost.

The final zbackup was randomly removing clusters of rows from the backup file to simulate the changes that come from updates and deletes. This is the typical case of backing up a database over short periods of time, and it matches very closely with my observation of real-world performance.

Network Performance: by design, zbackup does not modify or delete files. This means the number of added files and the additional disk space is all you need to synchronize over the network. Existing files never need to be updated.

Rather than benchmarking this, I have reviewed the real logs for our server. Synchronizing 6GB of data with more than 30,000 files typically takes less than ten seconds. Compared with the previous method of rsyncing the directory tree and large files, which used to take between one and three minutes, this is an enormous improvement.

The central server has a slow disk and network; however, it is easily able to cope with the load from synchronizing the zbackup. I suspect even a Raspberry Pi would have enough performance to act as a synchronization target.

As they say, your mileage may vary. There are many factors that can alter the performance you get, such as:

- Disk speed.

- CPU performance (which is particularly important for the first backup).


- Nature of the files—for example, binary database backups will compress less than text backups.

- Existence of multiple copies of the same data.

Data Integrity and Security
Because it deduplicates the data, zbackup is particularly vulnerable to file corruption. A change to a single file could make the entire data store useless. It is worthwhile to check your media to ensure they are in good condition. On the plus side, you probably can copy an entire year's worth of backups of 200GB of data to another disk in less than an hour.

Having multiple versions of your backups available in the same zbackup store is not the same as having multiple copies. Replicating your zbackup store to other disks or servers does not solve the problem. As an example, if someone were to modify some files in the backup store, and then that was blindly replicated to every machine or disk, you would have many exact copies of a worthless backup.

For that reason, I include snapshots of the filesystem to guard against this and also rotate our media and regularly check the backups. As an alternative, you could rsync just new files from the server being backed up and ignore deletions or file updates.

The design of zbackup means that retrieving a backup also checks it for consistency, so it is worthwhile to try restoring your backups on a regular basis.

Another point to consider is whether there is a single company, credential or key that, if compromised, could cause the destruction of all backups. Although it is useful to have multiple media and servers, if a single hacker can destroy everything, you are vulnerable in the same way the two companies mentioned in the introduction were. Physical media that is rotated off-site is a good way to achieve this, or else a separate server with a completely different set of credentials.

zbackup makes it relatively simple



zbackup makes it relatively simple to encrypt the data stored in the backup. If you are storing your backups on insecure or third-party machines, you may want to use this facility. When managing backups for multiple servers, I prefer to encrypt the media where the backups are stored using LUKS. This includes the drives within the servers and the removable USB drives.

Other Considerations
It is particularly important that you don't compress or encrypt your files as part of a process before you pass them to zbackup. Otherwise, you will find the deduplication will be completely ineffective. For example, Postgres allows you to compress your backups when writing the file. If this option were used, you would get no benefit from using zbackup.

In the architecture here, I have suggested doing the zbackup on each server rather than centralizing it. This means that although duplicates within a server are merged, duplicates between servers are not. For some applications, that may not be good enough. In this case, you might consider running zbackup on the virtualization host to deduplicate the disk files.

zbackup and tar are both stream-oriented protocols. This means that restoring a single file requires the computer to restore the entire backup and untar only the file you require. For small backups, this may be fine, but if your directory structures are very large, it may be worthwhile to back up directories individually rather than in one go. For example, you might choose to back up Web sites individually.

zbackup currently is limited by the speed at which the data can be read in and streamed to the deduplication process. A file must be read in full and then deduplicated even if it hasn't changed. This is roughly equivalent to rsync -c (that is, checksum the file content rather than just comparing the file metadata). To scale to really large data sizes, zbackup may need to incorporate some of the tar facilities within itself, so that if it can determine a file hasn't changed (by inode and metadata), it deduplicates the file without reading it.

David Barton is the Managing Director of OneIT, a company specializing in custom business software development. David has been using Linux since 1998 and managing the company's Linux servers for more than ten years.

Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com.



FEATURE

High-Availability Storage with HA-LVM

Deploy a storage solution with zero downtime.

PETROS KOUTOUPIS



In recent years, there has been a trend in which data centers have been opting for commodity hardware and software over proprietary solutions. Why shouldn't they? It offers extremely low costs and the flexibility to build an ecosystem the way it is preferred. The only limitation is the extent of the administrator's imagination. However, a question needs to be asked: "How would such a customized solution compare to its proprietary and more costly counterpart?"

Open-source projects have evolved and matured enough to stay competitive and provide the same feature-rich solutions that include volume management, data snapshots, data deduplication and so on. One often overlooked but long-supported concept, though, is high availability.

The idea behind high availability is simple: eliminate any single point of failure. This ensures that if a server node or a path to the underlying storage goes down (planned or unplanned), data requests still can be served. Now there are multiple layers to a storage-deployed solution that can be configured for high availability, and that is why this article focuses strictly on HA-LVM.

HA-LVM
High Availability Logical Volume Manager (HA-LVM) is an add-on to the already integrated LVM suite. It enables a failover configuration for shared volumes—that is, if one server in a cluster fails or is taken down for maintenance, the shared storage configuration will fail over to the secondary server, where all I/O requests will resume, uninterrupted. An HA-LVM configuration is an active/passive configuration. This means that a single server accesses the shared storage at any one time. In many cases, this is an ideal approach, as some advanced LVM features, such as snapshots and data deduplication, are not supported in an active/active environment (when more than one server accesses the shared storage).

A very important component of HA-LVM is the CLVM or Clustered LVM dæmon. When enabled, the CLVM dæmon prevents corruption of LVM metadata and its logical volumes, which occurs if multiple machines make overlapping changes. In an active/passive configuration, though, this becomes less of a concern. To accomplish this, the dæmon relies on a Distributed Lock Manager or DLM.



Figure 1. A Sample Configuration of Two Servers Accessing the Same Shared Storage

The purpose of the DLM is to coordinate disk access for CLVM.

The following example will cluster two servers that have access to the same external storage (Figure 1).

CLVM: CLVM is not compatible with MD RAID, as it does not support clusters yet.

CLVM Dæmon: The CLVM dæmon distributes LVM metadata updates across the cluster, and it must be running on all nodes in that cluster.



This external storage could be a RAID-enabled or JBOD enclosure of disk drives, connected to the servers via a Fibre Channel, Serial Attached SCSI (SAS), iSCSI or other Storage Area Network (SAN) mapping. The configuration is storage protocol-agnostic and requires only that the clustered servers see the same shared block devices.

JBOD: A JBOD (or Just a Bunch Of Disks) is an architecture using multiple hard drives, but not in a redundant configuration.

Configuring the Cluster
Almost all Linux distributions offer the required packages. However, the names may differ in each. You need to install lvm2-cluster (in some distributions, the package may be named clvm), the Corosync cluster engine, the Red Hat cluster manager (or cman), the Resource Group manager dæmon (or rgmanager) and all their dependencies on all participating servers. Even though the Red Hat cluster manager contains the Linux distribution of the same name in its package description, most modern distributions unrelated to Red Hat will list it in their repositories.

Once the appropriate clustering packages have been installed on all participating servers, the cluster configuration file must be configured to enable the cluster. To accomplish this, create and modify /etc/cluster/cluster.conf with the following:

<cluster name="lvm-cluster" config_version="1">
    <cman two_node="1" expected_votes="1" />
    <clusternodes>
        <clusternode name="serv-0001" nodeid="1">
            <fence>
            </fence>
        </clusternode>
        <clusternode name="serv-0002" nodeid="2">
            <fence>
            </fence>
        </clusternode>
    </clusternodes>
    <logging debug="on">
    </logging>
    <dlm protocol="tcp" timewarn="500">
    </dlm>
    <fencedevices>
    </fencedevices>
    <rm>
    </rm>
</cluster>

Note that the clusternode name is the server's hostname (change where necessary). Also, make sure the cluster.conf file is identical on all servers in the cluster.
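That "identical on all servers" requirement can be checked mechanically by comparing checksums. A minimal local sketch follows; the helper name is ours, and in practice, the second copy would be fetched from the other node (for example, with scp):

```shell
# Hypothetical helper: do two copies of cluster.conf have identical content?
conf_same() {
    [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ]
}

# Demonstration on throwaway files standing in for the two nodes' copies:
A=$(mktemp); B=$(mktemp)
printf '<cluster name="lvm-cluster" config_version="1">\n' > "$A"
cp "$A" "$B"
conf_same "$A" "$B" && echo "cluster.conf copies match"
```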



The Red Hat cluster manager needs to be started:

$ sudo /etc/rc.d/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman...                                        [  OK  ]
   Waiting for quorum...                                   [  OK  ]
   Starting fenced...                                      [  OK  ]
   Starting dlm_controld...                                [  OK  ]
   Tuning DLM kernel config...                             [  OK  ]
   Starting gfs_controld...                                [  OK  ]
   Unfencing self...                                       [  OK  ]
   Joining fence domain...                                 [  OK  ]

If a single node in the cluster is not active, it will appear as off-line:

$ sudo clustat
Cluster Status for lvm-cluster @ Sun Aug  3 11:31:51 2014
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 serv-0001                    1    Online, Local
 serv-0002                    2    Offline

Otherwise, when all servers are configured appropriately and the cman service is enabled, all nodes will appear with an Online status:

$ sudo clustat
Cluster Status for lvm-cluster @ Sun Aug  3 11:36:43 2014
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 serv-0001                    1    Online
 serv-0002                    2    Online, Local

You now have a working cluster. The next step is to enable the Clustered LVM in High Availability mode. In this scenario, you have a single volume from the shared storage enclosure mapped to both servers. Both servers are able to observe and access this volume as /dev/sdb.



The /etc/lvm/lvm.conf file needs to be modified for this. The locking_type parameter in the global section has to be set to the value 3. It is set to 1 by default:

# Type of locking to use. Defaults to local file-based
# locking (1).
# Turn locking off by setting to 0 (dangerous: risks metadata
# corruption if LVM2 commands get run concurrently).
# Type 2 uses the external shared library locking_library.
# Type 3 uses built-in clustered locking.
# Type 4 uses read-only locking which forbids any operations
# that might change metadata.
locking_type = 3

On one of the servers, create a volume group, logical volume and filesystem from the designated shared volume:

$ sudo pvcreate /dev/sdb

$ sudo vgcreate -cy shared_vg /dev/sdb

$ sudo lvcreate -L 50G -n ha_lv shared_vg

$ sudo mkfs.ext4 /dev/shared_vg/ha_lv

$ sudo lvchange -an shared_vg/ha_lv

The example above carves out a 50GB logical volume from the volume group and then formats it with an Extended 4 filesystem. The -cy option used with the vgcreate (volume group create) command enables the volume group for clustered locking. The -an option with the lvchange (logical volume change) command deactivates the logical volume. You will be relying on the CLVM and resource manager (read below) dæmons to handle activations based on the failover feature additions made in the same /etc/cluster/cluster.conf file created earlier. When active, the shared volume will be accessible from /dev/shared_vg/ha_lv.

Add the necessary failover details to the cluster.conf file:

<rm>
    <failoverdomains>
        <failoverdomain name="FD" ordered="1" restricted="0">
            <failoverdomainnode name="serv-0001" priority="1"/>
            <failoverdomainnode name="serv-0002" priority="2"/>
        </failoverdomain>
    </failoverdomains>
    <resources>
        <lvm name="lvm" vg_name="shared_vg" lv_name="ha_lv"/>
        <fs name="FS" device="/dev/shared_vg/ha_lv"
            force_fsck="0" force_unmount="1" fsid="64050"
            fstype="ext4" mountpoint="/mnt" options=""
            self_fence="0"/>
    </resources>
    <service autostart="1" domain="FD" name="serv"
            recovery="relocate">
        <lvm ref="lvm"/>
        <fs ref="FS"/>
    </service>
</rm>
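One operational note when you come back and edit cluster.conf later: cman only accepts and distributes a configuration whose config_version attribute has increased (followed, on a running cluster, by cman_tool version -r). A small sketch with a helper of our own naming:

```shell
# Hypothetical helper: bump the config_version attribute in place.
bump_config_version() {
    f=$1
    v=$(sed -n 's/.*config_version="\([0-9]\{1,\}\)".*/\1/p' "$f")
    sed -i "s/config_version=\"$v\"/config_version=\"$((v + 1))\"/" "$f"
}

# Demonstration on a throwaway copy:
F=$(mktemp)
echo '<cluster name="lvm-cluster" config_version="1">' > "$F"
bump_config_version "$F"
grep -o 'config_version="[0-9]*"' "$F"   # config_version="2"
```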



The "rm" portion of the cluster.conf file utilizes the resource manager (or rgmanager). In this addition to the configuration file, you inform the cluster manager that serv-0001 should have ownership and sole access to the shared volume first. It will be mounted locally at the /mnt absolute path. If and when serv-0001 goes down for any reason, the resource manager then will perform a failover that will enable sole access to the shared volume, mounted at /mnt on serv-0002. All pending I/O requests sent to serv-0001 will resume on serv-0002.

On all servers, restart the cman service to enable the new configuration:

$ sudo /etc/rc.d/init.d/cman restart

Also, on all servers, start the rgmanager and clvmd services:

$ sudo /etc/rc.d/init.d/rgmanager start
Starting Cluster Service Manager:                          [  OK  ]

$ sudo /etc/rc.d/init.d/clvmd start
Starting clvmd:                                            [  OK  ]

NOTE: To enable these services automatically on reboot, use chkconfig to start the services on all appropriate runlevels.

Assuming that no errors were observed, you now should have a running cluster configured in an active/passive configuration. You can validate this by checking the accessibility of the shared volume on all servers. It should be seen, enabled and mounted on serv-0001 and not on serv-0002. Now comes the moment of truth—that is, testing the failover. Manually power down serv-0001. You will notice the rgmanager kicking in and enabling/mounting the volume on serv-0002.



Summary
In an ideal configuration, fencing agents will need to be configured in the /etc/cluster/cluster.conf file. The purpose of the fencing agent is to handle a problematic node before it causes noticeable issues to the cluster. For example, if a server suffers from a kernel panic, is not communicating with the other servers in the cluster, or something else just as devastating, the IPMI utilities can be configured to reboot the server in question:

<clusternode name="serv-0001" nodeid="1">
    <fence>
        <method name="1">
            <device name="ipmirecover1"/>
        </method>
    </fence>
</clusternode>
<clusternode name="serv-0002" nodeid="2">
    <fence>
        <method name="1">
            <device name="ipmirecover2"/>
        </method>
    </fence>
</clusternode>

[ ... ]

<fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.1.50"
        login="ADMIN" passwd="ADMIN" name="ipmirecover1"/>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.10"
        login="ADMIN" passwd="ADMIN" name="ipmirecover2"/>
</fencedevices>

The primary objective of HA-LVM is to provide the data center with enterprise-class fault tolerance at a fraction of the price. No one ever wants to experience server downtimes, and with an appropriate configuration, no one has to. From the data center to your home office, this solution can be deployed almost anywhere.

Petros Koutoupis is a full-time Linux kernel, device driver and application developer for embedded and server platforms. He has been working in the data storage industry for more than eight years and enjoys discussing the same technologies.

Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com.

Resources

clvmd(8): Linux man page

Appendix F. High Availability LVM (HA-LVM): https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/ap-ha-halvm-CA.html



FEATURE

Sharing Admin Privileges for Many Hosts Securely

The ssh-agent program can hold your decrypted authentication keys in memory. This makes a lot of things possible—one of them is controlling shared accounts on large numbers of hosts.

J. D. BALDWIN



The problem: you have a large team of admins, with a substantial turnover rate. Maybe contractors come and go. Maybe you have tiers of access, due to restrictions based on geography, admin level or even citizenship (as with some US government contracts). You need to give these people administrative access to dozens (perhaps hundreds) of hosts, and you can't manage all their accounts on all the hosts.

This problem arose in the large-scale enterprise in which I work, and our team worked out a solution that:

- Does not require updating accounts on more than one host whenever a team member arrives or leaves.

- Does not require deletion or replacement of Secure Shell (SSH) keys.

- Does not require management of individual SSH keys.

- Does not require distributed sudoers or other privileged-access management tools (which may not be supported by some Linux-based appliances anyway).

- And most important, does not require sharing of passwords or key passphrases.

It works between any UNIX or Linux platforms that understand SSH key trust relationships. I personally have made use of it on a half-dozen different Linux distros, as well as Solaris, HP-UX, Mac OS X and some BSD variants.

In our case, the hosts to be managed were several dozen Linux-based special-purpose appliances that did not support central account management tools or sudo. They are intended to be used (when using the shell at all) as the root account.

Our environment also (due to a government contract) requires a two-tier access scheme. US citizens on the team may access any host as root. Non-US citizens may access only a subset of the hosts. The techniques described in this article may be extended for N tiers without any real trouble, but I describe the case N == 2 in this article.

The Scenario
I am going to assume you, the reader, know how to set up an SSH trust relationship so that an account on one host can log in directly, with no password prompting, to an account on another.



(Basically, you simply create a key pair and copy the public half to the remote host's ~/.ssh/authorized_keys file.) If you don't know how to do this, stop reading now and go learn. A Web search for "ssh trust setup" will yield thousands of links—or, if you're old-school, the AUTHENTICATION section of the ssh(1) man page will do. Also see ssh-copy-id(1), which can greatly simplify the distribution of key files.

Steve Friedl's Web site has an excellent Tech Tip on these basics, plus some material on SSH agent-forwarding, which is a neat trick to centralize SSH authentication for an individual user. The Tech Tip is available at http://www.unixwiz.net/techtips/ssh-agent-forwarding.html.

I describe key-caching below, as it is not very commonly used and is the heart of the technique described herein.

For illustration, I'm assigning names to players (individuals assigned to roles), the tiers of access and "dummy" accounts.

Hosts:

- darter — the hostname of the central management host on which all the end-user and utility accounts are active, all keys are stored and caching takes place; also, the sudoers file controlling access to utility accounts is here.

- n1, n2, ... — hostnames of target hosts for which access is to be granted for all team members ("n" for "non-special").

- s1, s2, ... — hostnames of target hosts for which access is to be granted only to some team members ("s" for "special").

Accounts (on darter only):

- univ — the name of the utility account holding the SSH keys that all target hosts (n1, n2, ... and s1, s2, ...) will trust.

- rstr — the name of the utility account holding the SSH keys that only the non-special hosts (n1, n2, ...) will trust; the special, restricted-access hosts (s1, s2, ...) do not trust this key.

- joe — let's say the name of the guy administering the whole scheme is "Joe" and his account is "joe". Joe is a trusted admin with "the keys to the kingdom"—he cannot be a restricted user.

- andy, amy, alice — these are users who are allowed to log in to all hosts.



- ned, nora, nancy — these are users who are allowed to log in only to "n" (non-special) hosts; they never should be allowed to log in to the special hosts s1, s2, ....

You will want to create shared, unprivileged utility accounts on darter for use by unrestricted and restricted admins. These (per our convention) will be called "univ" and "rstr", respectively. No one should actually directly log in to univ and rstr, and in fact, these accounts should not have passwords or trusted keys of their own. All logins to the shared utility accounts should be performed with su(1) from an existing individual account on darter.

The Setup
Joe's first act is to log in to darter and "become" the univ account:

$ sudo su - univ

Then, under that shared utility account, Joe creates a .ssh directory and an SSH keypair. This key will be trusted by the root account on every target host (because it's the "univ"-ersal key):

$ mkdir .ssh    # if not already present
$ ssh-keygen -t rsa -b 2048 -C "universal access key gen YYYYMMDD" \
      -f .ssh/univ_key
Enter passphrase (empty for no passphrase):

Very important: Joe assigns a strong passphrase to this key. The passphrase to this key will not be generally shared.

(The field after -C is merely a comment; this format reflects my personal preference, but you are of course free to develop your own.)

This will generate two files in .ssh: univ_key (the private key file) and univ_key.pub (the public key file). The private key file is encrypted, protected by the very strong passphrase Joe assigned to it, above.

Joe logs out of the univ account and into rstr. He executes the same steps, but creates a keypair named rstr_key instead of univ_key. He assigns a strong passphrase to the private key file—it can be the same passphrase as assigned to univ, and in fact, that is probably preferable from the standpoint of simplicity.

Joe copies univ_key.pub and rstr_key.pub to a common location for convenience.



For every host to which access is granted for everyone (n1, n2, ...), Joe uses the target hosts' root credentials to copy both univ_key.pub and rstr_key.pub (on separate lines) to the file .ssh/authorized_keys under the root account directory.

For every host to which access is granted for only a few (s1, s2, ...), Joe uses the target hosts' root credentials to copy only univ_key.pub (on a single line) to the file .ssh/authorized_keys under the root account directory.

So to review, now, when a user uses su to "become" the univ account, he or she can log in to any host, because univ_key.pub exists in the authorized_keys file of n1, n2, ... and s1, s2, ....

However, when a user uses su to "become" the rstr account, he or she can log in only to n1, n2, ..., because only those hosts' authorized_keys files contain rstr_key.pub; the special hosts' files do not.

Of course, in order to unlock the access in both cases, the user will need the strong passphrase with which Joe created the keys. That seems to defeat the whole purpose of the scheme, but there's a trick to get around it.

The Trick
First, let's talk about key-caching. Any user who uses SSH keys whose key files are protected by a passphrase may cache those keys using a program called ssh-agent. ssh-agent does not take a key directly upon invocation. It is invoked as a standalone program without any parameters (at least, none useful to us here).

The output of ssh-agent is a couple environment variable/value pairs, plus an echo command, suitable for input to the shell. If you invoke it "straight", these variables will not become part of the environment. For this reason, ssh-agent always is invoked as a parameter of the shell built-in eval:

$ eval $(ssh-agent)
Agent pid 29013

(The output of eval also includes an echo statement to show you the PID of the agent instance you just created.)
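The behavior is easy to verify interactively. The agent started here is a throwaway and is killed again at the end (the PID shown above, 29013, is just whatever your system happened to assign):

```shell
# ssh-agent prints shell code; eval executes it in the current shell,
# which is what actually sets the two variables.
eval "$(ssh-agent)"        # prints "Agent pid ..." as a side effect
echo "$SSH_AUTH_SOCK"      # path to the agent's UNIX socket
echo "$SSH_AGENT_PID"      # same PID as reported above
ssh-agent -k > /dev/null   # kill this throwaway agent again
```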



Once you have an agent running, and your shell knows how to communicate with it (thanks to the environment variables), you may cache keys with it using the command ssh-add. If you give ssh-add a key file, it will prompt you for the passphrase. Once you provide the correct passphrase, ssh-agent will hold the unencrypted key in memory. Any invocation of SSH will check with ssh-agent before attempting authentication. If the key in memory matches the public key on the remote host, trust is established, and the login simply happens with no entry of passwords or passphrases.

(As an aside: for those of you who use the Windows terminal program PuTTY, that tool provides a key-caching program called Pageant, which performs much the same function. PuTTY's equivalent to ssh-keygen is a utility called PuTTYgen.)

All you need to do now is set it up so the univ and rstr accounts set themselves up on every login to make use of persistent instances of ssh-agent. Normally, a user manually invokes ssh-agent upon login, makes use of it during that session, then kills it, with eval $(ssh-agent -k), before exiting. Instead of manually managing it, let's write into each utility account's .bash_profile some code that does the following:

1. First, check whether there is a current instance of ssh-agent for the current account.

2. If not, invoke ssh-agent and capture the environment variables in a special file in /tmp. (It should be in /tmp because the contents of /tmp are cleared between system reboots, which is important for managing cached keys.)

3. If so, find the file in /tmp that holds the environment variables and source it into the shell's environment. (Also, handle the error case where the agent is running and the /tmp file is not found by killing ssh-agent and starting from scratch.)

All of the above assumes the key already has been unlocked and cached. (I will come back to that.) Here is what the code in .bash_profile looks like for the univ account:

/usr/bin/pgrep -u univ 'ssh-agent' >/dev/null

RESULT=$?

if [[ $RESULT -eq 0 ]]    # ssh-agent is running
then
    if [[ -f /tmp/.env_ssh.univ ]]    # bring env in to session
    then


        source /tmp/.env_ssh.univ
    else    # error condition
        echo 'WARNING: univ ssh agent running, no environment file found'
        echo '         ssh-agent being killed and restarted ... '
        /usr/bin/pkill -u univ 'ssh-agent' >/dev/null
        RESULT=1    # due to kill, execute startup code below
    fi
fi

if [[ $RESULT -ne 0 ]]    # ssh-agent not running, start it from scratch
then
    echo "WARNING: ssh-agent being started now; ask Joe to cache key"
    /usr/bin/ssh-agent > /tmp/.env_ssh.univ
    /bin/chmod 600 /tmp/.env_ssh.univ
    source /tmp/.env_ssh.univ
fi

And of course, the code is identical for the rstr account, except s/univ/rstr/ everywhere.

Joe will have to intervene once whenever darter (the central management host on which all the user accounts and the keys reside) is restarted. Joe will have to log on and become univ and execute the command:

$ ssh-add ~/.ssh/univ_key

and then enter the passphrase. Joe then logs in to the rstr account and executes the same command against ~/.ssh/rstr_key. The command ssh-add -l lists cached keys by their fingerprints and filenames, so if there is doubt about whether a key is cached, that's how to find out. A single agent can cache multiple keys, if you have a use for that, but it doesn't come up much in my environment.

Once the keys are cached, they will stay cached. (ssh-add -t <N> may be used to specify a timeout of N seconds, but you won't want to use that option for this shared-access scheme.) The cache must be rebuilt for each account whenever darter is rebooted, but since darter is a Linux host, that will be a rare event. Between reboots, the single instance (one per utility account) of ssh-agent simply runs and holds the key in memory. The last time I entered the passphrases of our utility account keys was more than 500 days ago—and I may go several hundred more before having to do so again.

The last step is setting up sudoers to manage access to the utility accounts. You don't really have to do this. If you like, you can set (different) passwords for univ and rstr and simply let the users hold them. Of course, shared passwords aren't a great idea to begin with. (That's one of the major points of this whole scheme!)
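Since the univ and rstr versions of that profile code differ only in the account name, the logic can also be factored into one function kept in a shared file and sourced from both accounts' .bash_profile files. A sketch under that assumption (the function name is ours; it follows the same pgrep/env-file pattern as the code above):

```shell
# Hypothetical shared helper: the agent-persistence logic with the
# account name as a parameter.
setup_agent() {
    acct=$1
    envfile=/tmp/.env_ssh.$acct

    if /usr/bin/pgrep -u "$acct" ssh-agent >/dev/null && [ -f "$envfile" ]
    then
        source "$envfile"                  # reuse the running agent
    else
        # no usable agent: clean up any stragglers and start fresh
        /usr/bin/pkill -u "$acct" ssh-agent >/dev/null 2>&1
        /usr/bin/ssh-agent > "$envfile"
        /bin/chmod 600 "$envfile"
        source "$envfile"
    fi
}

setup_agent "$(id -un)"   # each utility account gets its own agent
```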



Every time one of the users of the univ account leaves the team, you'll have to change that password and distribute the new one (hopefully securely and out-of-band) to all the remaining users.

No, managing access with sudoers is a better idea. This article isn't here to teach you all of—or any of—the ins and outs of sudoers' Extremely Bizarre Nonsensical Frustration (EBNF) syntax. I'll just give you the cheat code.

Recall that Andy, Amy, Alice and so on were all allowed to access all hosts. These users are permitted to use sudo to execute the su - univ command. Ned, Nora, Nancy and so on are permitted to access only the restricted list of hosts. They may log in only to the rstr account using the su - rstr command. The sudoers entries for these might look like:

User_Alias   UNIV_USERS=andy,amy,alice,arthur         # trusted
User_Alias   RSTR_USERS=ned,nora,nancy,nyarlathotep   # not so much

# Note that there is no harm in putting andy, amy, etc. into
# RSTR_USERS as well. But it also accomplishes nothing.

Cmnd_Alias   BECOME_UNIV = /bin/su - univ
Cmnd_Alias   BECOME_RSTR = /bin/su - rstr

UNIV_USERS   ALL= BECOME_UNIV
RSTR_USERS   ALL= BECOME_RSTR

Let's recap. Every host n1, n2, n3 and so on has both univ and rstr key files in authorized_keys. Every host s1, s2, s3 and so on has only the univ key file in authorized_keys.

When darter is rebooted, Joe logs in to both the univ and rstr accounts and executes the ssh-add command with the private key file as a parameter. He enters the passphrase for these keys when prompted.

Now Andy (for example) can log in to darter, execute:

$ sudo su - univ

and authenticate with his password. He now can log in as root to any of n1, n2, ..., s1, s2, ... without further authentication. If Andy needs to check the functioning of ntp (for example) on each of 20 hosts, he can execute a loop:

$ for H in n1 n2 n3 [...] n10 s1 s2 s3 [...] s10
> do
>    ssh -q root@$H 'ntpdate -q timeserver.domain.tld'
> done

and it will run without further intervention.

Similarly, nancy can log in to darter, execute:

$ sudo su - rstr

and log in to any of n1, n2 and so on, execute similar loops, and so forth.



Benefits and Risks
Suppose Nora leaves the team. You simply would edit sudoers to delete her from RSTR_USERS, then lock or delete her system account.

"But Nora was fired for misconduct! What if she kept a copy of the keypair?"

The beauty of this scheme is that access to the two key files does not matter. Having the public key file isn't important—put the public key file on the Internet if you want. It's public!

Having the encrypted copy of the private key file doesn't matter. Without the passphrase (which only Joe knows), that file may as well be the output of /dev/urandom. Nora never had access to the raw key file—only the caching agent did.

Even if Nora kept a copy of the key files, she cannot use them for access. Removing her access to darter removes her access to every target host. And the same goes, of course, for the users in UNIV_USERS as well.

There are two caveats to this, and make sure you understand them well.

Caveat the first: it (almost) goes without saying that anyone with root access to darter obviously can just become root, then su - univ at any time. If you give someone root access to darter, you are giving that person full access to all the target hosts as well. That, after all, is the meaning of saying the target hosts "trust" darter. Furthermore, a user with root access who does not know the passphrase to the keys still can recover the raw keys from memory with a little moderately sophisticated black magic. (Linux memory architecture and clever design of the agent prevent non-privileged users from recovering their own agents' memory contents in order to extract keys.)

Caveat the second: obviously, anyone holding the passphrase can make (and keep) an unencrypted copy of the private keys. In our example, only Joe had that passphrase, but in practice, you will want two or three trusted admins to know the passphrase so they can intervene to re-cache the keys after a reboot of darter.

If anyone with root access to your central management host (darter, in this example) or anyone holding private key passphrases should leave the team, you will have to generate new keypairs and replace the contents of authorized_keys on every target host in your enterprise. (Fortunately, if you are careful, you can use the old trust relationship to create the new one.)



team are at least reasonably stable. The techniques described in this article are probably not suitable for a high-turnover environment with no stable “core” admins.

One more thing about this: you don’t need to be managing tiered or any kind of shared access for this basic trick to be useful. As I noted above, the usual way of using an SSH key-caching agent is by invoking it at session start, caching your key, then killing it before ending your session. However, by including the code above in your own .bash_profile, you can create your own file in /tmp, check for it, load it if present and so on. That way, the host always has just one instance of ssh-agent running, and your key is cached in it permanently (or until the next reboot, anyway).

Even if you don’t want to cache your key that persistently, you still can make use of a single ssh-agent and cache your key with the timeout (-t) option mentioned earlier; you still will be saving yourself a step.

Note that if you do this, however, anyone with root on that host will have access to any account of yours that trusts your account on that machine—so caveat actor. (I use this trick only on personal boxes that only I administer.)

The trick for personal use is becoming obsolete, as Mac OS X (via SSHKeyChain) and newer versions of GNOME (via Keyring) automatically know the first time you SSH to a host with which you have key-based authentication set up, then ask you your passphrase and cache the key for the rest of your GUI login session. Given the lack of default timeouts and warnings about root users’ access to unlocked keys, I am not sure this is an unmixed technological advance. (It is possible to configure timeouts in both utilities, but it requires that users find out about the option and take the effort to configure it.)

Acknowledgements

I gratefully acknowledge the technical review and helpful suggestions of David Scheidt and James Richmond in the preparation of this article.

J.D. Baldwin has been a UNIX, Linux and Web user and administrator going back to SunOS 1.1 (1984), Slackware 3.0 (1995) and Apache 1.2 (1997). He currently works in network security for a large multinational company. J.D. is a graduate and former faculty member of the US Naval Academy and has an MS in Computer Science from the University of Maryland. He lives with his wife in their empty nest in southwest Michigan. You can reach him at baldwin@panix.com.

Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com.





INDEPTH
Rethinking the System Monitor

vtop is a graphical activity monitor for the command line. In this article, I take you through how I wrote the app, how it works underneath and invite you to help extend it.

JAMES HALL

Figure 1. vtop Running on Ubuntu


System monitoring tools have been with us since the early days of computing, but on the terminal, many people still use the top command. Now, let me introduce you to my open-source activity monitor called vtop. It uses Unicode Braille characters for richer visualization on the command line.

Background

For many, the top command has been a key way to monitor rogue processes on *nix systems. William LeFebvre wrote the original top command more than 30 years ago on a Vax running BSD UNIX. He was inspired by the Vax VMS operating system that listed the most CPU-hungry processes along

Figure 2. A Flurry of Early Commits


with an ASCII bar chart. The bar chart didn’t make it across into his version; the author went instead for a text-based approach to displaying data that has stuck with us.

While the GUI world enjoys increasingly feature-rich tools, terminal applications sadly have lagged behind. Graphical representations in system monitoring tools are nothing new. KSysguard and GNOME’s System Monitor sport fancy graphs and charts, but this isn’t much use to us on the command line.

Although there’s absolutely nothing wrong with top’s text-based approach, it’s not what I needed when I set out to write vtop. The original vtop was a quick hack, mostly written in a day, and like all the best open-source software, it scratched an itch. I needed to see CPU spikes to debug some strange behaviour, and I couldn’t use the graphical tools for Linux, because I didn’t want to install all that bloat on my servers. Just looking at the numbers in top doesn’t give you much of an idea of how it’s fluctuating over time.

I started hashing out the initial version, not worrying too much about the tidiness of the code (I was trying to debug a problem quickly after all). I ended up getting carried away with it, and I almost forgot to go back and debug my original issue.

I ran the code on the remote server and was delighted at how immediately useful it was, even in its crude and ugly form. I committed the code and showed it to my colleagues at work. The reaction was a mixture of delight (“How do you even do that?”) and horror (at my sloppy programming <blush>), but I knew this idea had legs.

Write One to Throw Away

Worrying too much about the architecture early can be a waste of time. It’s usually best to write one to throw away, and this code base certainly needed binning. The best structure for the application was far more obvious once I had a working prototype.

I sketched out what I thought it should look like: a large area at the


top for CPU usage, then two smaller boxes for memory and a process list. I started a new project and got to work.

I decided to write vtop using Node.js. It’s built on Chrome’s V8 JavaScript engine and allows you to write fast and scalable applications. This choice could pave the way for a Web-based front end to be added in the future. JavaScript is coming into its own—it’s no longer the scrappy, badly implemented language that everyone used to make sparkles follow their cursors on their Geocities pages. Node.js has evolved the language—it’s now a fully formed toolchain with a thriving community. There’s a Node package for just about anything you can think of; you really can hit the ground running by picking up other people’s modules instead of writing from scratch.

At the beginning of the rewrite, I made an outline using simple box-drawing characters that I used to love playing with in my early DOS programming days. Although this worked okay, I felt there might be an easier way. I’d seen ncurses and wondered if there was anything more modern kicking about. I eventually came across Blessed.

Blessed abstracts away the complexities of drawing a GUI in the terminal. You tell it where to draw boxes, and they are resized automatically based on the terminal width and height. You also can listen to scroll wheel and click events to enable even easier interaction. I highly recommend checking it out.

I created a couple boxes in Blessed and populated the text content of the first one with the Braille characters. Then I easily was able to add different colors to the app.

Design Goals

The rewrite forced me to think about my design goals for the project. I was keen to have other developers get involved, and hopefully, it can be used for purposes I never imagined. The design goals can be distilled to these three:

1. Extendible: plugins should be easy and quick to write, with clear separation of UI code and data collection code. (There’s still a little work to do in this area.)

2. Accessible: when it comes to servers, the terminal rules the roost, and nothing beats the convenience of being able to dive straight in over SSH and fire up a command. That’s not to say


that a Web-based GUI would be unwelcome, but each feature should work from the command line too.

3. Visual: it should take advantage of the latest and greatest techniques—a visually appealing interface using color and Unicode characters to great effect.

Braille Display

Terminals have come a long way since the early days. xterm added 256-color support (which is just a sequence of escape codes printed out as text) and mouse support (which is your terminal sending text escape codes). Pretty much all terminal emulators support Unicode now, and in vtop, we use this to our advantage.

Unicode Braille characters give you a convenient 2x4 grid of dots in every possible combination, starting at Unicode point 0x2800. We can use these as faux-pixels. You take a grid of coordinates, and break it up into chunks for each character, and then just output them to the screen like you would any other text. There are 256 combinations (two states—on and off for each of the eight dots, which is 2^8), and you can calculate which character you need by combining the hexadecimal numbers for each Braille dot and adding that to the starting point.

Below are Braille characters representing a slope on a graph:

.
..
.. .
.. ..

See http://jsfiddle.net/MrRio/90vdrs01/3/.

Figure 3. Hexadecimal Values for Each Braille Dot (Public Domain)

For example, the first character


above would be 0x1 + 0x2 + 0x4 + 0x40 + 0x10 + 0x20 + 0x80 = 0xF7, then add this to the base of 0x2800 to get 0x28F7. You can try this in your browser’s JavaScript panel:

String.fromCharCode(0x1 + 0x2 + 0x4 + 0x40 + 0x10 + 0x20 + 0x80 + 0x2800);

There’s a brilliant Node.js library that abstracts away this detail for you called node-drawille. It allows you to plot onto a canvas and returns Braille characters using the same method described here.

Other Features

The main feature is the graphical interface, but it has a few other tricks up its sleeve:

■ Vim-like keybindings: if you use vim, your muscle memory is tied to its keyboard shortcuts. Type j/k to move up and down the list and h/l to change the scale on the graphs. The arrow keys work fine too!

■ Grouped processes: vtop will group together processes with the same name. Many applications are multiprocess—for example, Google Chrome spawns a new process for each tab to increase stability and security. You can get a true overall value of the CPU percentage it’s taking up. It’s also great for monitoring Web servers like Apache and nginx.

■ Killing processes: simply type dd to make a process die. This is also the vim shortcut for deleting a line.

■ Sorting by CPU or memory: typing c will sort the list by CPU; no prizes for guessing which key you press to sort by memory.

Installation

Simply install npm with your favourite package manager. Then to install the command globally, just type:

npm -g install vtop

Upgrade notifications appear within the application, and it can be upgraded with a single key stroke.

Contributing

Getting Started with the Codebase: First off, start by forking the project on GitHub: https://github.com/MrRio/vtop.

Once you’ve got your own fork, you can clone the source from GitHub (make sure to replace “MrRio” with


your own GitHub user name):

git clone git@github.com:MrRio/vtop.git
cd vtop
make
./bin/vtop.js

The last command runs your development version of vtop rather than the globally installed one. Now you can begin hacking with the code.

To give you an idea of where to start, let me guide you through the main areas of the application. The entry point for the application is bin/vtop.js. This is a hybrid JS file and shell executable. It first runs as a shell script, detects the name of the node executable (which differs depending on the platform), enables xterm-256color and then runs itself as JavaScript. It then includes the main app.js file in the root.

Then the app.js file loads in the required libraries, the most important of which are Drawille for the Braille output, Blessed for the GUI and commander, which is used to parse command-line options. It then globs the themes/ directory for a list of themes and loads itself up via the init() function.

■ drawHeader is responsible for drawing the title bar, with the time and any update notifications.

■ drawFooter prints all the available commands across the footer and a link to the Web site.

■ drawChart is responsible for drawing Braille charts, and drawTable for the process list, although this could do with refactoring into new files to allow for more display options to be contributed.

Sensors are loaded in from the sensors/ folder and polled at the desired frequency. Then the draw methods take this data and push it on to the screen.

Themes: A theme is a simple JSON file containing foreground and background colors for each element. Simply bob your theme into the themes/ directory, and then run vtop --theme yourtheme. Send a Pull Request, and as long as it isn’t too similar to another theme, we’ll include it.

The themes files are broken up per component and handed straight over to Blessed’s style parameter for each component. It’s possible to change the characters used for the box lines, or even add


bold and underline (check out the Blessed documentation at https://github.com/chjj/blessed for more information):

{
	"name": "Brew",
	"author": "James Hall",
	"title": {
		"fg": "#187dc1"
	},
	"chart": {
		"fg": "#187dc1",
		"border": {
			"type": "line",
			"fg": "#56a0d1"
		}
	},
	"table": {
		"fg": "fg",
		"items": {
			"selected": {
				"bg": "#56a0d1",
				"fg": "bg"
			},
			"item": {
				"fg": "fg",
				"bg": "bg"
			}
		},
		"border": {
			"type": "line",
			"fg": "#56a0d1"
		}
	},
	"footer": {
		"fg": "fg"
	}
}

Sensors: vtop currently has three sensors: CPU, Memory and Process List. A sensor has a title, a type (which decides the kind of renderer to use), a polling frequency with a function and a currentValue. The sensors know nothing about the UI, and their sole job is to output a single number or a list for the table type. vtop then takes this information and plots it out. Sensors may need extending with more properties and methods depending on the kinds of things people want to build with them. For example, an Apache req/s sensor may need to be able to report its largest value, so vtop can adjust the scale, or the memory sensor could be extended to report multiple values for used,


buffered, cached and free memory.

The following is an example sensor file—as you can see, they’re pretty straightforward to write. Why not try modifying the file to have it report something else:

/**
 * CPU Usage sensor
 *
 * (c) 2014 James Hall
 */
var os = require('os-utils');
var plugin = {
	/**
	 * This appears in the title of the graph
	 */
	title: 'CPU Usage',
	/**
	 * The type of sensor
	 * @type {String}
	 */
	type: 'chart',
	/**
	 * The default interval time in ms that this plugin
	 * should be polled. More costly benchmarks should
	 * be polled less frequently.
	 */
	interval: 200,
	initialized: false,
	currentValue: 0,
	/**
	 * Grab the current value, from 0-100
	 */
	poll: function() {
		os.cpuUsage(function(v){
			plugin.currentValue = (Math.floor(v * 100));
			plugin.initialized = true;
		});
	}
};
module.exports = exports = plugin;

If you have a basic understanding of JS, you can see how simple building a sensor really is. If you can give vtop a number, it can plot it. You could get these from existing npm modules or by parsing output of other Linux command-line utilities.

Submitting a Pull Request

There are many tutorials on the Internet for getting started with Git (the http://git-scm.com Web site is good). It’s much less scary than you think. For features, simply make a branch called “feature/name-of-feature” and for bugfixes, “bugfix/name-of-fix”. Don’t worry about getting it perfect first time. Send your code in early for feedback, and people will help you refine it and get the code into the master branch.

I look forward to seeing what you come up with!

Other Monitoring Software

There’s more than one way to skin a cat, and this is especially true on Linux.


I’ve rounded up a few of my favorite monitoring tools outside the usual top command. Some of these tools even may be easily integrated into vtop as sensors.

htop: This is a feature-rich interactive process viewer and has been around for years. The author tweeted me to ask if he could use the Braille graphing idea. I’m very excited to see how this develops (https://twitter.com/hisham_hm/status/477618055452037120).

iotop: This is a great tool for measuring applications that are hammering your Input/Output. It calculates the number of bytes used. It’s written in Python and parses information out of /proc/vmstat.

netstat: This ships as part of Linux and Windows, and it allows you to see all open connections. It’s often useful to pipe this command into more:

netstat | more

Figure 4. The htop Interactive Process Viewer


apachetop: This parses Apache (and Apache-compatible) log files on the fly to give you real-time requests, per-second stats, most popular pages and more. It’s very handy for monitoring AJAX and other Web requests that aren’t tracked in your favourite Web-based analytics.

NetHogs: This is a great tool to see where all your Internet bandwidth is going. It lists each hog individually by KB/sec. It doesn’t require you to load any special kernel modules—just fire it up and find the offending process straightaway.

Slurm: This tool helps you visualize network activity on your system by plotting red and green “x” characters.

The Future

It’s time to think more about how our computers can represent data over time, and how we can use tools that are more visual than top. What do you want from a system monitor? Do you need to see what’s going on inside an app? Do you need to see

Figure 5. Slurm


Figure 6. How can you help build vtop?

the number of Web server requests, the temperature of sensors or the throughput of a database server? What other visualizations could be done with Braille or other characters? Roll up your sleeves, and let’s make something cool!

James Hall is the author of the popular jsPDF library and also founder of a digital agency in the UK called Parallax (http://parall.ax).

Resources
vtop: http://parall.ax/vtop

vtop GitHub Repository: https://github.com/MrRio/vtop

Blessed: https://github.com/chjj/blessed

Node-drawille: https://github.com/madbence/node-drawille

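As a closing note on the Braille technique: the dot arithmetic described in the article fits in a few lines of plain JavaScript. This is an illustrative sketch using only core language features (it is not code from vtop or node-drawille):

```javascript
// Dot values for a Braille cell: 4 rows x 2 columns, starting at U+2800.
// Dots 1-3 and 7 form the left column; dots 4-6 and 8 form the right.
var DOT_VALUES = [
	[0x01, 0x08],
	[0x02, 0x10],
	[0x04, 0x20],
	[0x40, 0x80]
];

// Convert a 4x2 grid of truthy values into a single Braille character.
function brailleChar(grid) {
	var code = 0x2800; // base of the Unicode Braille Patterns block
	for (var row = 0; row < 4; row++) {
		for (var col = 0; col < 2; col++) {
			if (grid[row][col]) {
				code += DOT_VALUES[row][col];
			}
		}
	}
	return String.fromCharCode(code);
}

// The article's example: every dot except dot 4 adds up to 0xF7 above the base.
console.log(brailleChar([[1, 0], [1, 1], [1, 1], [1, 1]]).charCodeAt(0).toString(16)); // "28f7"
```

node-drawille does this same mapping at scale, maintaining a whole canvas of such cells and handing back the rendered rows of characters.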


EOF
Big Bad Data
DOC SEARLS

Obsession with Big Data has gotten out of hand. Here’s how.

I’m writing this on September 11, 2014, 13 years after the famous day when terrorist hijackers flew planes into buildings, killing thousands and changing the world for the worse. I also spent the last three days getting hang time with Bill Binney (http://en.wikipedia.org/wiki/William_Binney_%28U.S._intelligence_official%29), who says the 9/11 attacks could have been prevented. Bill makes this claim because he led an NSA project designed to find clues and put them together. It was called ThinThread (http://en.wikipedia.org/wiki/ThinThread). The NSA discontinued ThinThread three weeks before the attacks, opting eventually to go with another project called Trailblazer (http://en.wikipedia.org/wiki/Trailblazer_Project#Background). Bill says ThinThread would have cost $9 million to deploy. Trailblazer ended up costing hundreds of millions of dollars and sucked (https://en.wikipedia.org/wiki/Trailblazer_Project#Whistleblowing).

Like its successors, such as PRISM (http://en.wikipedia.org/wiki/PRISM_%28surveillance_program%29), Trailblazer was all about collecting everything it could from everywhere it could. “At least 80% of all audio calls, not just metadata”, Bill tells us (http://www.theguardian.com/commentisfree/2014/jul/11/the-ultimate-goal-of-the-nsa-is-total-population-control), “are recorded and stored in the US. The NSA lies about what it stores.” At the very least, revelations by Bill and other sources (such as Edward Snowden and Chelsea Manning) make it clear that the Fourth Amendment (https://en.wikipedia.org/wiki/Probable_cause) no longer protects American citizens from unreasonable searches and seizures. In the era of Big Data everywhere, it’s reasonable to grab all of it.

Surveillance also has a chilling effect on what we say. Talk about ________ and the Feds might flag you as a ________. Among other


things, Edward Snowden and Glenn Greenwald (https://en.wikipedia.org/wiki/Glenn_Greenwald) revealed that Linux Journal has been placed (http://www.linuxjournal.com/content/nsa-linux-journal-extremist-forum-and-its-readers-get-flagged-extra-surveillance) under suspicion (http://www.linuxjournal.com/content/stuff-matters) by an NSA program called XKeyscore (https://en.wikipedia.org/wiki/XKeyscore). As a reader, you’re probably already on some NSA list. I’d say “be careful”, but it’s too late.

The differences between ThinThread and what the NSA now does are ones of method and discretion. ThinThread’s method was to watch for suspect communications in real time on international data pipes, and to augment or automate the work of human analysts whose job was finding bad actors doing bad things while also protecting people’s rights to privacy.

The scope of data collected by the NSA since then has veered toward the absolute. In sworn testimony (https://publicintelligence.net/binney-nsa-declaration), in support of the Electronic Frontier Foundation’s suit (https://www.eff.org) against the NSA (Jewel v. NSA, https://www.eff.org/cases/jewel), Bill said this about the size of the agency’s data processing and storage plans:

The sheer size of that capacity indicates that the NSA is not filtering personal electronic communications such as email before storage but is, in fact, storing all that they are collecting. The capacity of NSA’s planned infrastructure far exceeds the capacity necessary for the storage of discreet, targeted communications or even for the storage of the routing information from all electronic communications. The capacity of NSA’s planned infrastructure is consistent, as a mathematical matter, with seizing both the routing information and the contents of all electronic communications.

So the NSA has been into Big Data since at least a decade before the term came into common use (Figure 1).


Figure 1. Big Data Trends (Source: Google Trends, September 11, 2014)

The year 2011 was, not coincidentally, when McKinsey (http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation) and Big Tech Vendors began driving the demand for Big Data solutions with aggressive marketing of the meme. The pitch went like this: the world is turning into data, in quantities exploding at an exponential rate. It is essential to get in front of that wave and take advantage of it, or to risk drowning in it. With Big Data, you can “unlock value”, “gain insights”, “improve performance”, “improve research”, “segment marketing and services”, “improve decision-making”. And, of course, “save lives”.

Lots of the pitching talked about science and health, where the advantages of more data always have been obvious. On the science side, that imperative surely helped sway the NSA toward Trailblazer and PRISM and away from ThinThread, which was about doing more with less. But now the Big Data meme is hitting a plateau, as you can see in the graph in Figure 1. There is also a backlash against it (http://www.economist.com/blogs/economist-explains/2014/04/economist-explains-10), given the degree to which we also are surveilled by marketers. In “How Big Data is Like Big Tobacco—Part 1” (http://www.forbes.com/sites/sap/2014/08/26/how-big-data-is-like-big-tobacco-part-1), Tim Walsh, SAP’s Global Vice President, Customer Engagement and Commerce, writes this for Forbes:

Big Data is running down a similar


path. Deception? Check. Users are only now realizing on a broad basis that many companies are watching, recording and manipulating them constantly. It’s not just what you buy. That’s primitive stuff. Every site you visit, everything you “like”, every person you interact with online, every word you type in “free” email or chat service, every picture you take (yes, including those you thought were instantly deleted), every physical place you go with that mobile device, the middle of the night drunken surfing—yes, yes and yes.

And it’s not just online activity. Remember, companies have been at this for decades. All the publicly available information is now being tied together with your digital life to deliver an incredibly intimate picture of who you are and what you are likely to want, spend, do. Just leave it to Big Data to make the predictions. (What’s the best way to make an accurate prediction? Manipulate the outcome!)

Anyone not living in a gun shack has a profile that runs to literally thousands of data elements. You don’t need to be a Facebook addict to have a file 6 inches thick that carries your purchase history, voter registration, residence, major credit events, network of friends, etc. That list is growing exponentially because now the cottage data industry has become Big Data, with limitless resources. Increasingly, Big Data isn’t even bothering to ask user consent for any of this. As they say: “Not paying for the product? You are the product.” The government (US and EU) is taking notice and taking action. Users feel deceived and governments have picked up the scent.

In “Eight (No, Nine!) Problems With Big Data” in The New York Times (http://www.nytimes.com/2014/04/07/opinion/eight-no-nine-problems-with-big-data.html?_r=1), Gary Marcus and Ernest Davis lay out more issues:

1. “...although big data is very good at detecting correlations, especially subtle correlations that an analysis of smaller data sets might miss, it never tells us which correlations are meaningful.”

2. “...big data can work well as an adjunct to scientific inquiry but rarely succeeds as a wholesale replacement.”



3. “...many tools that are based on big data can be easily gamed.”

4. “...even when the results of a big data analysis aren’t intentionally gamed, they often turn out to be less robust than they initially seem.”

5. “...the echo-chamber effect, which also stems from the fact that much of big data comes from the web.”

6. “...the risk of too many correlations.”

7. “...big data is prone to giving scientific-sounding solutions to hopelessly imprecise questions.”

8. “...big data is at its best when analyzing things that are extremely common, but often falls short when analyzing things that are less common.”

9. “...the hype.”

Another problem: it tends not to work. In “Where Big Data Fails... and Why” (http://blog.primal.com/where-big-data-failsand-why), Peter Sweeney explains how increasing the size of the data and complexity of the schema (expressiveness and diversity of knowledge) results in poor price/performance toward achieving marketing’s holy grail of “personalized media”. His bottom line: “These analytical approaches inevitably break down when confronted with the small data problems of our increasingly complex and fragmented domains of knowledge.”

There is nothing more complex and fragmented than a human being—especially if you’re a robot who wants to get personal with one. Each of us not only differs from everybody else, but from ourselves, from one moment to the next. So, while big data works well for making generalizations about populations of people, at the individual level it tends to fail. We are also barely revealed by the junk


that marketing surveillance systems pick up when they follow us around with cookies, tracking beacons and other intrusive and unwelcome things. Here’s how Peter Sweeney lays it out, verbatim:

■ “The individual interests and preferences of end-users are only partially represented in the media.”

■ “Individual user profiles and activity do not provide sufficient data for modeling specific interests.”

■ “Market participants do not produce sufficient data about individual products and services.”

■ “Media and messaging are only a shadow of the interests of end-users; direct evidence of end-user interests is relatively sparse.”

This is why the popularity of ad blockers (most of which also block tracking) is high and growing rapidly. This is the clear message of “Adblocking Goes Mainstream” (http://downloads.pagefair.com/reports/adblocking_goes_mainstream_2014_report.pdf), published on September 9, 2014, by PageFair (https://pagefair.com) and Adobe. Here are some results, verbatim:

■ “In Q2 2014 there were approximately 144 million monthly active adblock users globally (4.9% of all internet users); a number which has increased 69% over the previous 12 months.”

■ “Google Chrome is bringing ad blocking to the masses and seeing the largest increase of adblockers, up by 96% to approximately 86 million monthly active users between Q2 2013 and Q2 2014.”

■ “Share of ads blocked by ‘end-user installed’ browsers is 4.7x higher than by ‘pre-installed’ browsers.”

■ “Adblock adoption is happening all over the world—Poland, Sweden, Denmark, and Greece are leading the way with an average of 24% of their online populations using adblocking software in Q2 2014.”

■ “Countries like Japan, Spain, China and Italy are catching up; with their percentage of online populations that use adblock plug-ins growing as much as 134% over the last 12 months.”


Figure 2. Privacy Extensions



This is the market talking. So is what’s shown in Figure 2.

Figure 2 shows all the extensions for ad and tracking blocking I’ve added in Firefox. I may be an extreme case (my interest in this stuff is professional, so I check everything out), but few of us like being spied on, or what being spied on does to us—whether it’s biting our tongues or leading us to reject the very thing that pays for the free goods we enjoy on the Web.

There are legal and policy solutions to the problem of government surveillance. On the legal front, we have the EFF and others filing suits against the government (https://www.eff.org/nsa-spying) and making clear arguments on the open Web. On the policy front, we have our votes, plus the combined efforts of the EFF, StandAgainstSpying (https://standagainstspying.org), Demand Progress (http://demandprogress.org/campaigns), the Sunlight Foundation (http://sunlightfoundation.com) and others.

On the business side, we have the clear message that ad and tracking blocking sends, plus the high cost of Big Data-based surveillance—which at some point will start making an


ROI argument against itself.

My own favorite argument against surveillance-based advertising is the one for old-fashioned brand advertising. This is what Don Marti (our former Editor-in-Chief, http://zgp.org/%7Edmarti) has been doing lately. For example (http://zgp.org/%7Edmarti/business/monkey-badger/#.VBoCi0u0bwI):

    Your choice to protect your privacy by blocking those creepy targeted ads that everyone hates is not a selfish one. You’re helping to re-shape the economy. You’re helping to move ad spending away from ads that target you, and have more negative externalities, and towards ads that are tied to content, and have more positive externalities.

The most positive externality, for us here at Linux Journal—and for journalism in general—is journalism itself. Brand advertising isn’t personal. It’s data-driven only so far as it needs to refine its aim toward populations. For example, people who dig Linux. Brand advertising supports editorial content in a nice clean way: by endorsing it and associating with it. By endorsing journalism for exactly what it does, brand advertising is a great supporter. (It supports a lot of crap too, but that’s beside the point here.) On the other hand, surveillance-driven personalized advertising supports replacing journalism with click-bait.

Don has a simple solution:

    So let’s re-introduce the Web to advertising, only this time, let’s try it without the creepy stuff (http://zgp.org/targeted-advertising-considered-harmful/#what-next-solutions). Brand advertisers and web content people have a lot more in common than either one has with database marketing. There are a lot of great opportunities on the post-creepy web, but the first step is to get the right people talking.

So, if you advertise something Linux-y, call our sales department. ■

Doc Searls is Senior Editor of Linux Journal. He is also a fellow with the Berkman Center for Internet and Society at Harvard University and the Center for Information Technology and Society at UC Santa Barbara.

Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com.
