1 Hardware

1.1 Original hardware

The original hardware (circa 1998) that was used by Google when it was located at Stanford University included:[1]

- Sun Microsystems Ultra II with dual 200 MHz processors and 256 MB of RAM. This was the main machine for the original Backrub system.
- F50 IBM RS/6000 donated by IBM, included 4 processors, 512 MB of memory and 8 × 9 GB hard disk drives.
- Two additional boxes included 3 × 9 GB hard drives and 6 × 4 GB hard disk drives respectively (the original storage for Backrub). These were attached to the Sun Ultra II.
- IBM disk expansion box with another 8 × 9 GB hard disk drives donated by IBM.
- Homemade disk box which contained 10 × 9 GB SCSI hard disk drives.

According to Google, their global data center operation's electrical power ranges between 500 and 681 megawatts.[6][7] The combined processing power of these servers might have reached from 20 to 100 petaflops in 2008.[8]

1.3 Network topology

Details of the Google worldwide private networks are not publicly available, but Google publications[9][10] make references to the Atlas Top 10 report that ranks Google as the third largest ISP behind Level 3.[11]

In order to run such a large network with direct connections to as many ISPs as possible at the lowest possible cost, Google has a very open peering policy.[12]

From this site we can see that the Google network can be accessed from 67 public exchange points and 69 different locations.
From a datacenter view, the network starts at the rack level, where 19-inch racks are custom-made and contain 40 to 80 servers (20 to 40 1U servers on either side, while new servers are 2U rackmount systems;[14] each rack has a switch). Servers are connected via a 1 Gbit/s Ethernet link to the top of rack switch (TOR). TOR switches are then connected to a gigabit cluster switch using multiple gigabit or ten gigabit uplinks.[15] The cluster switches themselves are interconnected and form the datacenter interconnect fabric (most likely using a dragonfly design rather than a classic butterfly or flattened butterfly layout[16]).

These custom switch-routers are connected to DWDM devices to interconnect data centers and points of presence (PoP) via dark fibre.
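As a rough back-of-the-envelope check on the rack figures above, the oversubscription ratio of a top-of-rack switch follows directly from the server count, the 1 Gbit/s server links, and the uplink capacity. The uplink counts below are hypothetical values chosen for illustration; only the server counts and link speeds come from the text:

```python
# Back-of-the-envelope oversubscription for a top-of-rack (TOR) switch.
# Server counts and 1 Gbit/s downlinks are from the text; the number of
# 10 Gbit/s uplinks per rack is an assumed value for illustration.

def oversubscription(servers, server_link_gbps, uplinks, uplink_gbps):
    """Ratio of aggregate server (downlink) capacity to uplink capacity."""
    downlink = servers * server_link_gbps
    uplink = uplinks * uplink_gbps
    return downlink / uplink

# A rack with 40 servers at 1 Gbit/s each and 4 x 10 Gbit/s uplinks:
print(f"{oversubscription(40, 1, 4, 10):.1f}:1")  # 1.0:1 (non-blocking)

# A denser rack with 80 servers and only 2 x 10 Gbit/s uplinks:
print(f"{oversubscription(80, 1, 2, 10):.1f}:1")  # 4.0:1
```

A ratio above 1:1 means the rack's servers can collectively offer more traffic than the uplinks can carry, which is why cluster-level fabrics like the dragonfly designs mentioned above matter for sustained cross-rack bandwidth.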
2 Software
Most of the software stack that Google uses on their
servers was developed in-house.[29] According to a
well-known Google employee, C++, Java, Python and
(more recently) Go are favored over other programming
languages.[30] For example, the back end of Gmail is written in Java and the back end of Google Search is written in C++.[31] Google has acknowledged that Python has
played an important role from the beginning, and that it
continues to do so as the system grows and evolves.[32]
From an operational standpoint, when a client computer attempts to connect to Google, several DNS servers resolve www.google.com into multiple IP addresses via a round-robin policy. Furthermore, this acts as the first level of load balancing and directs the client to different Google clusters. A Google cluster has thousands of servers, and once the client has connected to the server additional load balancing is done to send the queries to the least loaded web server. This makes Google one of the largest and most complex content delivery networks.[17]

Google has numerous data centers scattered around the world. At least 12 significant Google data center installations are located in the United States. The largest known centers are located in The Dalles, Oregon; Atlanta, Georgia; Reston, Virginia; Lenoir, North Carolina; and Moncks Corner, South Carolina.[18] In Europe, the largest known centers are in Eemshaven and Groningen in the Netherlands and Mons, Belgium.[18] Google's Oceania Data Center is claimed to be located in Sydney, Australia.[19]

The software that runs the Google infrastructure includes:[33]

- Google Web Server (GWS) – custom Linux-based Web server that Google uses for its online services.
- Storage systems:
  - Google File System and its successor, Colossus[34][35]
  - BigTable – structured storage built upon GFS/Colossus[34]
  - Spanner – planet-scale structured storage system, next generation of BigTable stack[34][36]
- Indexing/search systems:
  - TeraGoogle – Google's large search index (launched in early 2006), designed by Anna Patterson of Cuil fame.[38]
  - Caffeine (Percolator) – continuous indexing system (launched in 2010).[39]
  - Hummingbird – major search index update, including complex search and voice search.[40]

Google has developed several abstractions which it uses for storing most of its data:[41]

- Protocol Buffers – Google's lingua franca for data,[42] a binary serialization format which is widely used within the company.
- SSTable (Sorted Strings Table) – a persistent, ordered, immutable map from keys to values, where both keys and values are arbitrary byte strings. It is also used as one of the building blocks of BigTable.[43]
- RecordIO – a sequence of variable sized records.[41][44][45]

3 Search infrastructure

3.1 Index

…whole index in main memory (although with low replication or no replication at all), and in early 2001 Google switched to an in-memory index system. This switch radically changed many design parameters of their search system, and allowed for a significant increase in throughput and a large decrease in latency of queries.[46]

In June 2010, Google rolled out a next-generation indexing and serving system called Caffeine which can continuously crawl and update the search index. Previously, Google updated its search index in batches using a series of MapReduce jobs. The index was separated into several layers, some of which were updated faster than the others, and the main layer wouldn't be updated for as long as two weeks. With Caffeine the entire index is updated incrementally on a continuous basis. Later Google revealed a distributed data processing system called Percolator,[47] which is said to be the basis of the Caffeine indexing system.[39][48]

3.2 Server types

Google's server infrastructure is divided into several types, each assigned to a different purpose:[14][17][49][50][51]
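The SSTable abstraction listed among the storage building blocks above (a persistent, ordered, immutable map from byte-string keys to byte-string values) can be sketched in a few lines. This is a simplified in-memory model of the lookup semantics only, not Google's actual on-disk format:

```python
import bisect

class SSTable:
    """Simplified model of an SSTable: an ordered, immutable map from
    byte-string keys to byte-string values. Real SSTables are persistent
    on-disk structures with block indexes; this sketch models only the
    sorted, immutable lookup and range-scan behavior."""

    def __init__(self, items):
        # Sort once at construction; the table never changes afterwards.
        pairs = sorted(items.items())
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def get(self, key, default=None):
        # Binary search over the sorted key list.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return default

    def range_scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            yield self._keys[i], self._values[i]
            i += 1

table = SSTable({b"row2": b"v2", b"row1": b"v1", b"row3": b"v3"})
print(table.get(b"row2"))                        # b'v2'
print(list(table.range_scan(b"row1", b"row3")))  # [(b'row1', b'v1'), (b'row2', b'v2')]
```

Immutability is the key design choice: because a table never changes after it is written, it can be read concurrently without locks, which suits the read-heavy workloads described in this section.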
Most operations are read-only. When an update is required, queries are redirected to other servers, so as to simplify consistency issues. Queries are divided into sub-queries, where those sub-queries may be sent to different ducts in parallel, thus reducing the latency time.[14]

To lessen the effects of unavoidable hardware failure, software is designed to be fault tolerant. Thus, when a system goes down, data is still available on other servers, which increases reliability.
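The fan-out pattern described above, where a query is split into sub-queries served in parallel so that total latency is roughly that of the slowest partition rather than the sum of all of them, can be sketched as follows. The shard data and per-shard search function here are hypothetical stand-ins for what would be RPCs to index-serving machines:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical index shards: each holds a slice of the term -> doc-id index.
# In a real system each shard would be a separate index-serving machine.
SHARDS = [
    {"apple": [1], "banana": [2]},
    {"apple": [3], "cherry": [4]},
    {"banana": [5], "apple": [6]},
]

def search_shard(shard, term):
    """Search a single shard; returns the matching document ids."""
    return shard.get(term, [])

def search(term):
    """Fan the query out to all shards in parallel and merge the results.
    Latency is bounded by the slowest shard, not the sum over shards."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda shard: search_shard(shard, term), SHARDS)
        return sorted(doc for part in partials for doc in part)

print(search("apple"))   # [1, 3, 6]
print(search("banana"))  # [2, 5]
```

The same structure also gives the fault-tolerance property mentioned above: if a shard is replicated, a failed replica can simply be dropped from the fan-out and its replica queried instead.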
References
[27] http://www.theregister.co.uk/2009/04/10/google_data_center_video

[41] "Google Developing Caffeine Storage System". TechWeekEurope UK. Eweekeurope.co.uk. 2009-08-18. Retrieved 2012-02-17.
Further reading
L.A. Barroso, J. Dean, and U. Hölzle (March–April 2003). "Web search for a planet: The Google cluster architecture" (PDF). IEEE Micro 23 (2): 22–28. doi:10.1109/MM.2003.1196112.
Shankland, Stephen, CNET news. "Google uncloaks once-secret server". April 1, 2009.
External links
Google Research Publications
"Web Search for a Planet: The Google Cluster Architecture" (Luiz André Barroso, Jeffrey Dean, Urs Hölzle)

"Underneath the Covers at Google: Current Systems and Future Directions" (Talk given by Jeff Dean at the Google I/O conference in May 2008)