Professional Documents
Culture Documents
MongoDBandNoSQL
Systems
Overview
Inrecentyearsalternativestorelationaldatabaseshaveemergedthatprovideadvantagesin
termsofperformance,scalability,andsuitabilityforcloudenvironments.Whiletheyvary
significantlyintermsofcapabilities,manyintheindustryhaveadoptedthetermNoSQLto
describetheseproducts.Inthispaperweevaluatethescalabilityofthethreeleadingproducts
Cassandra,Couchbase,andMongoDBusinganindustrystandardbenchmarkcreatedby
Yahoo!calledYCSB.
Selectingtheappropriatedatabasetechnologyforaprojectinvolvescarefulconsiderationof
manydifferentcriteria.Whileperformanceisimportant,itmustbeconsideredalongwith
functionality,operationaltooling,easeofuse,availabilityofskills,technologyecosystem,
securitycontrols,reliability,andotherfactors.Ourgoalinthispaperistotakeacloserlookat
oneofthesecriteriascalabilityandweencouragereaderstouseourfindingsasoneof
manyinputstotheirownevaluations.
Methodology
Thethreetechnologiesweevaluateinthisreportprovideverydifferentfeaturesets.
Comparisonsaredifficult,buteachproductprovidesasimple,coresetofcapabilitiesthatcan
beusedasthebasisofascalabilityevaluation.
In2010,Yahoo!publishedabenchmarkcalledtheYahoo!CloudServingBenchmark(YCSB).
ThisbenchmarktestsCRUDoperations.ManyorganizationshaveusedYCSBtoevaluate
databasesanddifferenthardwareenvironments.Ithasbecomepopularandwellunderstood.
Weincorporatedenhancementsprovidedbyeachvendorfortheirclientintoourteststo
preventtheforkofanysinglevendorfromdisadvantagingtheotherproducts.Wealsoused
thelatestversionofeachdatabaseanddriverstoensurethatthebestpossibleperformance
foreachproductwasobserved.
YCSBisaframeworkandcommonsetofworkloadsforevaluatingtheperformanceof
differentdatabases.YCSBconsistsof:
TheClientanextensibleworkloadgenerator
TheCoreWorkloadsasetofworkloadscenariostobeexecutedbythegenerator
ThefollowingdiagramillustrateshowYCSBworkswithadatabase:
WeusedYCSBasthebasisforallourtests.Weusedthesamenumberofrecords,record
size,numberoffields,andnumberofoperationsinallofourteststoprovideacommon
baseline.Foreachofthetests,weperformedmultipleindependentruns,increasingthe
numberofclientthreadsuntilthemaximumthroughputwasreachedwithoutsacrificing
latency.Forallthreeproductstheidealnumberofthreadswasaround150forworkloadA,
andaround350forworkloadB.Formeasuringresults,werecordedthroughputandlatencies
forreadsandupdates(average,95thpercentile,and99thpercentile).
OursetupconsistedofadatabaseclusterofthreeserversandonededicatedYCSBserverto
ensuretheYCSBclientwasnotcompetingwiththedatabaseforresources.Allserverswere
identical.
Weperformedseparaterunsforeachofthethreeconfigurationswetested,eachoftheYCSB
workloads(WorkloadA,WorkloadB),andeachofthethreadcountswetested.
Foreachdatabaseweperformedthefollowingtaskstoloadthedata:
Startwithanemptydatabaseinstance(datafileswiped,serverrestarted)
Load400MrecordsusingtheloadphaseofYCSB
Allowthedatabasetostabilizeaftertheload,includingcompaction
Wethenraneachworkloadandrecordedtheresults.Beforestartingthenextworkloadwe
allowedthedatabasetostabilize.
2
Deployment Architecture
Allthreedatabasesimplementdistributedarchitectures.Whiletheirdesignsfollowvery
differentapproaches,allthreedatabasesprovideautomaticpartitioningofdatatosupport
scaleout,andallthreedatabasesrelyonasynchronousreplicationtomaintainmultiple
copiesofthedataforhighavailability.Furthermore,allthreedatabasesrecommend
multiserverenvironmentsforproductiondeployments.
Inthesetestswedeployedthedatabasesacrossthreebaremetalservers.Datawas
partitionedacrossallthreeservers,andeachdatabasewasconfiguredtomaintaintwocopies
ofthedata.
Test Results
WeevaluatedtwoworkloadsusingYCSB:WorkloadA,whichconsistsofequalnumbersof
readsandupdates,andWorkloadB,whichconsistsof95%readsand5%updates.Alltests
wereperformedwith400Mrecords.Becauseeachsystemisalsomaintainingtwocopiesof
thedata,eachserverismanaging~267Mrecords,whichrepresentsadatasetlargerthan
RAM.Eachtestperforms100Moperationsandrecordsthroughputandlatenciesatthe95th
and99thpercentilesforreadsandupdatesseparately.
WhentestingadatasetlargerthanRAM,the50/50workloadinthesetestsdemonstratesthat
MongoDBprovidesover1.8xgreaterthroughputthanCassandra,andnearly13xgreater
throughputthanCouchbase.
MongoDBprovidessingledigitmillisecondlatencyatthe99thpercentileforreadsand
updates.Cassandraslatencyisslightlybetterforupdates,butworseforreads,whichistobe
expectedasitisawriteoptimizeddatabase.Weobservedsurprisinglyhighlatencyfor
Couchbase,whichwasseveraltimeshigherthaneitherCassandraorMongoDB:
YCSB(Latencies)WorkloadA
99th(Read)
99th(Update)
Cassandra
14ms
3ms
Couchbase
39ms
63ms
MongoDB
5ms
9ms
WhentestingadatasetlargerthanRAM,thereadheavyworkload(95%reads)shows
MongoDBagainprovidesover1.75xthethroughputofCassandra,andover6xthe
throughputofCouchbase.
InthisworkloadthelatenciesforCassandraandMongoDBarehigherthanthe50/50
workload,andCouchbaseexhibitsthehighestlatenciesoverall:
YCSB(Latencies)WorkloadB
99th(Read)
99th(Update)
Cassandra
16ms
8ms
Couchbase
25ms
53ms
MongoDB
12ms
12ms
Conclusions
Basedonthesetests,wehavereachedthefollowingconclusions:
MongoDBprovidesgreaterperformancethanCassandraandCouchbaseinallthe
testsweran,byasmuchas13x.
CassandraandMongoDBexhibitverylowlatencyatthe99thpercentileforreadsand
updatesinbothworkloads.Couchbaseexhibitssignificantlyhigherlatency.
ThetestsshowthatindeploymentswheredataexceedsthesizeofRAM,wheredata
ispartitionedacrossmultipleservers,andwheredataisreplicatedforhighavailability,
MongoDBprovidesbetterperformancethanCassandraandCouchbase.
Insofarasthesetestsonlyevaluatethemostbasicoperationsofthesedatabases,
usersshouldalsoconsidertheotherfeaturesthatareimportanttotheirapplication,
andtheyshoulddevelopteststhatevaluatethosecapabilitiesaswell.
Environment
Thesetestswereconductedonbaremetalservers.Eachserverhadthefollowing
specification:
CPU:2x
3.0GHzIntelXeon(R)CPUE52690v2DecaCoreIvyBridge
RAM:96GB:6x16GBKingston16GBDDR32Rx4
Disk:(2)960GBSanDiskCloudSpeed1000SSD
drivecontroller:Adaptec71605
OS:Ubuntu14.10
Network:10gigE
Thefollowingconfigurationsweremadetooptimizetheenvironmentforeachdatabase:
readaheadwasreducedto64blocks
transparenthugepages,defrag,numaweredisabled
ThefollowingYCSBcodewasusedforthesetests:
https://github.com/usaindev/YCSB
Versionsofeachdatabase:
Cassandra2.1.4
Couchbase3.0.2
MongoDB3.0.3
Driverversions:
6
Cassandra:CQLcassandradrivercore2.1.5
Couchbase:2.1.2
MongoDB:mongojavadriver3.0.0
Allsettingsweredefaultforallthreedatabases.Thefollowingconfigurationsweremade:
ForCassandra,thecommitlogwasplacedonaseparatedevice
ForMongoDBthejournalwasplacedonaseparatedevice
storageEngine=wiredTiger
zlibcompressionwasenabledforusertablecollection.
YCSBtestconfiguration
400milliondocuments
2copiesofthedata
100Millionoperations
Zipfianrequestdistribution
Insomeconfigurationswhererunning100millionoperationswasprohibitivelyslow,
theworkloadwasrunwithadditionalparametermaxexecutiontime=3600which
limitedtherunto60minutes.