Johan Andersson
PS Practice Manager, MySQL Cluster johan.andersson@sun.com
Mat Keep
MySQL Cluster Product Management matthew.keep@sun.com Copyright 2009 MySQL Sun Microsystems.
The World's Most Popular Open Source Database
Agenda
General Design Concepts
Schema Optimization
Index Selection & Tuning
Query & Parameter Tuning
Development Tools
Resources
About MySQL
14 Years of Development
Acquired by Sun in February 2008
400+ in Database Group
750+ Partners
70K+ Downloads Per Day
Customers across every major operating system, hardware vendor, geography, industry, and application type
OEM / ISV's
Telecommunications
Enterprise 2.0
http://www.mysql.com/customers/cluster/
Overall design goal: minimize network roundtrips for your most important requests!
Speed Increase
Query Optimization
Rewrite slow queries
Often goes hand in hand with Schema Optimization
Parameter Tuning
Use a good configuration: www.severalnines.com/config
Hardware Tuning
Get faster CPU/disk
Re-run the optimized typical use cases using mysqlslap
GOTO BEGIN;
END;
Yes, performance tuning is a never-ending task.
Never tune unless you can measure and test.
Don't optimize unless you have a problem.
Index searches are done in O(log n) time.
JOINs are OK if you understand what can make them slow.
If your most important requests are 10-way JOINs with huge result sets then Cluster may not be for you.
Or use scale-out (write to Cluster, read from InnoDB): http://johanandersson.blogspot.com/2009/05/ha-mysql-writescaling-using-cluster-to.html
Synchronous replication adds ~2.7x for writes compared to reads.
The test used 8 threads connecting to one mysqld.
'bencher' was used to generate the load.
Batching
MySQL Cluster allows batching on:
Inserts, index scans (when not part of a JOIN), PK reads, PK deletes, and (most) PK updates.
Batching means that one network round trip is used to read/modify a number of records: less ping-pong!
If you can batch, do it!
Batching
Example: read 10 records (get 10 services for a user)
No batching:
10 x SELECT * FROM t1 WHERE userid=1 AND serviceid={ id
mysqlslap --create-schema=test -c8 -i 1000 -q single.txt
Average number of seconds to run all queries: 0.006 seconds
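The batched counterpart of the ten single-row reads can be sketched as one statement with an IN list. This is a minimal sketch, not the statement from the original slide: the IN-list form, the helper name batched_select, and the `%s` placeholder style (the one common MySQL drivers for Python accept) are assumptions; only the table t1 and the columns userid and serviceid come from the example above.

```python
def batched_select(table, user_id, service_ids):
    """Build ONE parameterized SELECT that fetches all requested services.

    Hypothetical helper: one statement (one round trip) instead of
    len(service_ids) separate statements.
    """
    if not service_ids:
        raise ValueError("service_ids must not be empty")
    # One placeholder per service id; values are passed separately as parameters.
    placeholders = ", ".join(["%s"] * len(service_ids))
    sql = (
        f"SELECT * FROM {table} "
        f"WHERE userid = %s AND serviceid IN ({placeholders})"
    )
    params = [user_id, *service_ids]
    return sql, params


# Ten services for user 1, fetched with a single statement.
sql, params = batched_select("t1", 1, list(range(1, 11)))
print(sql)
print(params)
```

The returned pair would be handed to a driver's execute(sql, params) call; the table name is interpolated directly, so it should never come from user input.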
Batching
Another way: when batching operations on different tables, transaction_allow_batching can be used:
SET transaction_allow_batching=1; -- must be set on the connection
BEGIN;
INSERT INTO t1 ....;
INSERT INTO t2 ....;
INSERT INTO t3 ....;
INSERT INTO t4 ....;
DELETE FROM t5 ....;
UPDATE t1 SET value='new value' WHERE id=1;
COMMIT;
All data belonging to a particular userid will be on the same partition.
select * from user where userid=1;
Only one data node will be scanned (no matter how many nodes you have).
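Why a lookup by userid touches only one partition can be illustrated with a toy model of hash partitioning. This is a sketch under stated assumptions: crc32 is a stand-in hash (not the function MySQL Cluster uses), and the rows and the partition count of 4 are hypothetical.

```python
import zlib


def partition_for(user_id, num_partitions):
    """Map a userid to a partition number.

    crc32 is only a stand-in for the real partitioning hash.
    """
    return zlib.crc32(str(user_id).encode("utf-8")) % num_partitions


# Hypothetical rows: (userid, service)
rows = [(1, "voip"), (1, "broadband"), (2, "voip"), (1, "iptv")]

# Every row is placed by hashing its userid only.
placement = {}
for user_id, service in rows:
    placement.setdefault(partition_for(user_id, 4), []).append((user_id, service))

# All rows for userid=1 land on the same partition, so a query
# restricted to userid=1 only needs to visit that one partition.
partitions_for_user_1 = {partition_for(uid, 4) for uid, _ in rows if uid == 1}
print(partitions_for_user_1)
```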
BLOBs/TEXTs vs VARBINARY/VARCHAR
BLOB/TEXT columns are stored in an external hidden table.
The first 255B are stored inline in the main table.
Reading a BLOB/TEXT requires two reads:
One read from the main table plus one read from the hidden table.
[Figure: tables USER_SVC_VOIP, USER_SVC_BROADBAND, and USER_SVC_VOIP_BB, keyed by USERID]
Denormalized:
SELECT * FROM USER_SVC_VOIP_BB AS bb_voip WHERE bb_voip=1;
Total throughput = 21591.64 tps
Average response time = 371us
Query Optimization
JOINs are executed in the MySQL server.
The optimizer in MySQL only knows one algorithm:
Nested Loop Join
This algorithm is not brilliant in its effectiveness.
Query Optimization
SELECT fname, lname, title FROM a,b WHERE b.id=a.id AND a.country='France';
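Author a
+-------------+-----------+-----------+---------+
| Authid (PK) | Fname     | Lname     | Country |
+-------------+-----------+-----------+---------+
| 1           | Albert    | Camus     | France  |
| 2           | Sully     | Prudhomme | France  |
| 3           | Johann    | Goethe    | Germany |
| 4           | Junichiro | Tanizaki  | Japan   |
+-------------+-----------+-----------+---------+

AuthorBook b
+-------------+-----------+-----------+
| Authid (PK) | ISBN (pk) | title     |
+-------------+-----------+-----------+
| 1           | 1111      | La Peste  |
| 1           | 1112      | La Chute  |
| 1           | 1113      | Caligula  |
| 2           | 2111      | La France |
+-------------+-----------+-----------+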
Index scan the left table to find matches. For each match in 'a', find matches in 'b'.
In this case, that means an index scan on the right table on b.id for each matching record in 'a'.
This could be very expensive if there are many records matching a.country='France'.
Query Optimization
JOINs can be expensive if:
We have many matches in the left table.
E.g., we have 1000 matches in the left table
Each match in the left table matches in the right table with index scans
Primary key joins are cheaper, but perhaps you can denormalize tables with the same PK (if you can, you should).
E.g., if each match in the left table has 1000 matches in the right table, we now have to read 1000x1000 records, in 1000 network roundtrips!
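The arithmetic behind the 1000x1000 example can be written down as a tiny cost model. This is a simplified sketch, assuming exactly one index scan (one roundtrip) on the right table per matching left-table row and ignoring the initial scan of the left table; the function name is hypothetical.

```python
def nested_loop_cost(left_matches, right_matches_per_left):
    """Return (records read from the right table, roundtrips to the right table).

    Simplified model: one index scan on the right table per left-table match.
    """
    right_roundtrips = left_matches
    records_read = left_matches * right_matches_per_left
    return records_read, right_roundtrips


# 1000 matches in the left table, each with 1000 matches in the right table.
records, roundtrips = nested_loop_cost(1000, 1000)
print(records, roundtrips)
```

The number of roundtrips grows with the number of left-table matches, which is why a selective WHERE condition on the left table matters so much.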
Query Optimization
Which of my book shops stock books from France?
SELECT c.id FROM a,b,c WHERE c.isbn=b.isbn AND b.id=a.id AND a.country='France' AND c.count>0 ;
[Tables: Author a and AuthorBook b as on the previous slide, plus Bookshop c with id (pk) values 1, 1, 1, 2, 2, .. 10000]
Continuing from the previous slide: for each ISBN in AuthorBook we have to JOIN on ISBN in Bookshop.
Each record found in AuthorBook triggers an index scan on Bookshop (so in this example 4 more network hops).
Index scans will be executed sequentially...
This query can be really slow, as network hops are costly.
Query Optimization
In real life...
A user does not have to view all email threads, but perhaps only be presented with 10 at a time.
A subscriber does not have 1M telecom services associated.
An author does not write 1M books.
It is easier to optimize if we know the boundaries. Careful analysis of this problem reveals:
Authors in our database have written a bounded number of books. It is not N books!
In this case an author writes max 64 books.
Query Optimization
Denormalization could be an option:
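isbn_title b
+-----------+-----------+
| ISBN (pk) | title     |
+-----------+-----------+
| 1111      | La Peste  |
| 1112      | La Chute  |
| 1113      | Caligula  |
| 2111      | La France |
+-----------+-----------+

Author2 a
+-------------+-----------+-----------+---------+------------+
| Authid (PK) | Fname     | Lname     | Country | ISBN_LIST  |
+-------------+-----------+-----------+---------+------------+
| 1           | Albert    | Camus     | France  | <64 isbns> |
| 2           | Sully     | Prudhomme | France  | <64 isbns> |
| 3           | Johann    | Goethe    | Germany |            |
| 4           | Junichiro | Tanizaki  | Japan   |            |
+-------------+-----------+-----------+---------+------------+

Bookshop c
id (pk): 1, 1, 1, 2, 2, .. 10000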
SELECT ISBN_LIST FROM a WHERE a.country='France';
SELECT id FROM c WHERE c.isbn IN (<ISBN_LIST>);
Why better?
Multi-read range scan: one index scan takes all values in <isbn_list> instead of one index scan for each ISBN (many roundtrips)!
We are down to two roundtrips for this query!
Let's implement this as a SPROC and compare!
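The roundtrip counts for the two approaches can be compared with a small model. This is a sketch under simplifying assumptions: one roundtrip for the scan on Author, one index scan on AuthorBook per matching author, and one index scan on Bookshop per ISBN found; the function names are hypothetical. The book counts per French author (3 and 1) are taken from the example tables.

```python
def join_roundtrips(books_per_matching_author):
    """Roundtrips for the 3-way nested loop join (simplified model).

    books_per_matching_author has one entry per author matching the WHERE
    condition, giving the number of AuthorBook rows for that author.
    """
    scan_author = 1
    scans_authorbook = len(books_per_matching_author)  # one scan per author
    scans_bookshop = sum(books_per_matching_author)    # one scan per ISBN
    return scan_author + scans_authorbook + scans_bookshop


def denormalized_roundtrips():
    """One read for the ISBN lists, one multi-range scan on Bookshop."""
    return 2


# Two French authors in the example: one with 3 books, one with 1 book.
join_cost = join_roundtrips([3, 1])
denorm_cost = denormalized_roundtrips()
print(join_cost, denorm_cost)
```

In this model the join cost grows with the number of authors and books, while the denormalized form stays at two roundtrips as long as the ISBN list fits in one statement.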
Query Optimization
SPROC with dynamic SQL:
CREATE PROCEDURE ab(IN c VARCHAR(255))
BEGIN
  SELECT @list:=isbn FROM author2 WHERE country=c;
  SET @s = CONCAT("SELECT DISTINCT id FROM bookshop WHERE count>0 AND isbn IN (", @list, ");");
  PREPARE stmt FROM @s;
  EXECUTE stmt;
END
Query Optimization
Normal query
mysqlslap --create-schema=test -q "SELECT DISTINCT id FROM author a, authorbook b, bookshop c WHERE b.authid=a.authid AND a.country='France' AND b.isbn=c.isbn AND c.count>0" -i 1000 -c 8
Average number of seconds to run all queries: 0.047 seconds
SPROC
mysqlslap --create-schema=test -q "call ab('France')" -i 1000 -c 8
Average number of seconds to run all queries: 0.011 seconds
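The two mysqlslap averages can be turned into a speedup figure. This short sketch only does arithmetic on the numbers reported above; it does not re-run the benchmark, and the variable names are illustrative.

```python
# Averages reported by mysqlslap above, in seconds.
normal_query_s = 0.047
sproc_s = 0.011

# Ratio of the two averages and the absolute time saved per run.
speedup = normal_query_s / sproc_s
saved_ms = (normal_query_s - sproc_s) * 1000

print(f"speedup: {speedup:.1f}x, saved per run: {saved_ms:.0f} ms")
```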
Products
DXH510 PCI Express Host Adapter (600 USD list price, Oct 2008)
DXS410 DX 10 Port Switch (4200 USD list price, Oct 2008)
Read more at http://www.dolphinics.com
In general
JOINs limit scalability, even for InnoDB/MyISAM
It is a complex access pattern
JOINs should be as simple as possible when used in the Realtime part of the application:
WHERE conditions should be as limiting as possible.
2-way/3-way JOINs inspecting ~2000 records in each table are usually no problem if the tables are well indexed and the query is well written.
N-way JOINs with PK/index scans matching a couple of records in each table are also OK if well indexed (N<10).
N-way (N>1) JOINs matching > 5000 records will likely not be executable within a 5ms limit.
Using SCI/DX can help a lot, as JOINs are subject to network latency, which is problematic.
Query Tuning
Don't trust the OPTIMIZER!
Statistics gathering is very bad
Optimizer thinks there are only 10 rows to examine in each table!
Query Tuning
mysql> explain select * from t1 where a=2 and ts='2009-10-05 14:21:34';
+----+-------------+-------+------+----------------------+----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys        | key      | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+----------------------+----------+---------+-------+------+-------------+
|  1 | SIMPLE      | t1    | ref  | idx_t1_a,idx_t1_a_ts | idx_t1_a | 9       | const |   10 | Using where |
+----+-------------+-------+------+----------------------+----------+---------+-------+------+-------------+
1 row in set (0.00 sec)
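+----+-------------+-------+------+---------------+-------------+---------+-------------+------+-------------+
| id | select_type | table | type | possible_keys | key         | key_len | ref         | rows | Extra       |
+----+-------------+-------+------+---------------+-------------+---------+-------------+------+-------------+
|  1 | SIMPLE      | t1    | ref  | idx_t1_a_ts   | idx_t1_a_ts | 13      | const,const |   10 | Using where |
+----+-------------+-------+------+---------------+-------------+---------+-------------+------+-------------+
1 row in set (0.00 sec)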
Ndb_cluster_connection_pool
Problem:
A mutex on the connection from the mysqld to the data nodes prevents scalability.
Many threads contend on the mutex.
Must have many mysqld processes running...
Solution:
Ndb_cluster_connection_pool (in my.cnf) creates more connections from one mysqld to the data nodes
One free [mysqld] slot is required in config.ini for each connection.
Threads load balance on the connections: less contention on the mutex, increased scalability.
Fewer mysqlds needed to drive load!
www.severalnines.com/config allows you to specify the connection pool.
Ndb_cluster_connection_pool
[Chart: transactions per second (0 to 40000) vs. number of threads (1, 2, 4, 8, 16, 32), comparing Ndb_cluster_connection_pool=1 (default) with Ndb_cluster_connection_pool=32]
Gives at least 70% better performance (>150% has been seen).
How to set Ndb_cluster_connection_pool depends on how many CPU cores you have for the mysqld.
Ndb_cluster_connection_pool=2x<CPU cores> is a good starting point.
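The sizing rule and the slot requirement can be combined into a small planning helper. This is a sketch, not a tool from the presentation: the function name is hypothetical, the option is written in lowercase as an assumption about how it appears in my.cnf, and the generated config.ini fragment contains only the empty [mysqld] section headers, one per connection.

```python
def connection_pool_plan(cpu_cores):
    """Return (pool size, my.cnf fragment, config.ini fragment).

    Starting point from the slide: pool size = 2 x CPU cores,
    with one free [mysqld] slot in config.ini per connection.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    pool_size = 2 * cpu_cores
    my_cnf = f"[mysqld]\nndb_cluster_connection_pool={pool_size}\n"
    config_ini = "[mysqld]\n" * pool_size
    return pool_size, my_cnf, config_ini


# Example: a mysqld host with 4 CPU cores.
pool_size, my_cnf, config_ini = connection_pool_plan(4)
print(pool_size)
print(my_cnf)
print(config_ini)
```

The result is only a starting point; the right value should be confirmed by measuring throughput at different pool sizes.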
Other things
Statistics gathering for MySQL Cluster is bad
ndb_use_exact_count=0 ndb_index_stat_enable=1
Enabling these actually costs more: statistics are fetched before each query is executed. Costly.
You can enable this on a single test mysqld server dedicated to testing.
Other things
Query cache is most often useless with MySQL Cluster
Invalidating the cache is expensive
Invalidation must happen on all mysql servers
Disable it
Ndb_auto_increment_prefetch_size (the server variable is spelled ndb_autoincrement_prefetch_sz)
Specifies how large a range of auto-increment values the mysqld should cache before asking Cluster for the next range.
Default is Ndb_auto_increment_prefetch_size=1
With the default, inserting means one roundtrip down to Cluster before each INSERT!
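The effect of the prefetch size on the number of range requests can be estimated with a simple model. This is a sketch that assumes one roundtrip to Cluster per fetched range and a single mysqld doing all the inserts; the function name and the prefetch size of 256 are illustrative, not recommendations from the presentation.

```python
import math


def prefetch_roundtrips(num_inserts, prefetch_size):
    """Roundtrips needed to fetch auto-increment ranges for num_inserts INSERTs.

    Simplified model: each fetched range covers prefetch_size values.
    """
    if prefetch_size < 1:
        raise ValueError("prefetch_size must be at least 1")
    return math.ceil(num_inserts / prefetch_size)


# With a prefetch size of 1, every INSERT pays one extra roundtrip.
default_cost = prefetch_roundtrips(1000, 1)
# A larger cached range amortizes that roundtrip over many INSERTs.
larger_cost = prefetch_roundtrips(1000, 256)
print(default_cost, larger_cost)
```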
Other things
Make sure you never:
Run in swap: the data nodes will underperform and you will have an unstable system.
Tools
Third party tools at www.severalnines.com
Configurator
Uses best practices for setting up a good config.ini and my.cnf
Scripts to control the cluster from one single location.
Why MySQL?
http://mysql.com/customers/view/?id=566
Performance Reliability Lower costs
MySQL Cluster won the performance test hands-down, and it fitted our needs perfectly. We evaluated shared-disk clustered databases, but the cost would have been at least 10x more.
François Leygues, Systems Manager, Alcatel-Lucent
Why MySQL?
Low cost scalability High read and write throughput Extreme availability
Since deploying MySQL Cluster as our eCommerce database, we have had continuous uptime with linear scalability enabling us to exceed our most stringent SLAs
Sean Collier, CIO & COO, Shopatron Inc
Questions ?
Johan Andersson
PS Practice Manager, MySQL Cluster johan.andersson@sun.com
Mat Keep
MySQL Cluster Product Management matthew.keep@sun.com
The most cost-effective open source relational database for real-time, carrier-grade data management requirements