Thursday, September 1, 2011

MySQL Cluster is a brilliant NoSQL database


MySQL Cluster* is one of the most advanced and scalable databases available today and, despite what its name might suggest, it is also a brilliant NoSQL database**.

Let me discuss this statement!

First, let’s discuss the high level issues that NoSQL databases try to address:
-     Scalability. Traditional RDBMS technology was designed four decades ago, and is not appropriate for today’s Big Data requirements. Database systems today need to be able to scale horizontally over multiple machines to handle millions of users. As the CAP theorem states, it is not possible to achieve availability, scalability and consistency in one system. Several NoSQL databases sacrifice consistency for availability and scalability.
-     RDBMS has a rigid data model. Once a normalized data model has been defined in an RDBMS, it is difficult to change. Schema operations on a database are often not online, and what’s worse, they can be slow. This makes them hard or impossible to implement on production systems. NoSQL databases do not require fixed table schemas, they are ’schema-free’ and can more easily store unstructured data.
-     Simple APIs to access simple data structures, rather than SQL. Joins are not used, they introduce complexity and performance bottlenecks, especially in distributed scale-out systems.

Now, let us have a look at how MySQL Cluster stacks up against these requirements.
-       In terms of scalability, MySQL Cluster scales in a horizontal fashion across a number of nodes. Currently, up to 255 nodes are supported in one cluster. Data is automatically partitioned across a number of Data Nodes, and it is possible to configure a number of replicas for each data partition. Synchronous replication ensures that replicas are consistent and an XA transaction model also makes transactions consistent across partitions..
-       MySQL Cluster has a relational data model, and programmers do have to normalize their data into tables. But the great thing is that schema updates are online, such as adding/removing tables or indexes, or changing existing tables (e.g. adding a column to a table).
-       MySQL Cluster has an Access Layer that can consist of MySQL Servers (SQL), LDAP (NoSQL), native C++ API and Java API (both NoSQL). In the upcoming 7.2 release, a Memcached API (NoSQL) is now also available. So, there are plenty of NoSQL APIs to choose from in order to access NDB data.

Looking at the above, it seems that MySQL Cluster has the problem of having a relational data model where programmers still have the problem of normalizing their data.  A schema-free database sounds great from a developer’s point of view, however, I wonder how it would be to operate a schema-free database. How would the users do ad-hoc queries and analysis?



                                            “Just throw it in the database!”



If a relational schema is something that a user needs or is at least willing to accept, then MySQL Cluster is a brilliant NoSQL database. It solves the same problems as the new NoSQL databases do, yet you also get the traditional benefits from the RDBMS world: atomic transactions even for complex transactions, consistency, integrity and durability, possibility to do complex analytics using multiple columns and indexes. See also this excellent comment by Henrik Ingo about how some of the world's largest telco companies are using MySQL Cluster.

For the majority of NoSQL database products out there, these fundamental data management concerns of an application are being moved out of the database layer to deliver very high performance and scalability.  This means that we are moving these fundamental concerns up the stack in the application layer. I wonder how many out there would feel comfortable with that. What do you think?

If you have any questions or comments, feel free to reply to this blog below or reach out to us on Facebook, LinkedIn, Twitter or directly via these contact details.


* Maybe NDB Cluster is a more appropriate name? MySQL server is an SQL connector.


** Although with a relational model at the back.