Thursday, September 1, 2011

MySQL Cluster is a brilliant NoSQL database


MySQL Cluster* is one of the most advanced and scalable databases available today and, despite what its name might suggest, it is also a brilliant NoSQL database**.

Let me discuss this statement!

First, let’s discuss the high level issues that NoSQL databases try to address:
-     Scalability. Traditional RDBMS technology was designed four decades ago, and is not appropriate for today’s Big Data requirements. Database systems today need to be able to scale horizontally over multiple machines to handle millions of users. As the CAP theorem states, it is not possible to achieve availability, scalability and consistency in one system. Several NoSQL databases sacrifice consistency for availability and scalability.
-     RDBMS has a rigid data model. Once a normalized data model has been defined in an RDBMS, it is difficult to change. Schema operations on a database are often not online, and what’s worse, they can be slow. This makes them hard or impossible to implement on production systems. NoSQL databases do not require fixed table schemas, they are ’schema-free’ and can more easily store unstructured data.
-     Simple APIs to access simple data structures, rather than SQL. Joins are not used, they introduce complexity and performance bottlenecks, especially in distributed scale-out systems.

Now, let us have a look at how MySQL Cluster stacks up against these requirements.
-       In terms of scalability, MySQL Cluster scales in a horizontal fashion across a number of nodes. Currently, up to 255 nodes are supported in one cluster. Data is automatically partitioned across a number of Data Nodes, and it is possible to configure a number of replicas for each data partition. Synchronous replication ensures that replicas are consistent and an XA transaction model also makes transactions consistent across partitions..
-       MySQL Cluster has a relational data model, and programmers do have to normalize their data into tables. But the great thing is that schema updates are online, such as adding/removing tables or indexes, or changing existing tables (e.g. adding a column to a table).
-       MySQL Cluster has an Access Layer that can consist of MySQL Servers (SQL), LDAP (NoSQL), native C++ API and Java API (both NoSQL). In the upcoming 7.2 release, a Memcached API (NoSQL) is now also available. So, there are plenty of NoSQL APIs to choose from in order to access NDB data.

Looking at the above, it seems that MySQL Cluster has the problem of having a relational data model where programmers still have the problem of normalizing their data.  A schema-free database sounds great from a developer’s point of view, however, I wonder how it would be to operate a schema-free database. How would the users do ad-hoc queries and analysis?



                                            “Just throw it in the database!”



If a relational schema is something that a user needs or is at least willing to accept, then MySQL Cluster is a brilliant NoSQL database. It solves the same problems as the new NoSQL databases do, yet you also get the traditional benefits from the RDBMS world: atomic transactions even for complex transactions, consistency, integrity and durability, possibility to do complex analytics using multiple columns and indexes. See also this excellent comment by Henrik Ingo about how some of the world's largest telco companies are using MySQL Cluster.

For the majority of NoSQL database products out there, these fundamental data management concerns of an application are being moved out of the database layer to deliver very high performance and scalability.  This means that we are moving these fundamental concerns up the stack in the application layer. I wonder how many out there would feel comfortable with that. What do you think?

If you have any questions or comments, feel free to reply to this blog below or reach out to us on Facebook, LinkedIn, Twitter or directly via these contact details.


* Maybe NDB Cluster is a more appropriate name? MySQL server is an SQL connector.


** Although with a relational model at the back.

2 comments:

  1. Great post Vinay, and this supports many of the concepts we discuss in our new Scaling Web Databases guide:
    http://mysql.com/why-mysql/white-papers/mysql_wp_scaling_web_databases.php

    A big benefit of MySQL Cluster's multi-master design when coupled with auto-sharding tables across nodes is the ability to scale write operations across commodity servers - a significant factor as developers and architects look to better support web and cloud-based workloads

    We also see many users wanting integrated replication (within and across data centers) heartbeating, failover and recovery within the database itself, as alternatives to having deploy 3rd party HA mechanisms beneath the database layer - again something MySQL Cluster delivers out of the box - along with the ability to scale the database on-demand, without downtime

    The MySQL Cluster 7.2 Development Milestone promises even more - with native support for the memcached API as you mention, as well as the ability to dramatically enhance the performance of complex queries - enabling users to run things like real-time analytics on live data sets:
    http://dev.mysql.com/tech-resources/articles/mysql-cluster-labs-dev-milestone-release.html

    An exciting time for all involved with Cluster

    ReplyDelete
  2. Hi Vinay

    I would go even further. As I see it, you can use MySQL Cluster (or MySQL/InnoDB for that matter) in a schemaless fashion if you want:

    CREATE TABLE t (k INT PRIMARY KEY auto_increment, v BLOB);
    SELECT v FROM t WHERE k=1;

    Now it's schemaless :-)

    Seriously, de-normalizing like this can even be a performance booster in some cases.

    Also, the LDAP interface is perhaps not schemaless, but it does enforce it's own particular schema, so it might as well be. I bet the memcached interface will enforce something like I write above.

    So what I'm saying is, it's great that you can define a schema to protect data integrity when you feel like you need it, but an RDBMS can always be used in a schemaless fashion too, and sometimes is.

    ReplyDelete