Using OpenStack Database (Trove): Replication and Clustering; Looking ahead
[Editor's Note: This post is part 3 in a 3 part series about using Trove. It follows part 1 and part 2.]
In previous blog posts we described the replication feature for Trove, and the implementation in the Client and the Task Manager in detail.
In this post we describe some of the rationale for this implementation and the roadmap for features that provide performance and availability guarantees that are so critical for a database.
It is well recognized that solutions to problems of database performance and availability are closely interconnected. All of these solutions rely on the ability to have multiple copies of data, and to have multiple computers (that may or may not be at the same location) working together to provide the user with access to data. These kinds of solutions broadly depend on techniques like “replication”, “clustering” and “parallel computing”. While these three terms are often used interchangeably, there are subtle differences that are worth highlighting.
Replication is a mechanism whereby data is available in more than one location. In the simplest case of replication, there is a master and a slave. The master and slave are first synchronized and after that, any change to the master is replicated to the slave. The changes may be replicated synchronously or asynchronously. Replication may either guarantee an ordering of change replication or not and in some cases change ordering may only be guaranteed at specific thresholds called barriers. Databases provide replication natively (semi-synchronous replication in MySQL) or one could use additional software (such as Galera or Tungsten), but one could also consider disk mirroring either in hardware or software as forms of replication.
Clustering is a class of techniques where many computers work collectively, and perform some function or functions. For example, if data is replicated between two locations then it is possible for a database to access each data set and answer queries submitted to it. In such a system, one could have a copy of data in one location and accessed by one database instance and data in a either the same or a different location accessed by a second database instance. Then these two database instances would be considered to be a database cluster. In this example, each database instance is able to completely answer queries against the data. Oracle Real Application Clusters (RAC) is an example of a system of this kind. An application that splits writes and reads between a database master and replication slaves is another example of this kind of system.
Parallel computing is a particular example of cluster computing where multiple computers work together to provide some function or functions whereby multiple computers each perform some sub-function. In the context of databases, a common example of parallel computing is a parallel database. Here multiple computers each process a subset of a query based on a subset of the data that they have access to. Sharded databases, or most NoSQL databases (MongoDB, Cassandra for example) are examples of this kind of system.
“Clustering” means something different to different data stores. Since Trove is designed to support this diversity, its implementation of these features must use some common, basic operations to control these data stores in a consistent way. These operations are “to add a replication slave to a data store”, “to add a computer to a cluster”, “to remove a computer from a cluster”, and so on. The actual implementation of those features could then be performed in a manner that was applicable to that particular data store.
Therefore, in replication (v1) we create the framework for this feature and provide an implementation of this framework for MySQL. This implementation includes the feature set related to replicas for MySQL. For this implementation, we use the snapshot and replicate paradigm that is the standard for MySQL systems.
In a subsequent phase the community will deliver clustering which will implement primitives like “create cluster”, “add instance to cluster”, “replace instance” and “remove instance”. This scheme therefore makes it possible for us to also add extensions to these API calls that will define specialized instances for specific data stores such as the “create arbiter” call that will make sense only to MongoDB, which uses the mechanism of an arbiter.
The initial implementations of the replication and clustering feature are currently planned for release in the Juno release. Code for both of these features is under development and expected to be delivered in the juno-2 code drop. Blueprints for these features are currently available on Launchpad.
With the delivery of these features, the Trove project will have a solid framework on which one can implement replication and clustering for an arbitrary data store. We anticipate that with this framework, we will also be able to deliver the capability of a fully parallel MySQL database utilizing the clustering feature for parallelism and replication for redundancy. For this we plan to use the open source Database Virtualization Engine (DVE) capability that we have developed here at Tesora.
In parallel, we at Tesora, and other members of the Trove project community (from companies like Mirantis, HP, Rackspace and EBay) are contributing code that will support additional data stores and additional configuration capabilities in those data stores.
While Trove is already in production in a number of enterprises and public clouds, having these capabilities built in will make it dramatically easier to deliver a production-ready, enterprise-grade Database as a Service offering based on Trove in an OpenStack cloud.