In the latest refresh of its managed public cloud service, Cloudera has announced a preview of a new operational database service that updates its implementation of HBase for the cloud. The service, Cloudera Operational Database, will for now be available only on CDP Public Cloud on AWS and Azure. It’s one of a pair of releases announced today, the other being Cloudera Data Engineering, which is covered by Big on Data bro Andrew Brust.
Cloudera Operational Database extends HBase with some usability and accessibility enhancements. For starters, it overlays a new control plane that simplifies setup and deployment: the customer fills out some basic settings on a data entry form, and an instance is set up inside a Virtual Private Cloud (VPC) within minutes. Unlike original HBase, where queries had to be written in Java, the Cloudera Operational Database service provides several access options. And of course, this being cloud-native, storage is S3- and ADLS-compatible, not HDFS.
You can use the old-fashioned HBase Java or REST APIs, or you can access the database through Apache Phoenix, a SQL layer that translates your SQL query into a series of HBase scans and returns result sets through JDBC. Phoenix was designed to enable transaction processing and operational analytics on HBase/Hadoop. For higher performance, it can be paired with the HBase API to scan millions of rows within seconds, and it can be integrated with Spark and Hive.
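To make the SQL-on-HBase idea concrete, here is a minimal Python sketch of what Phoenix access might look like. It is an illustration under stated assumptions, not Cloudera's documented client path: the `orders` table, its columns, and the function names are hypothetical, and the connection helper assumes the third-party `phoenixdb` package talking to a reachable Phoenix Query Server endpoint, not the JDBC driver mentioned above. Note that Phoenix writes rows with `UPSERT`, not `INSERT`.

```python
from typing import Any, List, Tuple

# Hypothetical table: orders(customer_id, order_id, amount), with
# (customer_id, order_id) as the composite primary key (the HBase row key).

# Phoenix has no INSERT statement; UPSERT inserts or overwrites a row.
UPSERT_SQL = "UPSERT INTO orders (customer_id, order_id, amount) VALUES (?, ?, ?)"

# Filtering on the leading primary-key column is the kind of query Phoenix
# can typically serve as a range scan over HBase rather than a full table scan.
SELECT_SQL = (
    "SELECT order_id, amount FROM orders "
    "WHERE customer_id = ? ORDER BY order_id"
)


def orders_for_customer(conn: Any, customer_id: str) -> List[Tuple]:
    """Run the SELECT on any DB-API connection that accepts '?' placeholders."""
    cur = conn.cursor()
    try:
        cur.execute(SELECT_SQL, (customer_id,))
        return [tuple(row) for row in cur.fetchall()]
    finally:
        cur.close()


def connect_to_phoenix(query_server_url: str) -> Any:
    """Assumption: the phoenixdb package is installed and the URL points at a
    Phoenix Query Server; neither detail comes from the announcement."""
    import phoenixdb  # third-party, imported lazily so the sketch loads without it

    return phoenixdb.connect(query_server_url, autocommit=True)
```

Given a live endpoint, usage would be along the lines of `orders_for_customer(connect_to_phoenix(url), "c1")`. Because the query helper depends only on the standard DB-API cursor interface, it can be exercised against any compatible connection before pointing it at a real cluster.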
In practice, HBase has traditionally been compared to Cassandra; both are Apache projects (and at one time, Cassandra was even listed as one of Hadoop’s subprojects). Both also have similar NoSQL wide-column data structures. But that’s where the similarities end.
We’ve considered HBase to be the more rudimentary analog of Cassandra, as the latter is a fully self-contained database, whereas the former is an extension of Hadoop, relying on HDFS for data storage (as noted, in the cloud that has been updated) and ZooKeeper for tracking server status. Also, Cassandra has a query language, while HBase originally did not. Then there is HBase’s single-master architecture vs. Cassandra’s multi-master setup. While Cassandra’s multi-master architecture carries advantages for data ingestion and write operations, HBase’s single master is better suited for faster, more strongly consistent reads, especially when retrieving small bits of data.
With this release, Cloudera Operational Database is Cloudera’s answer to Amazon DynamoDB and the new managed cloud services for Cassandra: DataStax Astra and Amazon Keyspaces. Given the large skills bases for DynamoDB and Cassandra, we don’t expect that Cloudera Operational Database will compete for standalone NoSQL operational workloads. Unlike the AWS offerings, Cloudera Operational Database will not be serverless, but it will have autoscaling capabilities.
Instead, the strength of Cloudera Operational Database is being part of the CDP umbrella, which also includes SDX for data governance and security. More important, however, is potential integration with other CDP services to support a full data lifecycle. We could imagine customers defining models in Cloudera Machine Learning, then applying them in real time for closed-loop processes in any number of use cases in financial services, retail, or manufacturing. We could also envision a tie-in with Cloudera DataFlow for operational IoT applications that, incidentally, could also apply machine learning for predictive or prescriptive analytics, especially with asset management and maintenance.
The preview of Cloudera Operational Database will initially be available on CDP Public Cloud running on AWS and Azure.