Cloudera Operational Database (COD) Performance Benchmarking: Comparing HDFS and Cloud Storage

By neub9

Discover the secret to deploying high-performance applications at scale with the Cloudera Operational Database (COD). COD is a rugged operational database designed to power the biggest data applications on the planet at any scale. It is powered by Apache HBase and Apache Phoenix and comes out of the box with Cloudera Data Platform (CDP) in the public cloud. In addition, it is multi-cloud ready to meet your business needs across AWS, Microsoft Azure, or GCP.

In addition to its pre-existing support for HDFS on local storage, COD supports cloud storage, giving customers a choice of price-performance characteristics. To understand how COD delivers cost-efficient performance for your applications, let’s dive into benchmarking results comparing COD using cloud storage vs. COD using HDFS.

We measured the performance differences between COD using storage on Hadoop Distributed File System (HDFS) and COD using cloud storage such as AWS S3 and Azure ABFS. These measurements were conducted on COD runtime version 7.2.15 and covered both read-write and read-only workloads.

The benchmark was carried out in a test environment on a cluster running HBase on cloud storage. The combined bucket cache size across the cluster was 32TB, and the L2 bucket cache was configured to use file-based cache storage on ephemeral storage volumes of 1.6TB capacity each.

The benchmark results showed that an S3-based cluster with ephemeral cache performed, on average, 1.7x better than HBase running on HDFS on HDD. Read throughput for the S3-based cluster was around 1.8x better for both HBase and Phoenix compared to the HDFS-based cluster.

Further analysis revealed that factors affecting S3 performance include cache warming, AWS S3 throttling, non-atomic operations, and slow bulk delete operations. Overall, cache warming on S3 took around 130 minutes at an average throughput of 2.62 GB/s.
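To put the cache-warming figures in perspective, the short Python sketch below estimates how much data would pass through the cache in that time. It is a back-of-the-envelope calculation only: it assumes the 2.62 GB/s average was sustained for the full 130 minutes and that GB and TB are decimal units. The function names are illustrative, and the benchmark itself does not report the size of the warmed data set.

```python
def warmed_data_tb(duration_minutes: float, throughput_gb_per_s: float) -> float:
    """Estimate data moved, in decimal TB, at a sustained average throughput."""
    seconds = duration_minutes * 60
    # Assumption: 1 TB = 1000 GB (decimal units).
    return seconds * throughput_gb_per_s / 1000


def cache_fraction(data_tb: float, cache_tb: float) -> float:
    """Share of the total bucket cache that the given data volume represents."""
    return data_tb / cache_tb


if __name__ == "__main__":
    # Figures quoted in the article: 130 minutes at 2.62 GB/s, 32TB combined cache.
    data_tb = warmed_data_tb(130, 2.62)
    print(f"Estimated data warmed: {data_tb:.1f} TB")
    print(f"Share of 32TB bucket cache: {cache_fraction(data_tb, 32):.0%}")
```

Under these assumptions the warm-up would correspond to roughly 20 TB of data, or a little under two-thirds of the 32TB combined bucket cache.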

Across the measured parameters, including throughput and latency, S3 performed better than HDFS in various workloads. Similar results were observed with Azure ABFS, where HBase running on ABFS storage showed almost 2x improvement in throughput and more than 2x improvement in read latency compared to HBase running on HDFS.

By leveraging COD and cloud storage such as AWS S3 and Azure ABFS, businesses can deploy high-performance applications at scale more efficiently, meeting the demands of modern data applications.
