Operational Database Administration

Editor’s Note, August 2020: CDP Data Center is now called CDP Private Cloud Base. You can learn more about it here.

Introduction

This blog post is part of a series on Cloudera’s Operational Database (OpDB) in CDP. Each post goes into more detail about new features and capabilities. Start from the beginning of the series with Operational Database in CDP.

This blog post gives you an overview of the operational database (OpDB) administration tools and features in the Cloudera Data Platform. OpDB is available in two form factors today: as a fully secure, semi-managed offering in CDP Public Cloud – Data Hub, and as a fully customizable offering in CDP Data Center (similar to what is available in CDH and HDP). For more information about Data Hub, see Cloudera Data Hub.

Fig 1: OpDB Data Hub cluster.

You can use the links in this article to get more information and instructions to use these features. 

Database creation and control

Apache HBase namespaces are logical groups of tables, similar to a database in a traditional relational database system. You can create and manage namespaces through the Apache HBase shell. For more information about using the Apache HBase shell, see Apache HBase shell overview.

Just like a database in a relational system, a namespace contains a collection of tables along with permissions, replication settings, and resource isolation, and you can set these configurations at the namespace level. In CDP, you create and manage the namespace itself using the HBase shell, while permissions are handled through Apache Ranger and replication through Replication Manager. Ranger provides fine-grained authorization policies and auditing. For more information about how to set up security in CDP, see Security using Ranger.
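As a rough illustration of namespace creation through the HBase shell, the following Python sketch generates the shell statements for a namespace and its tables. The namespace, table, and column family names are hypothetical, and the helper `namespace_script` is not part of any Cloudera or HBase tooling; it only assembles text that you could paste into, or feed to, the HBase shell.

```python
def namespace_script(namespace, tables):
    """Return HBase shell statements that create a namespace and its tables.

    `tables` maps a table name to a list of column family names.
    All names used below are made up for illustration.
    """
    lines = [f"create_namespace '{namespace}'"]
    for table, families in tables.items():
        cf_args = ", ".join(f"'{cf}'" for cf in families)
        # Tables inside a namespace are addressed as 'namespace:table'.
        lines.append(f"create '{namespace}:{table}', {cf_args}")
    # List the tables afterwards so the result can be checked by eye.
    lines.append(f"list_namespace_tables '{namespace}'")
    return "\n".join(lines) + "\n"


script = namespace_script("sales", {"orders": ["d", "meta"], "customers": ["d"]})
print(script)
```

Permissions and replication are deliberately left out of the generated script, since in CDP those are handled through Ranger and Replication Manager rather than through the shell.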

Replication Manager helps you create HBase replication policies. You can use it to set up replication from CDH, HDP, or Apache HBase clusters to CDP Data Center.

Fig 2: Creating replication policy user interface

Graphical DDL and DCL functionality

Several tools provide graphical DDL and DCL functionality, including plugins for:

  • Cloudera Machine Learning (CML): CML helps you to query data using HBase client and Phoenix, and helps you in interactive data exploration, visualization, sharing, and collaboration. OpDB can be used to store Session/Job/Model prediction results for later querying by multiple different users.

Fig 3: Cloudera Machine Learning user interface

  • Hue: Hue is a web-based interactive query editor that enables you to interact with data warehouses. You can use the HBase Browser application in Hue to create and browse HBase tables. 

Fig 4: Hue interface supports search, insert, update, delete, DDL for HBase

You can also use the SQL interface in Hue, backed by Impala or Hive, for query processing.

Fig 5: SQL interface using Impala

Here’s a tutorial to create example tables in HBase using Hue: https://gethue.com/hadoop-tutorial-how-to-create-example-tables-in-hbase/  

Tools such as Zeppelin and Hue, along with their plugins, are provided out of the box. You can also use third-party SQL utilities such as Toad.

Tools for the operational database release upgrade

You can use Cloudera Manager to automate the process of upgrading the operational database in your Cloudera Data Platform-Data Center (CDP-DC). Upgrades are provided through releases or maintenance patches. Cloudera Manager installs the releases and/or patches and manages the configuration as well as the restart process.

If you are using CDP on a public cloud such as Amazon AWS, you have to create a new Data Hub cluster to upgrade to new versions of the various components. For more information about creating a new operational database Data Hub cluster, see Getting Started with Operational Database on CDP.

Cloudera’s offering is cluster-based: upgrades and patches span multiple nodes (servers), and installation, configuration, and reboots are all automated, including rolling reboots where applicable.

Patch management tools across multiple servers

In CDP Data Center, Cloudera Manager installs the releases and manages the configuration. Cloudera Manager also handles the restart process for each of the impacted components.

Zero-downtime patch application

In CDP Data Center, Cloudera Manager lets you apply patches with zero downtime.

Change-management across multiple servers

You can perform change management on database schemas across multiple instances. For example, you can do this on your test/dev, staging, or production environment.  

You can script the required changes using the HBase shell, and then propagate them to the other instances.

For more information about using HBase shell, see Apache HBase shell.
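One way to picture this workflow is a small Python sketch that builds the command line for running the same HBase shell script against each environment. It assumes each environment has its own HBase client configuration directory; the directory paths and the script file name below are hypothetical. The sketch relies on the `hbase --config <dir>` option and the shell's non-interactive `-n` flag, and only prints the commands unless `apply` is called.

```python
import subprocess

# Hypothetical environments: each value is a directory holding that
# cluster's HBase client configuration (hbase-site.xml and related files).
ENVIRONMENTS = {
    "dev": "/etc/hbase/conf.dev",
    "staging": "/etc/hbase/conf.staging",
    "prod": "/etc/hbase/conf.prod",
}


def plan(script_path, environments):
    """Build one `hbase shell` command per environment for the same script."""
    return {
        name: ["hbase", "--config", conf_dir, "shell", "-n", script_path]
        for name, conf_dir in environments.items()
    }


def apply(commands):
    """Run the planned commands in order, stopping at the first failure."""
    for name, cmd in commands.items():
        print(f"[{name}] {' '.join(cmd)}")
        subprocess.run(cmd, check=True)


# Dry run: show what would be executed without touching any cluster.
commands = plan("schema_change.rb", ENVIRONMENTS)
for name, cmd in commands.items():
    print(name, " ".join(cmd))
```

Keeping the plan separate from the execution makes it easy to review the exact commands before running them against production.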

Workload partitioning

You can partition workloads and applications within OpDB using several tools, depending on the nature of the workloads and their data needs.

If the applications all access separate tables, then region server groups can be used to dedicate a set of nodes for a defined set of tables or namespaces creating a hardware partitioning approach. For more information about region server groups, see Using RegionServer Grouping
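As a sketch of the hardware partitioning approach, the Python snippet below generates HBase shell statements that create a RegionServer group and assign servers and a namespace to it. It assumes the RegionServer grouping feature is enabled on the cluster and that the HBase 2.x rsgroup shell commands are available in your version; the group name, host names, and namespace are hypothetical.

```python
def quote_list(items):
    """Render a Python sequence as an HBase shell (Ruby) array of strings."""
    return "[" + ", ".join(f"'{item}'" for item in items) + "]"


def rsgroup_script(group, servers, tables=(), namespaces=()):
    """Return shell statements that dedicate `servers` to tables or namespaces."""
    lines = [
        f"add_rsgroup '{group}'",
        f"move_servers_rsgroup '{group}', {quote_list(servers)}",
    ]
    if tables:
        lines.append(f"move_tables_rsgroup '{group}', {quote_list(tables)}")
    if namespaces:
        lines.append(f"move_namespaces_rsgroup '{group}', {quote_list(namespaces)}")
    # Show the resulting group membership for a manual check.
    lines.append(f"get_rsgroup '{group}'")
    return "\n".join(lines) + "\n"


script = rsgroup_script(
    "ingest",
    ["rs1.example.com:16020", "rs2.example.com:16020"],
    namespaces=["sales"],
)
print(script)
```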

For applications that use the same set of tables, you can use RPC throttling, user quotas, and space quotas to manage the noisy neighbor problem. See HBase quota management for more technical details.
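To make the quota options more concrete, here is a minimal Python sketch that generates HBase shell `set_quota` statements for a request throttle on a user and a space quota on a table. It assumes quota support is enabled on the cluster (the `hbase.quota.enabled` setting); the user name, table name, and limits are hypothetical values chosen for illustration.

```python
def throttle_user(user, limit):
    """Shell statement that caps a user's request rate, e.g. '100req/sec'."""
    return f"set_quota TYPE => THROTTLE, USER => '{user}', LIMIT => '{limit}'"


def space_quota(table, limit, policy="NO_INSERTS"):
    """Shell statement that caps a table's size and names the violation policy."""
    return (
        f"set_quota TYPE => SPACE, TABLE => '{table}', "
        f"LIMIT => '{limit}', POLICY => {policy}"
    )


statements = [
    throttle_user("etl_user", "100req/sec"),
    space_quota("sales:orders", "500G"),
    "list_quotas",  # print the quotas now in effect
]
print("\n".join(statements))
```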

You can also combine these two sets of options to build a more sophisticated partitioning scheme. Use Cloudera Manager to ensure that specific services are partitioned appropriately between different nodes of the cluster; for example, you can decide which nodes should be used for Solr search.

Hardware partitioning

Cloudera Manager and YARN both leverage Linux cgroups and active memory management for both static and dynamic partitioning of hardware resources.  

First, all processes running on all hosts can be hard partitioned with cgroups, set by Cloudera Manager. Second, a wizard lets users define the layout of static partitions for services by setting percentages; it automatically translates these into cgroup-based CPU and I/O isolation and sets memory limits by configuring the services themselves.

Finally, the native resource manager provides a container model for workloads that puts each discrete unit of work in a container, using cgroups and active memory management (set, monitor and kill) for application isolation.

Software hypervisors

The following software hypervisors are supported:

  • VMware is supported for on-premises environments.
  • Microsoft Azure virtual environments (Azure Stack) are supported.
  • Amazon Web Services, Google Cloud Platform, and Microsoft Azure virtualization are supported in the cloud.

Container and orchestration support

Cloudera provides a Docker image that has Apache HBase, Apache ZooKeeper and Cloudera Manager installed on it. You can configure YARN to manage your Docker containers, and submit Apache HBase jobs to YARN on the same container or submit jobs to YARN from another container.  

For more information, see Manage Docker containers on YARN.

Rollback of patches or Release Upgrades

Cloudera Manager provides automation for some of the rollback processes. Upgrades may sometimes involve changes in data formats. Tooling to undo format changes is not supported, so you must restore the data from backups to let the rolled-back release use data in the old format.

Cross-OS-platform migration

Cloudera’s standard backup/restore/data recovery tools are available to support the migration of the OpDB between different operating systems. 

HBase backup and disaster recovery strategies ensure that your data is backed up to protect you from data loss. HBase snapshots let you capture the state of a table with little impact on RegionServers, because snapshot, clone, and restore operations do not involve copying data.

For more information about HBase backup and disaster recovery, see HBase backup and disaster recovery strategies.
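The snapshot operations mentioned above can be sketched as generated HBase shell statements. The Python snippet below builds a dated snapshot name and the matching `snapshot` and `clone_snapshot` statements; the table names and the naming scheme are hypothetical, chosen only to keep snapshot names to letters, digits, and underscores.

```python
from datetime import date


def snapshot_name(table, day):
    """Build a dated snapshot name such as 'sales_orders_20200801'."""
    # Replace the namespace separator so the name stays alphanumeric.
    return f"{table.replace(':', '_')}_{day:%Y%m%d}"


def backup_commands(table, day):
    """Shell statements that snapshot a table and list existing snapshots."""
    name = snapshot_name(table, day)
    return [f"snapshot '{table}', '{name}'", "list_snapshots"]


def clone_command(snapshot, new_table):
    """Shell statement that materializes a snapshot as a new table."""
    return f"clone_snapshot '{snapshot}', '{new_table}'"


cmds = backup_commands("sales:orders", date(2020, 8, 1))
cmds.append(clone_command("sales_orders_20200801", "sales:orders_copy"))
print("\n".join(cmds))
```

Cloning into a new table, rather than restoring in place, leaves the original table untouched while you verify the snapshot contents.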

Database administrator (DBA) tools

There are many tools included to support managing the database, including:

  • Cloudera Manager
  • HBase shell
  • Hue
  • HBCK2 
  • hbtop
  • Ranger 
  • Atlas
  • FreeIPA 
  • navencrypt 
  • HDFS tools
  • YARN

These tools provide metrics and monitoring, cluster restart, adding ingest, lifecycle-management, upgrades, security, Kerberos setup, and other features.  

Fig 6: Cloudera Manager HBase interface

Fig 7: Metrics and monitoring in Cloudera Manager

Fig 8: Cluster restart in Cloudera Manager

In addition to these tools, you can also use the following third-party and open source administration tools:

Open documented interfaces for third-party management tools

We also provide open APIs to enable other tools to be used to manage OpDB. For example, the JMX interface can be used to integrate with third-party monitoring tools like Grafana.
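As a hedged illustration of what such an integration might read, HBase daemons typically expose their JMX metrics as JSON over HTTP at a `/jmx` path on the web UI port. The Python sketch below parses that JSON shape and picks out one bean. The sample payload, its values, and the commented-out host and port are assumptions for illustration; on a secured cluster the endpoint may also require authentication, which this sketch does not handle.

```python
import json
from urllib.request import urlopen


def fetch_beans(url, timeout=5):
    """Download a /jmx JSON document and return its list of beans."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["beans"]


def find_bean(beans, name):
    """Return the bean with the given JMX object name, or None if absent."""
    for bean in beans:
        if bean.get("name") == name:
            return bean
    return None


# Illustrative payload in the shape served by the /jmx endpoint.
# The values are made up; real documents contain many more beans.
SAMPLE = json.loads("""
{"beans": [
  {"name": "Hadoop:service=HBase,name=Master,sub=Server",
   "numRegionServers": 3,
   "numDeadRegionServers": 0}
]}
""")

bean = find_bean(SAMPLE["beans"], "Hadoop:service=HBase,name=Master,sub=Server")
print(bean["numRegionServers"], bean["numDeadRegionServers"])

# Against a live cluster (hypothetical host and port):
# beans = fetch_beans("http://hbase-master.example.com:16010/jmx")
```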

Conclusion

In this blog post, we looked at how you can make use of the various administrative tools and capabilities provided by the OpDB in CDP. In the next article, we’ll cover how you can make use of the management capabilities in OpDB; check it out here.

Gokul Kamaraj
Liliana Kadar