HBase Java API with Oozie
One of the ways to access HBase is through the Java API. It can be done in multiple ways, depending on the use case and the tools involved. Here's how to achieve that with an Oozie Java action, and with a Pig action whose UDF performs lookups in HBase.
You can manipulate HBase tables through Oozie with Java or Shell actions. To use a Shell action, you of course need to prepare a shell script.
Sqoop can use HCatalog to import and export data directly into and from Hive tables. It uses HCatalog to read the table's structure, data formats and partitions, and then imports or exports the data accordingly. It's a very useful combination for moving data efficiently, but it requires matching column names on both sides. Here's how to make Sqoop with HCatalog work through Oozie.
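Because the column names have to match, it can help to check them before the job runs. Below is a minimal Java sketch of such a pre-flight check. The helper `unmatched` and the sample column lists are hypothetical. The sketch assumes names are compared case-insensitively, since Hive stores column names in lower case. It checks one direction only: every HCatalog column must have a same-named column in the relational table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ColumnNameCheck {

    // Hypothetical helper: returns the HCatalog columns that have no column with
    // the same name (ignoring case) in the relational table.
    static List<String> unmatched(List<String> hcatColumns, List<String> dbColumns) {
        Set<String> db = new HashSet<>();
        for (String c : dbColumns) {
            db.add(c.toLowerCase(Locale.ROOT));
        }
        List<String> missing = new ArrayList<>();
        for (String c : hcatColumns) {
            if (!db.contains(c.toLowerCase(Locale.ROOT))) {
                missing.add(c);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Sample column lists, for illustration only.
        List<String> hcat = Arrays.asList("id", "customer_name", "amount");
        List<String> db = Arrays.asList("ID", "CUSTOMER", "AMOUNT");
        System.out.println(unmatched(hcat, db));  // prints [customer_name]
    }
}
```

An empty result means every HCatalog column has a counterpart. Anything listed would need renaming on one side, or a view with aliased columns, before Sqoop can map it.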
Over the last few years of working with Hadoop, I spent a lot of time dealing with issues and looking for workarounds. That involved a lot of searching, reading and, mostly, trial and error. That's why I decided to share some of the solutions I found and tried.
There are a few ways to build an Oozie Sqoop action.
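The Oozie Sqoop action accepts the Sqoop command either as a single `<command>` element or as a sequence of `<arg>` elements. Oozie splits a `<command>` on spaces, so a value that contains spaces, such as a free-form `--query`, ends up broken into several arguments. The following is a minimal Java sketch, assuming you generate the `<arg>` elements from an argument list. `splitLikeCommand` only approximates the `<command>` splitting. The JDBC URL and HDFS path are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SqoopActionArgs {

    // Escape characters that are special in XML text content (& must go first).
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Render each Sqoop argument as its own <arg> element, so a value with
    // spaces reaches Sqoop as one argument.
    static String toArgElements(List<String> args) {
        StringBuilder sb = new StringBuilder();
        for (String a : args) {
            sb.append("<arg>").append(xmlEscape(a)).append("</arg>\n");
        }
        return sb.toString();
    }

    // Approximate what happens to a <command> string: it is split on spaces,
    // and quoting does not keep a multi-word value together.
    static List<String> splitLikeCommand(String command) {
        return new ArrayList<>(Arrays.asList(command.trim().split(" +")));
    }

    public static void main(String[] args) {
        List<String> sqoopArgs = Arrays.asList(
            "import",
            "--connect", "jdbc:mysql://db.example.com/sales",  // hypothetical JDBC URL
            "--query", "SELECT id, amount FROM orders WHERE $CONDITIONS",
            "--split-by", "id",
            "--target-dir", "/user/etl/orders");                // hypothetical HDFS path
        System.out.print(toArgElements(sqoopArgs));

        String command = String.join(" ", sqoopArgs);
        // prints "15 tokens vs 9 args"
        System.out.println(splitLikeCommand(command).size() + " tokens vs " + sqoopArgs.size() + " args");
    }
}
```

The nine intended arguments become fifteen tokens when flattened into one string. That is why the `<arg>` form is the safer choice whenever an argument contains spaces.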
HCatalog enables Pig to read from and write to Hive tables directly, using the table definitions stored in the Hive metastore. Pig dynamically determines the structure of the table, which makes data manipulation easier. Here's how to make Pig work with HCatalog and how to run such jobs through Oozie.
Although HBase is mostly used for lookups, sometimes there is a need to perform bulk reads and writes. Doing that through Pig is very convenient. Here's how to establish Pig-HBase communication.
Sqoop can be used to import data from a relational database into HBase. Although exporting data from HBase is not natively supported, you can still manage it by putting Hive and HCatalog between HBase and Sqoop. Here's how to do both importing and exporting with Oozie in a Kerberised environment.
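For the import direction, Sqoop writes into HBase when it is given HBase-specific options instead of an HDFS target. Below is a minimal Java sketch that assembles those arguments, for example to emit as `<arg>` elements of an Oozie Sqoop action. The helper name and the sample connection string, table and column names are hypothetical. The Kerberos and Oozie credentials setup the environment needs is not shown.

```java
import java.util.Arrays;
import java.util.List;

public class SqoopHBaseImportArgs {

    // Hypothetical helper: build the argument list for a Sqoop import into HBase.
    static List<String> hbaseImportArgs(String jdbcUrl, String sourceTable,
                                        String hbaseTable, String columnFamily,
                                        String rowKeyColumn) {
        return Arrays.asList(
            "import",
            "--connect", jdbcUrl,
            "--table", sourceTable,
            "--hbase-table", hbaseTable,        // target HBase table
            "--column-family", columnFamily,    // imported columns land in this family
            "--hbase-row-key", rowKeyColumn,    // source column used as the row key
            "--hbase-create-table");            // create table/family if missing
    }

    public static void main(String[] argv) {
        // Hypothetical connection string and names, for illustration only.
        List<String> args = hbaseImportArgs(
            "jdbc:mysql://db.example.com/sales", "customers",
            "customers", "d", "customer_id");
        System.out.println(String.join(" ", args));
    }
}
```

Without `--hbase-create-table`, the target table and column family must already exist, or the job fails.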
Hive offers a convenient way to manipulate data stored in HBase. Not only does it provide SQL capabilities, but it can also be easily incorporated into workflow processing.
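Hive reaches HBase through its HBase storage handler. A Hive table is mapped onto an HBase table, and `hbase.columns.mapping` ties each Hive column to the row key (`:key`) or to a `family:qualifier` cell. The following is a minimal Java sketch that builds such a DDL statement. You might submit it through Hive JDBC or put it into a Hive script run by an Oozie Hive action. The helper name, table names and columns are hypothetical, and every column is declared as `STRING` for simplicity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class HiveHBaseDdl {

    // Build a CREATE EXTERNAL TABLE statement over an existing HBase table.
    // 'columns' maps each Hive column to ":key" or "family:qualifier", in order.
    static String createTable(String hiveTable, String hbaseTable, Map<String, String> columns) {
        StringJoiner cols = new StringJoiner(", ");
        StringJoiner mapping = new StringJoiner(",");
        for (Map.Entry<String, String> e : columns.entrySet()) {
            cols.add(e.getKey() + " STRING");
            mapping.add(e.getValue());  // mapping order must follow column order
        }
        return "CREATE EXTERNAL TABLE " + hiveTable + " (" + cols + ")\n"
             + "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
             + "WITH SERDEPROPERTIES ('hbase.columns.mapping' = '" + mapping + "')\n"
             + "TBLPROPERTIES ('hbase.table.name' = '" + hbaseTable + "')";
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();  // preserves insertion order
        columns.put("customer_id", ":key");
        columns.put("name", "d:name");
        columns.put("city", "d:city");
        System.out.println(createTable("hbase_customers", "customers", columns));
    }
}
```

The sketch uses `EXTERNAL` because the HBase table already exists. Dropping the Hive table then leaves the HBase data in place.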