Skip to content

Tag: HBase

Apache Beam and HBase

HBase is a NoSql database, which allows you to store data in many different formats (like pictures, pdfs, textfiles and others) and gives the ability to do fast and efficient data lookups. HBase has two APIs to chose from – Java API and HBase Shell. We can also connect HBase with some different tools like Hive or Phoenix and use SQL. HBase also integrates with Apache Beam via HBaseIO transform.

Read more

HBase Java API with Oozie

One of the ways to access HBase is through Java API. It can be done in multiple ways, depending on the case and tools used. Here’s how to achieve that with Oozie Java action and Pig action with UDF doing lookups in HBase.

Read more

Pig HBase lookups

Pig can nicely read from and write data to HBase, which can be done as I described here. Additionally we may use Pig UDF to manage data in HBase – like retrieving some values for a given key. There is one difficulty though – Zookeeper manages the number of concurrent connections done to HBase and if our application exceeds that, then the whole job will simply fail.

Read more

HBase + Pig + Oozie

Although HBase is mostly used for lookups, sometimes there comes a need to perform bulk reads and writes. Doing that through Pig is very convenient. Here’s how to establish Pig-HBase communication.

Read more

HBase + Sqoop + Oozie

Sqoop can be used to import data from the relational database into HBase. Although exporting data from HBase is not natively supported you can still manage it by putting Hive and HCatalog between HBase and Sqoop. Here’s how to do both importing and exporting with Oozie in Kerberised environment.

Read more