
Tag: Pig

Pig HBase lookups

Pig can read data from and write data to HBase, as I described here. Additionally, we can use a Pig UDF to manage data in HBase, for example to retrieve values for a given key. There is one difficulty, though: ZooKeeper caps the number of concurrent connections made to HBase, and if our application exceeds that limit, the whole job fails.
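One common way to stay under such a limit (an assumption here, not necessarily the exact approach the post takes) is to open a single connection per JVM and reuse it for every lookup, instead of opening one per UDF call. The plain-Java sketch below illustrates the pattern; `FakeConnection`, its sample rows and the limit of 10 are hypothetical stand-ins for a real HBase connection and the real ZooKeeper setting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedConnectionLookup {

    // Hypothetical stand-in for an HBase connection. It counts how many
    // instances are open and fails past an assumed limit, mimicking a
    // ZooKeeper connection cap.
    static final class FakeConnection {
        static final AtomicInteger OPEN = new AtomicInteger();
        static final int MAX_CONNECTIONS = 10; // illustrative value, not an HBase default

        private final Map<String, String> rows = new HashMap<>();

        FakeConnection() {
            if (OPEN.incrementAndGet() > MAX_CONNECTIONS) {
                throw new IllegalStateException("connection limit exceeded");
            }
            rows.put("user1", "Alice"); // sample data in place of a real table
            rows.put("user2", "Bob");
        }

        String get(String rowKey) {
            return rows.get(rowKey);
        }
    }

    // One connection per JVM, created lazily (double-checked locking) and shared.
    private static volatile FakeConnection shared;

    static FakeConnection connection() {
        FakeConnection c = shared;
        if (c == null) {
            synchronized (SharedConnectionLookup.class) {
                c = shared;
                if (c == null) {
                    c = new FakeConnection();
                    shared = c;
                }
            }
        }
        return c;
    }

    // What a lookup UDF's exec() could delegate to: row key in, value out.
    static String lookup(String rowKey) {
        return connection().get(rowKey);
    }

    public static void main(String[] args) {
        // Opening a new connection per call would throw on the 11th call here;
        // with the shared connection, 1000 lookups open exactly one.
        for (int i = 0; i < 1000; i++) {
            lookup(i % 2 == 0 ? "user1" : "user2");
        }
        System.out.println(lookup("user2") + " " + FakeConnection.OPEN.get()); // Bob 1
    }
}
```

In a real UDF the shared object would be an HBase client connection built from the job configuration and closed when the task finishes; the point of the design is that the number of connections grows with the number of tasks, not with the number of records.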


Pig Java UDF

We can define Pig UDFs in several languages: Java, Jython, JavaScript, Ruby, Groovy and Python. Java currently offers the widest range of options, though, so I'll stick to it in this post.
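In Pig, a Java eval UDF extends `org.apache.pig.EvalFunc<T>` and implements `exec(Tuple)`. So that the sketch below runs without Pig on the classpath, it declares minimal stand-ins for `EvalFunc` and `Tuple` containing only the methods it uses; the `Upper` function and the `ListTuple` helper are illustrative names, not part of Pig.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class UpperUdfSketch {

    // Minimal stand-in for org.apache.pig.data.Tuple (the real get() declares ExecException).
    interface Tuple {
        Object get(int fieldNum);
        int size();
    }

    // Minimal stand-in for org.apache.pig.EvalFunc<T>.
    abstract static class EvalFunc<T> {
        public abstract T exec(Tuple input) throws IOException;
    }

    // Simple list-backed tuple so the sketch runs on its own.
    static final class ListTuple implements Tuple {
        private final List<Object> fields;

        ListTuple(Object... fields) {
            this.fields = Arrays.asList(fields);
        }

        public Object get(int fieldNum) {
            return fields.get(fieldNum);
        }

        public int size() {
            return fields.size();
        }
    }

    // The UDF itself: upper-cases its first argument and passes nulls through.
    // A real Pig UDF would keep "throws IOException" on exec().
    static final class Upper extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) {
            if (input == null || input.size() == 0) {
                return null;
            }
            Object value = input.get(0);
            return value == null ? null : value.toString().toUpperCase();
        }
    }

    public static void main(String[] args) {
        Upper upper = new Upper();
        System.out.println(upper.exec(new ListTuple("hello pig"))); // HELLO PIG
    }
}
```

In a real project the class would extend Pig's own `EvalFunc<String>`, be packaged into a jar, loaded with `REGISTER` in the script and called by its fully qualified name inside a `FOREACH ... GENERATE` statement.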


Pig with HCatalog + Oozie

HCatalog enables Pig to read from and write to tables in the Hive metastore directly. Pig determines the table's structure dynamically, which makes data manipulation easier. Here's how to make Pig work with HCatalog and how to run such jobs through Oozie.


HBase + Pig + Oozie

Although HBase is mostly used for lookups, sometimes we need to perform bulk reads and writes. Doing that through Pig is very convenient. Here's how to set up communication between Pig and HBase.
