Pig HBase lookups
Alice
Tags: HBase, Pig
Pig can read data from and write data to HBase, as I described here. We can also use a Pig UDF to work with data in HBase, for example to retrieve values for a given key. There is one difficulty, though: ZooKeeper limits the number of concurrent connections to HBase, and if our application exceeds that limit, the whole job fails.

Pig Java UDF
Alice
Tags: Pig
We can write Pig UDFs in several languages: Java, Jython, JavaScript, Ruby, Groovy and Python. Java currently offers the widest range of options, so I’ll stick to it in this post.

Hadoop troubleshooting & tricks
Alice
Tags: HBase, Hive, Impala, Oozie, Parquet, Pig, Sqoop
Over the last few years of working with Hadoop, I spent a lot of time dealing with issues and finding workarounds. That involved a lot of searching, reading and, mostly, trial and error. That’s why I decided to share some of the solutions I found and tried.

Pig with HCatalog + Oozie
Alice
Tags: HCatalog, Oozie, Pig
HCatalog enables Pig to read from and write to tables registered in the Hive metastore directly. Pig determines the structure of the table dynamically, which makes data manipulation easier. Here’s how to make Pig work with HCatalog and how to run such jobs through Oozie.

HBase + Pig + Oozie
Alice
Tags: HBase, Oozie, Pig
Although HBase is mostly used for lookups, sometimes we need to perform bulk reads and writes. Doing that through Pig is very convenient. Here’s how to set up communication between Pig and HBase.