Alice

HBase in Oozie shell action

Alice
Tags: HBase, Oozie, Shell
0

You can manipulate HBase tables through Oozie with Java or Shell actions. To use a Shell action, you first need to prepare a shell script.

Spark 2 APIs

Alice
Tags: Spark
0

Spark 2.0 brought some changes to the API: a link between Dataset and DataFrame was created. A DataFrame is now simply a Dataset[Row] (in Scala and Java), where Row is an untyped generic object representing a table-like record with a schema. That doesn’t mean the DataFrame itself was dropped: it’s still the main abstraction for MLlib, SparkR and Python. Dataset is currently available only in Scala and Java because it’s a strongly typed abstraction. It’s easy to switch between the untyped DataFrame and the typed Dataset.

Simple Java API HBase manipulation

Alice
Tags: HBase
0

One way to access HBase is through the Java API. Here is the code for simple HBase manipulations: creating and changing a table, and adding, retrieving and deleting data.

Sqoop with HCatalog and Oozie

Sqoop can use HCatalog to import data directly into Hive tables and export data directly from them. It uses HCatalog to read a table’s structure, data formats and partitions, and then imports or exports the data accordingly. It’s a very useful combination for moving data efficiently, but it requires matching column names on both sides. Here’s how to make Sqoop with HCatalog work through Oozie.

Pig Java UDF

Alice
Tags: Pig
1

We can define Pig UDFs in several languages: Java, Jython, JavaScript, Ruby, Groovy and Python. Java currently offers the widest range of options, so I’ll stick to it in this post.

Hadoop troubleshooting & tricks

Alice
Tags: HBase, Hive, Impala, Oozie, Parquet, Pig, Sqoop
1

Over the last few years of working with Hadoop, I spent a lot of time dealing with issues or finding tricks for particular solutions. That involved a lot of searching, reading and, mostly, a trial-and-error approach. That’s why I decided to share some of the solutions I found and tried.

Oozie Sqoop action

Alice
Tags: Oozie, Sqoop
0

There are a few ways to build an Oozie Sqoop action.

Pig with HCatalog + Oozie

Alice
Tags: HCatalog, Oozie, Pig
0

HCatalog enables Pig to read from and write to the Hive metastore directly. Pig determines the structure of the table dynamically, which makes data manipulation easier. Here’s how to make Pig work with HCatalog and how to run such jobs through Oozie.

HBase + Pig + Oozie

Alice
Tags: HBase, Oozie, Pig
0

Although HBase is mostly used for lookups, sometimes you need to perform bulk reads and writes. Doing that through Pig is very convenient. Here’s how to establish Pig-HBase communication.

HBase + Sqoop + Oozie

Alice
Tags: HBase, Oozie, Sqoop
0

Sqoop can be used to import data from a relational database into HBase. Although exporting data from HBase is not natively supported, you can still manage it by putting Hive and HCatalog between HBase and Sqoop. Here’s how to do both importing and exporting with Oozie in a Kerberised environment.
