Apache Beam and HCatalog
HCatalog lets you read and write data in Hive metastore tables without specifying the table schemas. Apache Beam provides a transform for querying Hive data, called HCatalogIO.
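As a rough sketch of what setting up such a read might look like in Java: HCatalogIO is configured with a map of Hive properties, the key one being `hive.metastore.uris`. The helper below only builds that map using the standard library, and the Beam calls are shown in comments because they need the `beam-sdks-java-io-hcatalog` dependency on the classpath. The class name, the host `metastore-host`, the port 9083, and the `default`/`employee` database and table names are illustrative assumptions, not values from a real cluster.

```java
import java.util.HashMap;
import java.util.Map;

class HCatalogReadConfig {

    // Builds the Hive configuration map handed to HCatalogIO.
    // The metastore URI is an assumption; use your cluster's value.
    static Map<String, String> metastoreConfig(String metastoreUri) {
        Map<String, String> config = new HashMap<>();
        config.put("hive.metastore.uris", metastoreUri);
        return config;
    }

    public static void main(String[] args) {
        Map<String, String> config = metastoreConfig("thrift://metastore-host:9083");
        System.out.println(config);

        // With the Beam HCatalog module available, the map would be passed
        // to the transform roughly like this (hypothetical names):
        //
        // pipeline.apply(HCatalogIO.read()
        //     .withConfigProperties(config)
        //     .withDatabase("default")
        //     .withTable("employee"));
    }
}
```

No schema is declared anywhere in the configuration: the table structure is resolved from the metastore when the pipeline runs.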
Sqoop can use HCatalog to import data into and export data from Hive tables directly. It uses HCatalog to read a table’s structure, data formats, and partitions, then imports or exports the data accordingly. It’s a very useful combination for moving data efficiently, but it requires matching column names on both sides. Here’s how to make Sqoop with HCatalog work through Oozie.
HCatalog enables Pig to read from and write to Hive metastore tables directly. Pig determines the table structure dynamically, which makes data manipulation easier. Here’s how to make Pig work with HCatalog and how to run such jobs through Oozie.