Tag: Beam

Apache Beam JDBC

With Apache Beam we can connect to different databases, such as HBase, Cassandra, and MongoDB, using dedicated Beam APIs. There is also JdbcIO for JDBC connections. Here I show how to connect to an MSSQL database using Beam and how to import and export data in a Kerberized environment.

Apache Beam and HBase

HBase is a NoSQL database that lets you store data in many different formats (such as pictures, PDFs, and text files) and supports fast, efficient data lookups. HBase offers two interfaces to choose from: the Java API and the HBase Shell. We can also connect HBase to other tools such as Hive or Phoenix and query it with SQL. HBase also integrates with Apache Beam via the HBaseIO transform.

Apache Beam – getting started

Apache Beam is an open-source, unified programming model for defining and executing data processing pipelines, including ETL, batch, and stream processing. It lets you write data pipelines in Java or Python without specifying which engine the code will run on, so the same code can run on MapReduce, Spark, Flink, Apex, or another engine.
