Monday, June 20, 2016

Using Hive with Apache Drill

This article explains how to connect Apache Drill to Hive so that Drill can query Hive tables through the Hive metastore.

Prerequisites:

1. Hadoop should be installed.
2. Hive should be installed.
3. Apache Drill should be installed.

Implementation:
1. Go to the directory where Hive is installed:

Goto Hive Home
2. Start the Hive Metastore service.
Start Hive Service

3. Once the Hive metastore service has started successfully, you will see the screen below.

Successfully Start Hive Service
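
Steps 1 to 3 can be sketched as the commands below. This is a minimal sketch, not the exact commands from the original screenshots: the HIVE_HOME default of /usr/local/hive is an assumed install path, so point it at your own Hive directory.

```shell
# Assumed install location; override HIVE_HOME to match your setup.
HIVE_HOME="${HIVE_HOME:-/usr/local/hive}"

if [ -x "$HIVE_HOME/bin/hive" ]; then
    # Start the metastore from the Hive home directory.
    # The service runs in the foreground and keeps this terminal busy.
    (cd "$HIVE_HOME" && bin/hive --service metastore)
else
    echo "Hive not found under $HIVE_HOME; set HIVE_HOME and retry." >&2
fi
```

Unless configured otherwise, the metastore listens on port 9083, which is the port used later when configuring the Drill storage plugin.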

4. Start a new terminal.
5. Go to the directory where Apache Drill is installed.

Goto Drill Home

6. Start the Drill server in embedded mode:

Start Drill Embedded Mode

7. Once the Drill server has started in embedded mode, you will see the Drill prompt:

Apache Drill Shell
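
Steps 5 to 7 roughly correspond to the following sketch. The DRILL_HOME default of /usr/local/apache-drill is an assumed install path; the drill-embedded launcher ships in the bin directory of recent Drill releases.

```shell
# Assumed install location; override DRILL_HOME to match your setup.
DRILL_HOME="${DRILL_HOME:-/usr/local/apache-drill}"

if [ -x "$DRILL_HOME/bin/drill-embedded" ]; then
    # Starts a local Drillbit and opens the Drill shell in this terminal.
    (cd "$DRILL_HOME" && bin/drill-embedded)
else
    echo "Drill not found under $DRILL_HOME; set DRILL_HOME and retry." >&2
fi
```

When the shell is ready, the prompt looks like `0: jdbc:drill:zk=local>`. Type `!quit` to leave the shell and stop the embedded server.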

Next, open the Apache Drill Web UI in a browser at http://localhost:8047. You will see the screen below:

Apache Drill Web UI

8. Click Storage, then click Enable next to the hive storage plugin. The plugin should now appear under the enabled storage plugins, as shown below.


Enable Hive Storage Plugin


9. Click Update next to the hive storage plugin and edit its configuration as shown in the screen below.

Configure Hive Storage Plugin
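
Since the screenshot is not reproduced here, the following is a hedged sketch of what a Hive plugin configuration for a remote metastore typically looks like. The thrift://localhost:9083 URI is an assumption (a metastore on the same machine, on the default port), and the scratch file name is only for illustration; the values in the original screen may differ.

```shell
# Hypothetical scratch file holding the plugin configuration.
PLUGIN_FILE="${TMPDIR:-/tmp}/hive-plugin.json"

# Assumes the metastore from step 2 runs locally on the default port 9083.
cat > "$PLUGIN_FILE" <<'EOF'
{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://localhost:9083",
    "hive.metastore.sasl.enabled": "false"
  }
}
EOF

# Print the JSON so it can be pasted into the Configuration box in the Web UI.
cat "$PLUGIN_FILE"
```

The key setting is hive.metastore.uris, which tells Drill where to reach the metastore service started earlier.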

10. Go back to the terminal where the Drill shell is running (step 7).

11. Switch the current schema to hive.

Change Hive Schema

12. Run SHOW TABLES to list all the Hive tables.

List Hive Tables

13. You can now run any query on the Hive tables. Note that Drill reads the table data through its own execution engine, so the query does not launch a MapReduce job.

Execute Queries on Hive via Apache Drill
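
The last three steps can be sketched as the statements below, saved to a script file for convenience. The table name my_table is hypothetical; replace it with one of the tables listed by SHOW TABLES. The file name is also only for illustration.

```shell
# Hypothetical scratch file with the statements for the Drill prompt.
# my_table is a placeholder table name, not one from the original post.
QUERY_FILE="${TMPDIR:-/tmp}/hive-queries.sql"

cat > "$QUERY_FILE" <<'EOF'
USE hive;
SHOW TABLES;
SELECT * FROM my_table LIMIT 10;
EOF

# Print the statements; type them at the Drill prompt one at a time.
cat "$QUERY_FILE"
```

Instead of typing the statements, you can also load the file from the Drill prompt with the shell's `!run` command followed by the file path.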

Conclusion:
I hope this helps you understand how to configure Apache Drill with the Hive metastore so that you can query Hive tables directly.

Sourabh Jain
Big Data & Analytics Architect
