Big Data Technical Articles: June 2016

Sunday, June 26, 2016

Using Apache Drill REST API to query Hive Data

This article will guide you on how to use the Apache Drill REST API to query Hive data. We will show how Hive can be queried, but the same approach can be used to query data from HBase, MongoDB, flat files, etc.

Prerequisites:

1. Apache Hadoop should be installed.
2. Apache Hive should be installed.
3. Apache Drill should be installed.

Implementation:
1. Go to the path where Hive is installed.

2. Start the Hive metastore service.

3. Start a new terminal and go to the path where Apache Drill is installed.

4. Start the Drill service in embedded mode.

5. Open a web browser and go to http://localhost:8047

6. Go to the Storage tab and enable the hive storage plugin.

7. Click Update against the hive storage plugin and edit its configuration so that it points at the Hive metastore started in step 2.

8. Let's run a query on the Hive tables now. We have a transaction_orc table and a store table in Hive, and we want to find the total sales for each store.

9. Now we will run the same query via the REST API.

10. Open a web browser with any REST API client. We will use Firefox with the REST Client plugin.

11. Select the method as "POST", the URL as "http://localhost:8047/query.json", and the header as "Content-Type": "application/json".

12. In the body section, paste JSON of the form {"queryType": "SQL", "query": "<your SQL query>"}.

13. Now click the Send button and validate the response.

14. As seen, the result is returned in JSON format. The same request can be made from any programming framework that supports REST APIs, which can then parse the data and display it on the UI.
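Steps 11 to 13 can also be scripted instead of using a browser plugin. The following is a minimal Python sketch using only the standard library. It assumes Drill is running in embedded mode on localhost:8047 as above and that the response carries its result under a "rows" key; the helper names (build_query_request, rows_from_response, run_query) are made up for this illustration.

```python
import json
import urllib.request

# Assumed endpoint: Drill in embedded mode on the local machine.
DRILL_QUERY_URL = "http://localhost:8047/query.json"


def build_query_request(sql, url=DRILL_QUERY_URL):
    """Build the POST request from steps 11 and 12 (method, header, JSON body)."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_from_response(payload):
    """Return the result rows (a list of dicts) from a decoded JSON response."""
    return payload.get("rows", [])


def run_query(sql, url=DRILL_QUERY_URL):
    """Send the query to Drill and return the result rows."""
    request = build_query_request(sql, url)
    with urllib.request.urlopen(request) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return rows_from_response(payload)
```

With Drill running, a call such as run_query("SELECT * FROM hive.transaction_orc LIMIT 5") should return one dict per row, which can then be handed to whatever UI layer is in use. The query assumes the storage plugin is registered under the name hive.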

Conclusion:

Hope this helps you to understand how to configure Apache Drill and use its REST API to query data.
Sourabh Jain
Big Data & Analytics Architect

Tuesday, June 21, 2016

Using Beeline to connect to HiveServer 2

This article will guide you on how to connect to HiveServer2 using Beeline.

Prerequisites:

1. Hadoop should be installed.
2. Hive should be installed.

Implementation:
1. Go to the path where Hive is installed.

2. Start the HiveServer2 process.

3. This will start the HiveServer2 process as a foreground process. To stop the process, press Ctrl + C. If the process is to be started as a background process, execute "./hive --service hiveserver2 &"

4. Start a new terminal.

5. Go to the Hive installation path as mentioned in step 1.

6. If there is a metastore_db folder, go inside it and remove all the *.lck files. This is required because the default metastore database, Derby, supports only one connection at a time. The recommended metastore database for production is MySQL.

7. Start the beeline shell

8. Once the Beeline shell has started, let's connect to the HiveServer2 process.

9. On executing the "!connect jdbc:hive2://" command, the user will be prompted for a username and password. Just press Enter to continue. As we are running both the Beeline client and the HiveServer2 process on the same node, we execute the command as "!connect jdbc:hive2://". Note that a URL with no host and port makes Beeline use Hive's embedded mode; to go through the running HiveServer2 process explicitly, use "!connect jdbc:hive2://localhost:10000". If the HiveServer2 process were running on a different server, we would need to connect as "!connect jdbc:hive2://hostname:port". The default port for the HiveServer2 process is 10000.

10. Let's display all the Hive tables.

11. Let's execute some queries against the tables.

12. To quit the Beeline shell, execute the "!quit" command.
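The lock-file cleanup in step 6 and the connection in steps 7 to 10 can also be driven from a script. Below is a rough Python sketch, assuming the beeline launcher is on the PATH; it uses Beeline's -u (JDBC URL) and -e (query) options to run a single statement non-interactively. The function names are hypothetical helpers written for this illustration.

```python
import subprocess
from pathlib import Path


def hive2_jdbc_url(host=None, port=10000):
    """Return the JDBC URL used with !connect; no host gives the bare form."""
    if host is None:
        return "jdbc:hive2://"
    return f"jdbc:hive2://{host}:{port}"


def beeline_command(sql, host=None, port=10000, beeline="beeline"):
    """Build the argument list for a one-off, non-interactive Beeline run."""
    return [beeline, "-u", hive2_jdbc_url(host, port), "-e", sql]


def remove_derby_locks(metastore_dir):
    """Delete the *.lck files in a Derby metastore_db folder (step 6)."""
    removed = []
    for lock_file in sorted(Path(metastore_dir).glob("*.lck")):
        lock_file.unlink()
        removed.append(lock_file.name)
    return removed


def run_beeline(sql, host=None, port=10000):
    """Run the statement through Beeline and return its standard output."""
    result = subprocess.run(
        beeline_command(sql, host, port),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, remove_derby_locks("metastore_db") followed by run_beeline("show tables;", host="localhost") mirrors steps 6 to 10 against a HiveServer2 process on the default port. Only remove the lock files when no other process is using the Derby database.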

Conclusion:
Hope this helps you to understand how to connect to the HiveServer2 process using Beeline.

Sourabh Jain
Big Data & Analytics Architect

Monday, June 20, 2016

Using Hive with Apache Drill

This article will guide you on how to connect Apache Drill with Hive.

Prerequisites:

1. Hadoop should be installed.
2. Hive should be installed.
3. Apache Drill should be installed.

Implementation:
1. Go to the path where Hive is installed.

[Screenshot: Go to Hive home]

2. Start the Hive metastore service.

[Screenshot: Start Hive service]

3. You will see the screen below once the Hive metastore service has started successfully.

[Screenshot: Hive service started successfully]

4. Start a new terminal.
5. Go to the path where Apache Drill is installed.

[Screenshot: Go to Drill home]

6. Start the Drill server in embedded mode.

[Screenshot: Start Drill in embedded mode]

7. Once the Drill server has started in embedded mode, you will see the Drill prompt:

[Screenshot: Apache Drill shell]

8. Go to the Apache Drill Web UI at http://localhost:8047. You will see the screen below:

[Screenshot: Apache Drill Web UI]

9. Click on Storage and then enable the hive storage plugin. The plugin should now appear under the enabled storage plugins, as shown below.

[Screenshot: Enable Hive storage plugin]

10. Click Update against the hive storage plugin and update the values as shown in the screenshot.

[Screenshot: Configure Hive storage plugin]

11. Go back to the terminal where the Apache Drill server was started, i.e. step 7.

12. Change the schema to use hive.

[Screenshot: Change to Hive schema]

13. Now you can run show tables to list all the Hive tables.

[Screenshot: List Hive tables]

14. You can run any query on the Hive tables. Remember, the query will not invoke a MapReduce job.

[Screenshot: Execute queries on Hive via Apache Drill]
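The storage plugin configuration can also be scripted through Drill's REST API instead of the Web UI. The sketch below builds a Hive plugin configuration for a remote metastore and the POST request that would create or update it. The metastore URI thrift://localhost:9083 is an assumption (the usual default port for a local metastore), as are the helper names; depending on where the Hive data lives, further properties such as fs.default.name may be needed.

```python
import json
import urllib.request

# Assumed location of the Drill Web UI / REST endpoint in embedded mode.
DRILL_URL = "http://localhost:8047"


def hive_plugin_config(metastore_uri="thrift://localhost:9083"):
    """Return a Hive storage plugin configuration pointing at a remote metastore.

    The default URI is an assumption; replace it with your metastore address.
    """
    return {
        "type": "hive",
        "enabled": True,
        "configProps": {
            "hive.metastore.uris": metastore_uri,
            "hive.metastore.sasl.enabled": "false",
        },
    }


def build_plugin_update_request(name, config, base_url=DRILL_URL):
    """Build the POST request that creates or updates the named plugin."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/storage/{name}.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def update_plugin(name, config, base_url=DRILL_URL):
    """Send the configuration to Drill and return the decoded JSON reply."""
    request = build_plugin_update_request(name, config, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With Drill running, update_plugin("hive", hive_plugin_config()) would overwrite the existing hive plugin configuration, so the dictionary must contain every property the plugin needs.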

Conclusion:
Hope this helps you to understand how to configure Apache Drill with the Hive metastore to query Hive tables directly.

Sourabh Jain
Big Data & Analytics Architect