Building an HQL IDE for Apache Hive
Phani Raj
My new team (https://www.hadooponazure.com/) works on making Windows Server and Windows Azure the best environment for hosting Hadoop.
As one of my new challenges, I’ve been tasked with building an editor and job manager for Apache Hive.
Below is my first draft of a terminal for running and saving Hive jobs written in HQL.
Please send comments and feedback to PhaniRaj AT Microsoft DOT COM.
This is what the editor looks like:
Let’s start by creating a connection to your local Hive server.
Once the connection is created, the editor connects to the Hive server using the Microsoft ODBC Driver for Hive.
We’ll soon move the console over to Apache’s new Templeton APIs for metadata access and job submission.
You can learn more about Templeton here: http://people.apache.org/~thejas/templeton_doc_latest/
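Templeton is a REST interface, so moving to it mostly means issuing HTTP requests. Below is a minimal Python sketch of two such requests: listing a database’s tables and submitting an HQL job. The host name, user name, and status directory are hypothetical placeholders. The `/templeton/v1` resource paths, the `execute`, `statusdir`, and `user.name` parameters, and the default port 50111 follow the Templeton documentation, but you should check them against the version you deploy.

```python
from urllib.parse import quote, urlencode

# Hypothetical host; 50111 is Templeton's default port (verify for your install).
BASE_URL = "http://hive-host.example.com:50111/templeton/v1"


def list_tables_url(database, user):
    """URL to GET for listing the tables in a database (Templeton DDL resource)."""
    query = urlencode({"user.name": user})
    return f"{BASE_URL}/ddl/database/{quote(database)}/table?{query}"


def hive_job_request(hql, user, status_dir):
    """Return (url, form-encoded body) for POSTing an HQL job to the hive resource.

    status_dir is a hypothetical HDFS directory where Templeton writes job output.
    """
    body = urlencode({"execute": hql, "statusdir": status_dir, "user.name": user})
    return f"{BASE_URL}/hive", body
```

The POST response would carry a job id, which the console could then poll for status, much as it polls the ODBC-submitted job today.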
Support for basic Hive metadata visualization
We visualize the Hive metadata as a hierarchical tree view.
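The post doesn’t say how the tree is populated. Here is a minimal sketch, assuming the metadata arrives as flat (table, column, type) rows. It groups them into a nested structure and renders that structure as indented text. The row format and function names are hypothetical, and a GUI tree control would consume the same nested structure instead of text lines.

```python
from collections import defaultdict


def build_tree(rows):
    """Group flat (table, column, type) rows into {table: [(column, type), ...]}."""
    tree = defaultdict(list)
    for table, column, col_type in rows:
        tree[table].append((column, col_type))
    return dict(tree)


def render_tree(tree):
    """Render tables sorted by name, with each table's columns indented in schema order."""
    lines = []
    for table in sorted(tree):
        lines.append(table)
        for column, col_type in tree[table]:
            lines.append(f"  {column} : {col_type}")
    return lines
```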
A query editor for HQL with syntax coloring, auto-completion, and other fun features
The IDE hosts an editor that supports auto-completion and syntax coloring for HQL keywords and functions.
You can find the HQL language manual here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual
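Syntax coloring comes down to classifying each token. Here is a minimal sketch using a regex tokenizer and small, hypothetical subsets of keywords and functions. The editor’s real lists are much longer, and its actual highlighting approach isn’t described in the post. HQL keywords are case-insensitive, so matching is done on the upper-cased token.

```python
import re

# Small illustrative subsets; not the editor's actual word lists.
KEYWORDS = {"SELECT", "FROM", "WHERE", "GROUP", "BY", "LIMIT", "AS"}
FUNCTIONS = {"COUNT", "SUM", "AVG", "CONCAT", "SUBSTR"}

# Alternatives are tried in order: string literal, number, word, any other single symbol.
TOKEN_RE = re.compile(r"'[^']*'|\d+|\w+|\S")


def classify(hql):
    """Return (token, category) pairs that a highlighter could map to colors."""
    result = []
    for token in TOKEN_RE.findall(hql):
        upper = token.upper()
        if upper in KEYWORDS:
            kind = "keyword"
        elif upper in FUNCTIONS:
            kind = "function"
        elif token.startswith("'"):
            kind = "string"
        elif token.isdigit():
            kind = "number"
        else:
            kind = "other"
        result.append((token, kind))
    return result
```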
Clicking a table name in the tree view produces a sample query that selects the first 10 rows of the table.
You can edit this query or clear it and start over.
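The generated query is an ordinary `SELECT ... LIMIT` statement. Here is a small sketch of how such text could be built. The optional database qualifier and the function name are assumptions, and the editor’s exact output may differ.

```python
def sample_query(table, database=None, limit=10):
    """Build a 'first N rows' HQL query for a table, optionally qualified by database."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    name = f"{database}.{table}" if database else table
    return f"SELECT * FROM {name} LIMIT {limit}"
```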
HQL code snippets for common tasks in Hive
We’ve seeded the editor with code snippets for common tasks.
Below is an example of the “Create External Table” code snippet.
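The snippet itself appears only in the screenshot. As an illustration, not a copy of the editor’s snippet, here is a Python template that fills in a `CREATE EXTERNAL TABLE` statement over delimited text files. The table name, columns, and HDFS location in the test are hypothetical. Because the table is external, dropping it in Hive removes only the metadata and leaves the files at `LOCATION` in place.

```python
def create_external_table(name, columns, location, delimiter=","):
    """Render a CREATE EXTERNAL TABLE statement for delimited text files at an HDFS path."""
    cols = ",\n".join(f"  {col} {col_type}" for col, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {name} (\n"
        f"{cols}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )
```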
Auto-completion support for Hive functions in the editor
Auto-completion support for Hive keywords in the editor
The editor supports about 163 HQL keywords.
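With a fixed list of about 163 keywords, prefix completion can be a binary search over a sorted list. Here is a minimal sketch. It uses a handful of keywords for illustration and assumes the list is stored upper-cased. A trie would work equally well, and the post doesn’t say which approach the editor uses.

```python
import bisect

# A few HQL keywords for illustration; stored upper-cased and sorted for binary search.
KEYWORDS = sorted(["SELECT", "SET", "SHOW", "SORT", "FROM", "FULL", "FUNCTION", "TABLE", "TABLES"])


def complete(prefix, words=KEYWORDS):
    """Return the words that start with prefix (case-insensitive), in sorted order."""
    prefix = prefix.upper()
    start = bisect.bisect_left(words, prefix)  # first word >= prefix
    matches = []
    for word in words[start:]:
        if not word.startswith(prefix):
            break  # sorted order means no later word can match
        matches.append(word)
    return matches
```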
Metadata-sensitive auto-completion for queries
We inject the metadata gathered from the Hive server into the editor so that you can use IntelliSense on column names in your queries.
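Here is a minimal sketch of how injected metadata can drive column completion. It assumes a hypothetical `schema` dictionary that maps lower-cased table names to column lists, and it triggers when the text before the cursor ends in `table.` plus an optional partial name. Table aliases are not resolved here.

```python
import re

# "table.partial" at the very end of the text before the cursor.
DOTTED_RE = re.compile(r"(\w+)\.(\w*)$")


def complete_columns(text_before_cursor, schema):
    """Suggest the columns of the table named before a '.', filtered by the typed prefix."""
    match = DOTTED_RE.search(text_before_cursor)
    if not match:
        return []
    # Hive stores identifiers in lower case, so the schema keys are assumed lower-cased.
    table, partial = match.group(1).lower(), match.group(2).lower()
    return [col for col in schema.get(table, []) if col.lower().startswith(partial)]
```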
Once you’ve written the query, click “Run Query” to start execution.
The editor submits the Hive job and waits for it to complete, polling periodically for the results.
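The submit-then-poll pattern can be sketched as a loop with a timeout. In this sketch, `get_status` is a hypothetical callable that returns a job-state string. The state names, poll interval, and timeout are assumptions rather than the editor’s actual values. The `sleep` and `clock` functions are injectable so the loop can be tested without real waiting.

```python
import time

# Hypothetical terminal job states; anything else is treated as still running.
TERMINAL_STATES = ("SUCCEEDED", "FAILED", "KILLED")


def wait_for_job(get_status, poll_interval=2.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal state, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout} seconds")
        sleep(poll_interval)
```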
When the query finishes successfully, the query icon lights up.
Clicking the query icon brings up the results in tabular form.
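The post doesn’t describe how the results grid is drawn. As a rough plain-text analogue, assuming results arrive as a list of column names plus rows of values, each column can be padded to its widest cell. The function name and layout are hypothetical.

```python
def format_table(headers, rows):
    """Render query results as a plain-text table, padding each column to its widest cell."""
    cells = [[str(value) for value in row] for row in rows]
    widths = [
        max([len(header)] + [len(row[i]) for row in cells])
        for i, header in enumerate(headers)
    ]

    def line(values):
        # Left-justify each value to its column width; drop trailing padding.
        return " | ".join(value.ljust(width) for value, width in zip(values, widths)).rstrip()

    separator = "-+-".join("-" * width for width in widths)
    return "\n".join([line(headers), separator] + [line(row) for row in cells])
```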
If, on the other hand, the query fails, a “query failed” icon appears next to it.
Clicking this icon opens a window with links to details about why the query failed.
I hope you find this useful.
As always, your feedback and comments are welcome at the email address above.
We’re planning many other features for this editor and will post regular updates as the work progresses.