{"id":1571,"date":"2016-09-05T11:40:59","date_gmt":"2016-09-05T05:40:59","guid":{"rendered":"http:\/\/techsatwork.com\/blog\/?p=1571"},"modified":"2016-09-05T20:47:14","modified_gmt":"2016-09-05T14:47:14","slug":"drill-data-apache-drill","status":"publish","type":"post","link":"https:\/\/techsatwork.com\/?p=1571","title":{"rendered":"Drill your data with Apache Drill !"},"content":{"rendered":"<h2>Feeling the Drill<\/h2>\n<p>I have been using <a href=\"https:\/\/drill.apache.org\">Apache Drill<\/a> to explore data for a while now. Apache Drill is a low-latency distributed query engine for large-scale datasets, including structured and semi-structured\/nested data. Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. To be clear, Drill is not limited to Hadoop: you can query NoSQL databases like MongoDB and HBase, cloud storage like Amazon S3 and Azure Blob Storage, or even local files on your computer. I have it installed on my laptop and use it in embedded mode to query my txt and csv files. Apache Drill can be installed on Windows, Linux and macOS, as long as a JDK is present.<\/p>\n<h2>Drill data like a table even when it&#8217;s not &#8211; schema on read<\/h2>\n<p>Drill is based on schema on read: unlike traditional query engines that require a predefined schema and structure, Drill lets you define the schema as you query the data. Cool, huh? Wait, there is more: with Drill there&#8217;s no need to load or transform the data before it can be processed. Simply point the query at the file or database you want and start querying the data.<br \/>\nFor instance, let&#8217;s say you have a file customers.csv in the directory \/data\/customer\/. 
Once you have Drill installed (which takes about 3 minutes), all you have to do from a Drill prompt is:<br \/>\n<span style=\"color: #808080;\"><em>select * from dfs.`\/data\/customer\/customers.csv`;<\/em><\/span> and Drill gets you the data. You can even pull specific columns:<br \/>\n<span style=\"color: #808080;\"><em>select columns[0],columns[1],columns[6] from dfs.`\/data\/customer\/customers.csv`;<\/em><\/span><\/p>\n<p>Drill also lets you query multiple files with a wildcard:<br \/>\n<span style=\"color: #808080;\"><em>select * from dfs.`\/data\/orders\/orders-08-*-2016.csv`;<br \/>\n<\/em><\/span>Drill lets you create views and static tables to further increase ease of use and improve performance. You can check out the <a href=\"https:\/\/drill.apache.org\/docs\/\" target=\"_blank\">documentation<\/a> for more options.<\/p>\n<h2>In love with your query or BI tool? No problemo<\/h2>\n<p>Apache Drill supports standard SQL, so you can continue to use your favorite query tools and the SQL you have been using. Drill provides ODBC and JDBC drivers, so you can access it using the tool of your choice. Data users can use standard BI\/analytics tools such as Tableau, Qlik, MicroStrategy and so on to interact with non-relational datastores by leveraging Drill&#8217;s <a href=\"https:\/\/drill.apache.org\/docs\/interfaces-introduction\/\" target=\"_blank\">JDBC and ODBC<\/a> drivers. Developers can leverage Drill&#8217;s simple REST API in their custom applications to create beautiful visualizations. Drill comes with a web interface when you <a href=\"https:\/\/drill.apache.org\/docs\/installing-drill-in-distributed-mode\/\" target=\"_blank\">install in distributed mode<\/a>. Drill also provides a native tool called Drill Explorer, which I find really useful. 
You can find all the details on how to configure your tool to access Drill in the <a href=\"https:\/\/drill.apache.org\/docs\/\" target=\"_blank\">documentation<\/a>.<\/p>\n<h2>Let&#8217;s get it going &#8230;<\/h2>\n<p>Apache Drill is easy to download and run on your computer. It runs on all standard operating systems and takes a few minutes to install. Drill can also be installed on a cluster of servers to provide a scalable, high-performance execution engine. Drill has two install options:<br \/>\n1. <a href=\"https:\/\/drill.apache.org\/docs\/installing-drill-in-embedded-mode\/\" target=\"_blank\">Installing in Embedded mode<\/a><br \/>\n2. <a href=\"https:\/\/drill.apache.org\/docs\/installing-drill-in-distributed-mode\/\" target=\"_blank\">Installing in Distributed mode<\/a>.<\/p>\n<p>Installing on a computer that already has a JDK involves:<br \/>\n1. <a href=\"http:\/\/www.apache.org\/dyn\/closer.cgi\/drill\/drill-1.8.0\/apache-drill-1.8.0.tar.gz\" target=\"_blank\">Download the tar file<\/a><br \/>\n2. Untar the file<br \/>\n3. cd to the apache-drill&lt;version&gt; directory<br \/>\n4. Run <span style=\"color: #808080;\"><em>bin\/drill-embedded<\/em><\/span> (Mac and Linux). On Windows, run this from the bin folder: <span style=\"color: #808080;\"><em>sqlline.bat -u &quot;jdbc:drill:zk=local;schema=dfs&quot;<\/em><\/span><\/p>\n<p>&nbsp;<\/p>\n<p>Drill into your data with Apache Drill, and hopefully you will enjoy drilling as much as I do.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Feeling the Drill I have been using Apache Drill to explore data for a while now. Apache Drill is a low latency distributed query engine for large-scale datasets, including structured and semi-structured\/nested data. 
Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-container-style":"default","site-container-layout":"default","site-sidebar-layout":"default","site-transparent-header":"default","disable-article-header":"default","disable-site-header":"default","disable-site-footer":"default","disable-content-area-spacing":"default","footnotes":""},"categories":[917,303,925],"tags":[936,937,938],"class_list":["post-1571","post","type-post","status-publish","format-standard","hentry","category-bigdata","category-database","category-hadoop","tag-apache-drill","tag-drill","tag-schema-on-read"],"_links":{"self":[{"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/posts\/1571","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techsatwork.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1571"}],"version-history":[{"count":7,"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/posts\/1571\/revisions"}],"predecessor-version":[{"id":1580,"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/posts\/1571\/revisions\/1580"}],"wp:attachment":[{"href":"https:\/\/techsatwork.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1571"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techsatwork.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1571"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techsatwork.com\/index.php?res
t_route=%2Fwp%2Fv2%2Ftags&post=1571"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}