By Edward Capriolo, Dean Wampler

Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop’s data warehouse infrastructure. You’ll quickly learn how to use Hive’s SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in Hadoop’s distributed filesystem.

This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You’ll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.

  • Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
  • Customize data formats and storage options, from files to external databases
  • Load and extract data from tables, and use queries, grouping, filtering, joining, and other conventional query methods
  • Gain best practices for creating user defined functions (UDFs)
  • Learn Hive patterns you should use and anti-patterns you should avoid
  • Integrate Hive with other data processing programs
  • Use storage handlers for NoSQL databases and other datastores
  • Learn the pros and cons of running Hive on Amazon’s Elastic MapReduce


Read or Download Programming Hive PDF

Similar Computers books

Database Modeling and Design: Logical Design, 4th Edition (The Morgan Kaufmann Series in Data Management Systems)

Database systems and database design technology have undergone significant evolution in recent years. The relational data model and relational database systems dominate business applications; in turn, they are extended by other technologies like data warehousing, OLAP, and data mining. How do you model and design your database application in consideration of new technology or new business needs?

Computer Networking: A Top-Down Approach (6th Edition)

Computer Networking continues with an early emphasis on application-layer paradigms and application programming interfaces (the top layer), encouraging a hands-on experience with protocols and networking concepts, before working down the protocol stack to more abstract layers. This book has become the dominant book for this course because of the authors’ reputations, the precision of explanation, the quality of the art program, and the value of their own supplements.

The Guru's Guide to Transact-SQL

Since its introduction over a decade ago, the Microsoft SQL Server query language, Transact-SQL, has become increasingly popular and more powerful. The current version sports such advanced features as OLE Automation support, cross-platform querying facilities, and full-text search management. This book is the consummate guide to Microsoft Transact-SQL.

Data Structures and Problem Solving Using Java (4th Edition)

Data Structures and Problem Solving Using Java takes a practical and unique approach to data structures that separates interface from implementation. It is suitable for the second or third programming course. This book provides a practical introduction to data structures with an emphasis on abstract thinking and problem solving, as well as the use of Java.

Extra resources for Programming Hive


In your Maven project, create a pom.xml and include hive_test as a dependency, as shown here:

    <dependency>
      <groupId>com.jointhegrid</groupId>
      <artifactId>hive_test</artifactId>
      <version>3.0.1-SNAPSHOT</version>
    </dependency>

Then create a version of hive-site.xml:

    $ cp $HIVE_HOME/conf/* src/test/resources/
    $ vi src/test/resources/hive-site.xml

Unlike a normal hive-site.xml, this version should not save any data to a permanent place. This is because unit tests are not supposed to create or preserve any permanent state. javax.jdo.option.ConnectionURL is set to use a feature in Derby that only stores the database in main memory. The warehouse directory hive.metastore.warehouse.dir is set to a location inside /tmp that will be deleted on each run of the unit test:

    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:derby:memory:metastore_db;create=true</value>
      <description>JDBC connect string for a JDBC metastore</description>
    </property>

    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/tmp/warehouse</value>
      <description>location of default database for the warehouse</description>
    </property>

Hive_test provides several classes that extend JUnit test cases. HiveTestService sets up the environment, clears out the warehouse directory, and launches a metastore and HiveService in-process. This is typically the component to extend for testing. However, other components, such as HiveTestEmbedded, are also available:

    package com.jointhegrid.hive_test;

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.Path;

    /* Extending HiveTestService creates and initializes
       the metastore and thrift service in an embedded mode */
    public class ServiceHiveTest extends HiveTestService {

      public ServiceHiveTest() throws IOException {
        super();
      }

      public void testExecute() throws Exception {

        /* Use the Hadoop filesystem API to create a data file */
        Path p = new Path(this.ROOT_DIR, "afile");
        FSDataOutputStream o = this.getFileSystem().create(p);
        BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(o));
        bw.write("1\n");
        bw.write("2\n");
        bw.close();

        /* ServiceHive is a component that connects to an embedded
           or network HiveService depending on the constructor used */
        ServiceHive sh = new ServiceHive();

        /* We can now interact through the HiveService
           and assert on results */
        sh.client.execute("create table atest (num int)");
        sh.client.execute("load data local inpath '"
            + p.toString() + "' into table atest");
        sh.client.execute("select count(1) as cnt from atest");
        String row = sh.client.fetchOne();
        assertEquals("2", row);
        sh.client.execute("drop table atest");
      }
    }

The New Plugin Developer Kit

Hive v0.8.0 introduced a Plugin Developer Kit (PDK). Its intent is to allow developers to build and test plug-ins without the Hive source. Only Hive binary code is required. The PDK is relatively new and has some subtle bugs of its own that can make it difficult to use.
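The excerpt's unit-testing setup relies on each run starting from an empty warehouse directory and leaving no permanent state behind. The sketch below illustrates that idea using only the Java standard library. The class ScratchWarehouse and its reset method are hypothetical names invented for this illustration; they are not part of hive_test, and this is not how HiveTestService is actually implemented.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/* Hypothetical helper (not part of hive_test): resets a scratch
   directory so that each test run starts from an empty warehouse. */
public class ScratchWarehouse {

  /* Deletes dir and everything under it, then recreates it empty. */
  public static Path reset(Path dir) throws IOException {
    if (Files.exists(dir)) {
      List<Path> paths;
      try (Stream<Path> walk = Files.walk(dir)) {
        /* Reverse order lists children before their parent directories */
        paths = walk.sorted(Comparator.reverseOrder())
                    .collect(Collectors.toList());
      }
      for (Path p : paths) {
        Files.delete(p);
      }
    }
    return Files.createDirectories(dir);
  }

  public static void main(String[] args) throws IOException {
    /* A temporary directory stands in for /tmp/warehouse */
    Path warehouse = Files.createTempDirectory("warehouse-sketch");
    Files.write(warehouse.resolve("afile"), "1\n2\n".getBytes());

    reset(warehouse);

    try (Stream<Path> entries = Files.list(warehouse)) {
      System.out.println("entries after reset: " + entries.count());
    }
  }
}
```

Calling a helper like this from a test's setup step, rather than its teardown, means a run that crashed partway through cannot leave stale tables behind for the next run.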
