ISBN 9789350239148, Programming Hive: Data Warehouse and Query Language for Hadoop


Shroff Publishers & Distributors Pvt Ltd

Publication Year 2012

ISBN 9789350239148

ISBN-10 9350239140

Paperback

Number of Pages 368
Language English

Programming languages

Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop's data warehouse infrastructure. You'll quickly learn how to use Hive's SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in Hadoop's distributed filesystem.
This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You'll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.
Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
Customize data formats and storage options, from files to external databases
Load and extract data from tables, and use queries, grouping, filtering, joining, and other conventional query methods
Gain best practices for creating user-defined functions (UDFs)
Learn Hive patterns you should use and anti-patterns you should avoid
Integrate Hive with other data processing programs
Use storage handlers for NoSQL databases and other data stores
Learn the pros and cons of running Hive on Amazon's Elastic MapReduce
Chapter 1 Introduction
Chapter 2 Getting Started
Chapter 3 Data Types and File Formats
Chapter 4 HiveQL: Data Definition
Chapter 5 HiveQL: Data Manipulation
Chapter 6 HiveQL: Queries
Chapter 7 HiveQL: Views
Chapter 8 HiveQL: Indexes
Chapter 9 Schema Design
Chapter 10 Tuning
Chapter 11 Other File Formats and Compression
Chapter 12 Developing
Chapter 13 Functions
Chapter 14 Streaming
Chapter 15 Customizing Hive File and Record Formats
Chapter 16 Hive Thrift Service
Chapter 17 Storage Handlers and NoSQL
Chapter 18 Security
Chapter 19 Locking
Chapter 20 Hive Integration with Oozie
Chapter 21 Hive and Amazon Web Services (AWS)
Chapter 22 HCatalog
Chapter 23 Case Studies
Appendix References

About the Authors: Edward Capriolo, Dean Wampler, Jason Rutherglen

Edward Capriolo is currently System Administrator at Media6degrees, where he helps design and maintain distributed data storage systems for the internet advertising industry.
Edward is a member of the Apache Software Foundation and a committer for the Hadoop-Hive project. He has experience as a developer as well as a Linux and network administrator, and enjoys the rich world of open source software.
Dean Wampler is a Principal Consultant at Think Big Analytics, where he specializes in "Big Data" problems and tools like Hadoop and Machine Learning. Besides Big Data, he specializes in Scala, the JVM ecosystem, JavaScript, Ruby, functional and object-oriented programming, and Agile methods. Dean is a frequent speaker at industry and academic conferences on these topics. He has a Ph.D. in Physics from the University of Washington.
Jason Rutherglen is a software architect at Think Big Analytics and specializes in Big Data, Hadoop, search, and security.