Apache Sqoop Cookbook: Unlocking Hadoop for Your Relational Database by Kathleen Ting, Jarek Jarcec Cecho


By Kathleen Ting, Jarek Jarcec Cecho

Integrating data from multiple sources is essential in the age of big data, but it can be a challenging and time-consuming task. This handy cookbook provides dozens of ready-to-use recipes for Apache Sqoop, the command-line interface application that optimizes data transfers between relational databases and Hadoop. Sqoop is both powerful and bewildering. With this cookbook's problem-solution-discussion format, however, you'll quickly learn how to deploy Sqoop and then apply it in your environment. The authors provide MySQL, Oracle, and PostgreSQL database examples on GitHub that you can easily adapt for SQL Server, Netezza, Teradata, or other relational systems.



Best storage & retrieval books

The Geometry of Information Retrieval

Keith van Rijsbergen demonstrates how different models of information retrieval (IR) can be combined in the same framework used to formulate the general principles of quantum mechanics. All the standard results can then be applied to address problems in IR, such as pseudo-relevance feedback, relevance feedback, and ostensive retrieval.

Social Networks and the Semantic Web

Whether we have changed the Web or the Web has changed us is difficult to determine, even with the wisdom of hindsight. Social Networks and the Semantic Web presents major case studies. The first case study shows the possibilities of tracking a research community over the Web, combining the information obtained from the Web with other data sources, and analyzing the results.

Combinatorial Search

With the advent of computers, search theory emerged in the sixties as an area of research in its own right. Sorting problems arising in computer science were the first to be thoroughly studied. Soon, however, it was found that the intrinsic complexity of many other data structures could be fruitfully analyzed from a search-theoretic point of view.

Accidental Information Discovery: Cultivating Serendipity in the Digital Age

Accidental Information Discovery: Cultivating Serendipity in the Digital Age offers readers an engaging discussion of the ways serendipity plays an important role in creative problem-solving. Here, serendipity is defined as the accidental discovery of valued information. This insightful resource brings together discussions of serendipity and information discovery, research in computer and information science, and interesting reflections on the creative process.


Example text

The first column will be considered the lower bound, while the second column will be the upper bound. Both values are inclusive and will be imported. The type of both columns must be the same as the type of the column used in the --split-by parameter. Knowing your data and the purpose of your query allows you to easily identify the main table, if there is one, and select the boundaries from this table without any additional join or data transformations. The query used for fetching boundaries can indeed be arbitrary.
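The two values returned by the boundary query are what Sqoop divides among mappers for the --split-by column. The Python sketch below is a simplified illustration of that idea and is not Sqoop's actual splitter. It assumes an integer split column and treats both bounds as inclusive, as the text describes. It then divides the range into at most the requested number of chunks, in the spirit of --num-mappers. The function names split_range and where_clauses are hypothetical.

```python
from typing import List, Tuple


def split_range(lower: int, upper: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide the inclusive range [lower, upper] into at most num_mappers
    contiguous, non-overlapping chunks of near-equal size.

    Simplified illustration only; Sqoop's own splitter differs in detail.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    total = upper - lower + 1            # both bounds are inclusive
    chunks = min(num_mappers, total)     # avoid empty chunks for tiny ranges
    size, extra = divmod(total, chunks)  # the first `extra` chunks get one more value
    splits = []
    start = lower
    for i in range(chunks):
        end = start + size + (1 if i < extra else 0) - 1  # inclusive end of chunk
        splits.append((start, end))
        start = end + 1
    return splits


def where_clauses(column: str, splits: List[Tuple[int, int]]) -> List[str]:
    """Turn each chunk into a predicate that one mapper could run."""
    return [f"{column} >= {lo} AND {column} <= {hi}" for lo, hi in splits]


# Bounds as a query such as SELECT MIN(id), MAX(id) ... might return them:
# first column = lower bound, second column = upper bound.
bounds = (1, 10)
for clause in where_clauses("id", split_range(*bounds, num_mappers=4)):
    print(clause)
```

With bounds (1, 10) and four mappers, the chunks are 1-3, 4-6, 7-8, and 9-10. Together they cover every value between the two bounds exactly once, which is why both bounds must be inclusive.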

Solution
You can use Sqoop’s export feature, which allows you to transfer data from the Hadoop ecosystem to relational databases.

Discussion
Export works similarly to import, except that it transfers data in the other direction. Instead of transferring data from the relational database using SELECT queries, Sqoop transfers the data to the relational database using INSERT statements. Sqoop’s export workflow matches the import case, with slight differences. After you execute the Sqoop command, Sqoop connects to your database to fetch various metadata about your table, including the list of all columns with their appropriate types.
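The Python sketch below makes the export flow concrete. It mirrors the steps just described: first it reads the target table's column list from database metadata. Then it turns delimited text lines, standing in for files in HDFS, into parameterized INSERT statements. An in-memory SQLite database from the standard library stands in for a real RDBMS. The helpers fetch_columns and export_lines are hypothetical and do not reflect how Sqoop is implemented internally.

```python
import sqlite3
from typing import Iterable, List


def fetch_columns(conn: sqlite3.Connection, table: str) -> List[str]:
    """Read column names from table metadata, in declaration order."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def export_lines(conn: sqlite3.Connection, table: str,
                 lines: Iterable[str], delimiter: str = ",") -> int:
    """Insert delimited text lines into `table`; returns the row count."""
    columns = fetch_columns(conn, table)
    if not columns:
        raise ValueError(f"table {table!r} not found")
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}: {line!r}")
        rows.append(fields)
    with conn:  # commits on success, rolls back if any INSERT fails
        conn.executemany(sql, rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (id INTEGER, country TEXT, city TEXT)")
count = export_lines(conn, "cities", ["1,USA,Palo Alto", "2,Czech Republic,Brno"])
print(count, conn.execute("SELECT * FROM cities ORDER BY id").fetchall())
```

The sketch has several simplifications. It splits naively on the delimiter, so quoted fields and escaped delimiters are not handled. It writes all rows in a single transaction. It interpolates the table name into the SQL text, so that name must come from a trusted source.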

1. Importing Data from Two Tables

Problem
You need to import one main table; however, this table is normalized. The important values are stored in the referenced dictionary tables, and the main table contains only numeric foreign keys pointing to the values in the dictionaries rather than to natural keys as in the original cities table. You would prefer to resolve the values prior to running Sqoop and import the real values rather than the numerical keys for the countries.

Solution
Instead of using table import, use free-form query import.
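With a free-form query, the database resolves the dictionary values during the import. The sketch below uses SQLite as a stand-in database, with hypothetical normcities and countries tables joined through a JOIN ... USING clause. Sqoop's free-form import expects the literal token $CONDITIONS in the WHERE clause so that it can substitute each mapper's split predicate. The hypothetical run_split helper imitates that substitution with a plain string replace. This is a simplification for illustration, not Sqoop's code.

```python
import sqlite3

# Free-form query in the shape Sqoop expects: the dictionary value (country)
# is resolved by the JOIN, and $CONDITIONS marks where split predicates go.
QUERY = (
    "SELECT normcities.id, countries.country, normcities.city "
    "FROM normcities JOIN countries USING (country_id) "
    "WHERE $CONDITIONS"
)


def run_split(conn: sqlite3.Connection, query: str, predicate: str) -> list:
    """Simplified stand-in for Sqoop's per-mapper substitution of $CONDITIONS."""
    sql = query.replace("$CONDITIONS", predicate)
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (country_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE normcities (id INTEGER PRIMARY KEY, country_id INTEGER, city TEXT);
    INSERT INTO countries VALUES (1, 'USA'), (2, 'Czech Republic');
    INSERT INTO normcities VALUES (1, 1, 'Palo Alto'), (2, 2, 'Brno'), (3, 1, 'Sunnyvale');
    """
)

# Two "mappers", each importing one slice of the id range with real country names.
first = run_split(conn, QUERY, "normcities.id >= 1 AND normcities.id <= 2")
second = run_split(conn, QUERY, "normcities.id >= 3 AND normcities.id <= 3")
print(sorted(first + second))
```

Each slice returns country names instead of numeric country_id keys. This is the "real values rather than the numerical keys" outcome the recipe is after.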
