Hitachi Vantara Pentaho Community Forums

Thread: How to extract data from MySQL to Hive?

  1. #1

    Question: How to extract data from MySQL to Hive?

    Hi,

    I need to extract data from MySQL to Hive.

    For example, the following data in the MySQL table myUser was created on 20130701 (user_global_id is the primary key):


    mysql> select * from myUser;
    +----------------+------+------+-------+
    | user_global_id | name | last | first |
    +----------------+------+------+-------+
    |             55 | a    | a    | a     |
    |             53 | c    | c    | c     |
    |             51 | b    | b    | b     |
    +----------------+------+------+-------+

    On 20130710, the customer changed the myUser table as follows
    (deleted record 53, changed record 51, and inserted record 56):
    mysql> select * from myUser;
    +----------------+------+------+-------+
    | user_global_id | name | last | first |
    +----------------+------+------+-------+
    |             55 | a    | a    | a     |
    |             51 | b0   | b0   | b0    |
    |             56 | d    | b    | d     |
    +----------------+------+------+-------+

    The customer requires the data in Hive to look like this:

    +----------------+------+------+-------+-----------+----------+
    | user_global_id | name | last | first | starttime | endtime  |
    +----------------+------+------+-------+-----------+----------+
    |             55 | a    | a    | a     | 20130701  | NULL     |
    |             53 | c    | c    | c     | 20130701  | 20130710 |
    |             51 | b    | b    | b     | 20130701  | 20130710 |
    |             51 | b0   | b0   | b0    | 20130710  | NULL     |
    |             56 | d    | b    | d     | 20130710  | NULL     |
    +----------------+------+------+-------+-----------+----------+

    Is there an example of something like this?
    It should cover:
    1. How to extract the data from MySQL to Hive (does Kettle now support Hive via the Table Output step?).
    2. How to implement the insert/update logic:
    A. Add two columns, starttime and endtime, to each record to mark the lifetime of that record.
    B. If a record is unchanged, its endtime stays NULL, which means it is still valid.
    If a record is modified, there will be two rows for it: the existing row's endtime is set to the current load time, and a new row is inserted with starttime = now and endtime = NULL.
    If a record is deleted from the source data, its endtime is set to now. (A rough HiveQL sketch of this logic is shown below.)
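
    A minimal HiveQL sketch of that merge logic, assuming the daily MySQL snapshot has already been landed in a Hive staging table (the names myuser_staging and myuser_history and the hard-coded load date 20130710 are only placeholders):

    -- myuser_history holds the versioned rows:
    --   (user_global_id, name, last, first, starttime, endtime)
    -- myuser_staging holds today's full snapshot imported from MySQL.
    -- Older Hive versions only allow UNION ALL inside a subquery, so the
    -- two branches are wrapped in one.
    INSERT OVERWRITE TABLE myuser_history
    SELECT * FROM (
        -- keep every existing version; close open rows whose key disappeared
        -- from the snapshot or whose values changed
        SELECT
            h.user_global_id, h.name, h.last, h.first, h.starttime,
            CASE
                WHEN h.endtime IS NULL
                     AND (s.user_global_id IS NULL
                          OR s.name  <> h.name
                          OR s.last  <> h.last
                          OR s.first <> h.first)
                THEN '20130710'          -- load date of the new snapshot
                ELSE h.endtime
            END AS endtime
        FROM myuser_history h
        LEFT OUTER JOIN myuser_staging s
            ON h.user_global_id = s.user_global_id

        UNION ALL

        -- open a fresh version for every new or changed record
        SELECT
            s.user_global_id, s.name, s.last, s.first,
            '20130710' AS starttime,
            CAST(NULL AS STRING) AS endtime
        FROM myuser_staging s
        LEFT OUTER JOIN (SELECT * FROM myuser_history
                         WHERE endtime IS NULL) cur
            ON s.user_global_id = cur.user_global_id
        WHERE cur.user_global_id IS NULL
           OR s.name  <> cur.name
           OR s.last  <> cur.last
           OR s.first <> cur.first
    ) merged;

    Since the query both reads and overwrites myuser_history, writing the result to a separate table and swapping it in afterwards may be the safer pattern.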

    Thanks.
    Maria

  2. #2


    Hi benslin,
    Thanks for your reply.

    1. Does Sqoop support writing to Hive via the Table Output step? If yes, which version of Kettle supports this function?

    2. Can you give me an example of incremental loading from MySQL to Hive via Sqoop?
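
    A minimal Sqoop command line for landing the daily MySQL snapshot in a Hive staging table might look like the sketch below; the connection string, credentials, and the staging table name are placeholders, not values from this thread:

    sqoop import \
      --connect jdbc:mysql://dbhost:3306/mydb \
      --username myuser --password mypass \
      --table myUser \
      --hive-import --hive-overwrite \
      --hive-table myuser_staging \
      -m 1

    Sqoop's --incremental append mode (with --check-column and --last-value) pulls only newly inserted rows, so it cannot capture updates or deletes; re-importing the full snapshot and merging it into the history table in Hive fits this requirement better.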

    Best Regards.
    Maria
    Last edited by mariax; 07-15-2013 at 02:46 AM.

  3. #3


    Hi benslin,

    When I tried to use the Sqoop step in Kettle to import the data, I got the following error. Can you explain which jar file should be replaced?
    Thanks for your help.
    2013/07/17 09:57:51 - sqoop_import - Starting entry [Sqoop Import]
    2013/07/17 09:57:51 - Sqoop Import - 2013/07/17 09:57:51 - fs.default.name is deprecated. Instead, use fs.defaultFS
    2013/07/17 09:57:51 - Sqoop Import - 2013/07/17 09:57:51 - Preparing to use a MySQL streaming resultset.
    2013/07/17 09:57:51 - Sqoop Import - 2013/07/17 09:57:51 - Beginning code generation
    2013/07/17 09:57:51 - Sqoop Import - 2013/07/17 09:57:51 - Executing SQL statement: SELECT t.* FROM `deltaUser` AS t LIMIT 1
    2013/07/17 09:57:51 - Sqoop Import - 2013/07/17 09:57:51 - Executing SQL statement: SELECT t.* FROM `deltaUser` AS t LIMIT 1
    2013/07/17 09:57:51 - Sqoop Import - 2013/07/17 09:57:51 - HADOOP_HOME is D:\software\data-integration\plugins\pentaho-big-data-plugin\hadoop-configurations\cdh4\lib\client
    2013/07/17 09:57:51 - Sqoop Import - 2013/07/17 09:57:51 - Writing jar file: \tmp\sqoop-mariax\compile\3013dabf92f0da134637927708100320\deltaUser.jar
    2013/07/17 09:57:51 - Sqoop Import - 2013/07/17 09:57:51 - Beginning mysqldump fast path import
    2013/07/17 09:57:51 - Sqoop Import - 2013/07/17 09:57:51 - Beginning import of deltaUser
    2013/07/17 09:57:51 - Sqoop Import - 2013/07/17 09:57:51 - SQOOP_HOME is unset. May not be able to find all job dependencies.
    2013/07/17 09:57:51 - Sqoop Import - 2013/07/17 09:57:51 - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2013/07/17 09:57:51 - Sqoop Import - 2013/07/17 09:57:51 - BoundingValsQuery: SELECT MIN(`user_global_id`), MAX(`user_global_id`) FROM deltaUser
    2013/07/17 09:57:52 - Sqoop Import - 2013/07/17 09:57:52 - Running job: job_201307100331_0049
    2013/07/17 09:57:53 - Sqoop Import - 2013/07/17 09:57:53 - map 0% reduce 0%
    2013/07/17 09:57:59 - Sqoop Import - 2013/07/17 09:57:59 - Task Id : attempt_201307100331_0049_m_000005_0, Status : FAILED
    2013/07/17 09:57:59 - Sqoop Import - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : 2013/07/17 09:57:59 - java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.sqoop.mapreduce.RawKeyTextOutputFormat not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1587)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:227)
    at org.apache.hadoop.mapred.Task.initialize(Task.java:515)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:314)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
    Caused by: java.lang.ClassNotFoundException: Class org.apache.sqoop.mapreduce.RawKeyTextOutputFormat not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1585)
    ... 8 more

    Thanks.
    Maria
    Last edited by mariax; 07-16-2013 at 10:02 PM.
