Hitachi Vantara Pentaho Community Forums

Thread: MapReduce for processing the rows from multiple MySQL tables

  1. #1
    Join Date
    Aug 2013
    Posts
    8

    MapReduce for processing the rows from multiple MySQL tables

    Hi all,
    I would like to apply MapReduce to a particular use case, and I need some help from the forum members.

    There are a number of MySQL tables (assume 50). I want to read every record from all of the tables and process each one independently. Assume the processing is something like serializing the row data and storing it in a file. This single file must contain the serialized rows from all 50 source tables.
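    The per-row processing described above can be sketched as follows. This is a minimal sketch, not a Pentaho solution: it assumes Python's built-in `sqlite3` module as a stand-in for MySQL (a MySQL DB-API driver would be used the same way) and JSON Lines as the serialization format. The `serialize_all_tables` helper and the demo table names are hypothetical.

```python
import json
import sqlite3


def serialize_all_tables(conn, tables, out_path):
    """Read every row of every table and write it, serialized, into ONE file.

    Each row is handled independently: it becomes a JSON object tagged with
    its source table and is written as a single line (JSON Lines format).
    Assumption: sqlite3 stands in for MySQL; table names are trusted input.
    Returns the number of rows written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for table in tables:
            cur = conn.execute(f"SELECT * FROM {table}")
            columns = [d[0] for d in cur.description]  # column names
            for row in cur:
                record = {"table": table, "row": dict(zip(columns, row))}
                out.write(json.dumps(record) + "\n")
                count += 1
    return count


if __name__ == "__main__":
    # Hypothetical demo data: two small tables instead of the 50 real ones.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE t2 (id INTEGER, price REAL)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, "a"), (2, "b")])
    conn.execute("INSERT INTO t2 VALUES (?, ?)", (10, 9.5))
    n = serialize_all_tables(conn, ["t1", "t2"], "serialized_rows.jsonl")
    print(n)  # 3
```

    Tagging each record with its table name keeps rows from different tables distinguishable once they share one output file.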

    Now, how should I approach this using the Pentaho MapReduce feature?

    My thought:
    First, in one step, read from the 50 tables into 50 text files.
    Second, copy all 50 files to HDFS with an HDFS output step.
    Third, have a serialization step that is fed by an HDFS input step.
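    The three-step plan above can be mimicked locally to make its data flow concrete. This is a hypothetical sketch only: plain local directories stand in for HDFS (in Pentaho the copy would be done by the HDFS output step), `sqlite3` stands in for MySQL, CSV is assumed as the text-file format, and all function and directory names are invented for illustration.

```python
import csv
import json
import shutil
from pathlib import Path


def step1_dump_tables(conn, tables, staging_dir):
    """Step 1: write each table to its own CSV text file (one file per table)."""
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    written = []
    for table in tables:
        cur = conn.execute(f"SELECT * FROM {table}")  # table names assumed trusted
        path = staging / f"{table}.csv"
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow([d[0] for d in cur.description])  # header row
            writer.writerows(cur)
        written.append(path)
    return written


def step2_copy_to_hdfs(files, hdfs_dir):
    """Step 2: copy every staged file into the (simulated) HDFS directory."""
    target = Path(hdfs_dir)
    target.mkdir(parents=True, exist_ok=True)
    # shutil.copy returns the path of the new file when the target is a directory.
    return [Path(shutil.copy(f, target)) for f in files]


def step3_serialize(hdfs_files, out_path):
    """Step 3: read all copied files and serialize every row into ONE output file."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in hdfs_files:
            with open(path, newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):  # values come back as strings
                    out.write(json.dumps({"table": path.stem, "row": row}) + "\n")
                    count += 1
    return count
```

    Step 1 produces one intermediate file per table, so with 50 source tables the plan stages and copies 50 files before any serialization happens.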

    This seems a clumsy approach, since it requires so many text files, and it is not a scalable solution.
    Is there a better way to solve this?

  2. #2
    Join Date
    Aug 2013
    Posts
    8


    Does anybody have any idea how to go about solving this?


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.