Hitachi Vantara Pentaho Community Forums

Thread: Merging huge data together

  1. #1

    Merging huge data together


    I have the following requirement.

    I have 6 to 7 files with the same columns, each holding around 7 to 8 million records.
    Within each file, the data is sorted on the key column (ID).

    I need to merge all 6 or 7 files and send the result as a single file for further processing.

    Which step would do this most efficiently?

    a) If I merge all these files, will the data be sorted overall? (I need the output sorted on the key column.)
    b) Will the Sorted Merge step handle this volume of data efficiently?
    c) Should I use a Dummy step to merge the streams for faster processing?


  2. #2
    Join Date
    Nov 1999


    If you don't need to retain the sort, any step will merge the rows (union all), but the combined output will then not be sorted overall.

    Sorted Merge is indeed very efficient at preserving the sort order.
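    Sorted Merge can keep the order cheaply because each input is already sorted on ID. A merge only has to compare the current head row of each input and emit the smallest one, a classic k-way merge, so nothing is ever re-sorted. The sketch below illustrates that idea in plain Python with `heapq.merge`. It is not Pentaho's implementation; the CSV layout, the helper names and the assumption that `ID` holds integers are all hypothetical, chosen for illustration.

```python
import csv
import heapq
from typing import Iterable, Iterator


def merge_sorted_streams(streams: Iterable[Iterable[dict]],
                         key_column: str = "ID") -> Iterator[dict]:
    """K-way merge of row streams that are each already sorted on key_column."""
    # heapq.merge keeps only one pending row per input on its heap, so memory
    # grows with the number of inputs (6 or 7), not with the number of rows.
    # Assumption: IDs are integers; string IDs would need a different key.
    return heapq.merge(*streams, key=lambda row: int(row[key_column]))


def read_csv_rows(path: str) -> Iterator[dict]:
    # Yield rows lazily so a file with millions of rows is never held in memory.
    with open(path, newline="") as f:
        yield from csv.DictReader(f)


def merge_sorted_files(paths: Iterable[str], out_path: str,
                       key_column: str = "ID") -> int:
    """Merge pre-sorted CSV files into one sorted CSV; returns rows written."""
    rows = merge_sorted_streams((read_csv_rows(p) for p in paths), key_column)
    count = 0
    with open(out_path, "w", newline="") as out:
        writer = None
        for row in rows:
            if writer is None:
                # Assumes every input file has the same columns, as in the question.
                writer = csv.DictWriter(out, fieldnames=list(row.keys()))
                writer.writeheader()
            writer.writerow(row)
            count += 1
    return count
```

    Usage would look like `merge_sorted_files(["part1.csv", "part2.csv", "part3.csv"], "merged.csv")`, with hypothetical file names. By contrast, simply concatenating the inputs (for example with `itertools.chain`) is the union-all case: slightly cheaper, but the result is only sorted within each original file, not overall.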


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.