Hitachi Vantara Pentaho Community Forums

Thread: complex transform

  1. #1

    complex transform

    I need to merge five large (>1M rows each), unsorted CSV file inputs into a single table. They all share the same 7 key fields. I know that Merge Join needs sorted input, but can Sort rows handle sorting 1M+ rows?

    What is the best way to approach this?

    Gerry

  2. #2

    Yes, but slowly. Personally, I would stage the files as tables in the database and start from there, even without Merge Join.
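
    As a rough illustration of the staging idea (not a Pentaho transformation), the sketch below loads each CSV into its own staging table, indexes the key columns, and lets the database do the join. It assumes SQLite via Python's standard library, an inner join on the shared keys, and text-typed columns; the function name `stage_and_join` and the table and column names are hypothetical.

```python
import csv
import sqlite3


def stage_and_join(sources, keys):
    """Stage each CSV in its own table, then join the tables on the key columns.

    sources: dict mapping a (hypothetical) table name to an open CSV file
             object whose first row is the header.
    keys:    list of column names shared by every file.
    """
    conn = sqlite3.connect(":memory:")  # assumption: a real setup would use the target database
    key_cols = ", ".join(f'"{k}"' for k in keys)

    for name, handle in sources.items():
        reader = csv.reader(handle)
        header = next(reader)
        col_defs = ", ".join(f'"{c}" TEXT' for c in header)
        conn.execute(f'CREATE TABLE "{name}" ({col_defs})')
        placeholders = ", ".join("?" for _ in header)
        # Bulk-load the remaining rows straight from the CSV reader.
        conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', reader)
        # Index the keys so the database does not have to scan for each match.
        conn.execute(f'CREATE INDEX "idx_{name}" ON "{name}" ({key_cols})')

    names = list(sources)
    first = names[0]
    sql = f'SELECT * FROM "{first}"'
    for other in names[1:]:
        sql += f' JOIN "{other}" USING ({key_cols})'
    sql += " ORDER BY " + ", ".join(f'"{first}"."{k}"' for k in keys)

    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows
```

    Because the database handles matching through the indexes, the inputs do not need to be pre-sorted, which is the point of staging them first.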

    Regards,
    Sven

  3. #3

    Actually, try enabling lazy conversion on the CSV input. That lowers the CPU time needed for serialization/de-serialization in the sort step, so performance shouldn't be too bad.
    If you run on multiple cores, try running 2 copies of the Sort rows step with a Sorted Merge afterwards to join them back together.
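
    The "several sort copies, then Sorted Merge" pattern can be sketched outside Pentaho as follows. This is only a model of the data flow in Python: rows are dealt round-robin to the sort copies, each copy sorts its share independently, and a k-way merge recombines the sorted streams. The function name `parallel_sort_merge` and its parameters are hypothetical, and because of Python's global interpreter lock the threads here show the structure rather than a real multi-core speed-up.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from operator import itemgetter


def parallel_sort_merge(rows, key_indexes, copies=2):
    """Sort `rows` (a list of tuples) using several independent sort copies.

    key_indexes: positions of the key fields within each row.
    copies:      number of sort copies to run side by side.
    """
    key = itemgetter(*key_indexes)

    # Deal the rows out round-robin, one partition per sort copy.
    partitions = [rows[i::copies] for i in range(copies)]

    # Each copy sorts only its own partition.
    with ThreadPoolExecutor(max_workers=copies) as pool:
        sorted_parts = list(pool.map(lambda part: sorted(part, key=key), partitions))

    # Merge the already-sorted partitions into one sorted stream.
    return list(heapq.merge(*sorted_parts, key=key))
```

    The merge only ever compares the current head of each sorted partition, so it is cheap compared with the sorts themselves; that is why splitting the sort across copies can pay off.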

    These things can really make a big difference,

    HTH,
    Matt


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.