Hitachi Vantara Pentaho Community Forums

Thread: Sort rows with Unique Rows

  1. #1
    Join Date
    Mar 2006
    Posts
    114

    Sort rows with Unique Rows

    I have a transformation that reads 100,934,027 rows and sorts them in order to remove duplicate records.
    It runs normally until the sort finishes and the rows start moving to the "Unique Rows" step; at that point the transformation throws an "out of memory" exception.

    My transformation reads 52 fields from a table, where each row is about 140 bytes according to the Oracle statistics.

    Spoon was started with a 1.6 GB JVM.

    What can be done to avoid this kind of error?
    Is there a quick way to estimate how much JVM memory I would need?
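
    As a rough back-of-the-envelope check of the numbers in this post, the sketch below multiplies the row count by the row size and compares the result with the heap. It is only an estimate: the 140 bytes is Oracle's figure for the stored row, the function names and the `overhead_factor` parameter are hypothetical, and rows held as Java objects usually take more space than their on-disk size.

```python
# Figures taken from the post above.
ROWS = 100_934_027
BYTES_PER_ROW = 140                   # average row size reported by Oracle
JVM_HEAP_BYTES = int(1.6 * 1024**3)   # 1.6 GB heap given to Spoon


def raw_size_bytes(rows: int, bytes_per_row: int) -> int:
    """Size of the whole data set if every row took exactly bytes_per_row."""
    return rows * bytes_per_row


def rows_that_fit(heap_bytes: int, bytes_per_row: int, overhead_factor: float = 1.0) -> int:
    """Upper bound on rows that fit in the heap.

    overhead_factor is a hypothetical multiplier for in-memory object
    overhead; 1.0 means "no overhead", which is optimistic.
    """
    return int(heap_bytes // (bytes_per_row * overhead_factor))


if __name__ == "__main__":
    total = raw_size_bytes(ROWS, BYTES_PER_ROW)
    print(f"raw data size : {total / 1024**3:.2f} GiB")
    print(f"heap size     : {JVM_HEAP_BYTES / 1024**3:.2f} GiB")
    print(f"rows that fit : {rows_that_fit(JVM_HEAP_BYTES, BYTES_PER_ROW):,} (no overhead)")
```

    Even with no object overhead, the raw data is several times larger than the heap, so any step that tries to hold all rows in memory at once cannot succeed with this setting; the sort has to work in bounded chunks that spill to disk.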

  2. #2

    Hi,

    In your sample, why not do the ordering in the Table Input step?

    Samatar

  3. #3
    Join Date
    Mar 2006
    Posts
    114

    Quote (originally posted by shassan2):
    Hi,

    In your sample, why not do the ordering in the Table Input step?

    Samatar

    Well, actually I will not use "Table Input" to read the data, but "Text Input." This example only served as a test.

    I also noticed that the "Sort Rows" step has an "Only pass unique rows?" option; with that I do not need the "Unique Rows" step.

    I'm doing a test to see if the error occurs again.

    regards,

    conca

  4. #4
    Join Date
    May 2008
    Posts
    4

    leandroconcon,

    Did you ever solve this problem? I'm seeing something very similar using the Sort rows step with the "Only pass unique rows?" option checked.

    Performing the sort within the input step is also not possible. I may try to perform this sort without the unique option checked and then pipe into the Unique rows step and see if this error goes away.

    Any thoughts would be appreciated.

    Thanks,
    Gary Brunton

  5. #5
    Join Date
    Mar 2006
    Posts
    114

    Quote (originally posted by Gary):
    leandroconcon,

    Did you ever solve this problem? I'm seeing something very similar using the Sort rows step with the "Only pass unique rows?" option checked.

    Performing the sort within the input step is also not possible. I may try to perform this sort without the unique option checked and then pipe into the Unique rows step and see if this error goes away.

    Any thoughts would be appreciated.

    Thanks,
    Gary Brunton

    In version 3.2 RC1 I no longer have this problem; everything is working correctly.
    I have not yet tested with many rows, but starting tomorrow I will run a test with 600 million rows.

  6. #6
    Join Date
    May 2008
    Posts
    4

    Using the combination of the sort rows step (without the unique option checked) followed by the unique rows step has fixed my problem. I'm perfectly fine with this extra step but am curious how things work for you using the new version.

    Thanks for your acknowledgment!

    Gary Brunton
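
    For readers wondering why "sort, then unique" can work on inputs far larger than the heap, here is a minimal sketch of the general technique: an external merge sort followed by a streaming pass that drops adjacent duplicates. It is not Kettle's actual implementation; the function names (`sorted_runs`, `sort_rows`, `unique_rows`) and the `max_rows_in_memory` parameter are hypothetical, and rows are assumed to be plain strings without embedded newlines.

```python
import heapq
import itertools
import tempfile


def sorted_runs(rows, max_rows_in_memory):
    """Sort the input in bounded chunks, spilling each sorted chunk to a temp file."""
    it = iter(rows)
    while True:
        chunk = list(itertools.islice(it, max_rows_in_memory))
        if not chunk:
            return
        chunk.sort()
        run = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
        run.writelines(row + "\n" for row in chunk)
        run.seek(0)
        yield run


def read_run(run):
    """Stream rows back from one sorted temp file, closing it when exhausted."""
    for line in run:
        yield line.rstrip("\n")
    run.close()


def sort_rows(rows, max_rows_in_memory=100_000):
    """External merge sort: at most max_rows_in_memory rows are sorted at once."""
    runs = [read_run(run) for run in sorted_runs(rows, max_rows_in_memory)]
    # heapq.merge keeps only one pending row per run in memory.
    return heapq.merge(*runs)


def unique_rows(sorted_rows):
    """Drop duplicates from an already sorted stream by comparing neighbours."""
    previous = object()  # sentinel that never equals a real row
    for row in sorted_rows:
        if row != previous:
            yield row
        previous = row


if __name__ == "__main__":
    data = ["b", "a", "c", "a", "b", "d", "a"]
    for row in unique_rows(sort_rows(data, max_rows_in_memory=2)):
        print(row)
```

    The unique pass only ever remembers the previous row, so its memory use does not grow with the input; the cost is that it is correct only when the incoming rows are already sorted on the compared fields.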
