Hitachi Vantara Pentaho Community Forums
Results 1 to 8 of 8

Thread: add new sequence for each row by group

  1. #1

    Default add new sequence for each row by group

    Hello guys,

    i'm reading a file like this:

    dvv;rtp;description
    1;1;blabla
    1;2;ciccio
    2;8;amamamama
    2;9;..........
    3;15;...................


    i have to generate a db sequence for each group of dvv like this:

    sequence;dvv;rtp;description
    10024;1;1;blabla
    10024;1;2;ciccio
    10025;2;8;amamamama
    10025;2;9;..........
    10026;3;15;...................

    the file is really big and i whant to read it once .

    I would like to generate sequence while dvv change value and assign new value for each row of group.

    I needs somethings like "Add value fields changing sequence" step but i have my how database sequence to get not only a counter.

    I needs an help tanks in advance.
    Last edited by mbesozzi@gmail.com; 03-18-2016 at 05:23 AM.

  2. #2
    Join Date
    Aug 2011
    Posts
    360

    Default

    Quote Originally Posted by mbesozzi@gmail.com View Post
    Hello guys,

    i'm reading a file like this:

    dvv;rtp;description
    1;1;blabla
    1;2;ciccio
    2;8;amamamama
    2;9;..........
    3;15;...................


    i have to generate a db sequence for each group of dvv like this:

    sequence;dvv;rtp;description
    10024;1;1;blabla
    10024;1;2;ciccio
    10025;2;8;amamamama
    10025;2;9;..........
    10026;3;15;...................

    the file is really big and i whant to read it once .

    I would like to generate sequence while dvv change value and assign new value for each row of group.

    I needs somethings like "Add value fields changing sequence" step but i have my how database sequence to get not only a counter.

    I needs an help tanks in advance.
    Hi,

    I think you'll need to copy your stream in a new one, keep only the dvv field, then sort and unique. You'll have a stream of distinct dvv. On it, use add sequence with the db sequence option. Then you'll have to merge join this stream to the original one on the dvv key to get back the sequence field on each row.

    Another solution, could be to build a table with fields dvv;sequence (where sequence value is auto generate by your DB), then use a dimension lookup step which will lookup the value of sequence field based on dvv key, or insert the new dvv if it dont exist then retrieve the generate sequence field. But this might be slowlyer then first solution, it all depends on size of the stream and if there is much new dvv keys to insert.

    OR in the same idea, always with the table (named MAP) dvv;sequence, do that:
    JOB :
    Transfo 1: read your file and load it into a table.
    SQL SCRIPT: insert all the dvv values that are not present in the MAP table (insert into MAP select distinct dvv from table left outer join MAP on MAP.dvv = table.dvv WHERE MAP.dvv is null ).
    Transfo 2: use a table input and join MAP with your table to get all sequence in one shote: SELECT t.*, MAP.sequence FROM table join MAP on MAP.dvv = table.dvv
    Last edited by Mathias.CH; 03-18-2016 at 06:49 AM.

  3. #3

    Default

    Thanks for replay bu the file contains from 1.500.000 to 3.000.000 rows for 100 fields ,I needs something efficent.

    I have tried with calling subtransformation with row grouping dvv , when subtransformation starts i generate a db sequence , but the first row of second groups have the same sequence of firts !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.

    It's a bug???
    Last edited by mbesozzi@gmail.com; 03-18-2016 at 08:13 AM.

  4. #4
    Join Date
    May 2014
    Posts
    358

    Default

    What about simply fetching the latest ID from a database, storing it in a variable in one transformation, then starting your value fields changing sequencefrom this number?

  5. #5

    Default

    it's an idea but I have a centralized table for sequence and i have to use that one , and encrease real time the max number into table.
    li ke oracle sequence.

  6. #6
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    1) If you think something is a bug, *ALWAYS* include which version of the software you are using - it may really be a bug, and may have been solved already. ( http://jira.pentaho.com is your friend here!) Perhaps you've come across bug http://jira.pentaho.com/browse/PDI-14958

    2) Copy the rows into two parallel streams. Pass one to a unique rows step, giving you just your dvv number. Now pass that to the add sequence step (which can get its sequence number from the DB Sequence), now you can feed that into a stream lookup.
    Last edited by gutlez; 03-18-2016 at 12:06 PM.

  7. #7

    Default

    Quote Originally Posted by gutlez View Post
    1) If you think something is a bug, *ALWAYS* include which version of the software you are using - it may really be a bug, and may have been solved already. ( http://jira.pentaho.com is your friend here!) Perhaps you've come across bug http://jira.pentaho.com/browse/PDI-14958

    2) Copy the rows into two parallel streams. Pass one to a unique rows step, giving you just your dvv number. Now pass that to the add sequence step (which can get its sequence number from the DB Sequence), now you can feed that into a stream lookup.
    Thanks a lot gutlez.

    Tomorrow i try the unic row strategy.

    I'm using 6.0.1 GA version i opened a bug on jira http://jira.pentaho.com/browse/PDI-15121 and closed now(duplicated.

    Bye.

  8. #8
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    I think this would run quite smoothly

    PS: Don't forget to switch to your database sequence generator in step Add-sequence ...
    Attached Files Attached Files
    Last edited by marabu; 03-18-2016 at 01:55 PM.
    So long, and thanks for all the fish.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.