Hitachi Vantara Pentaho Community Forums

Thread: Join Multiple Streams of Data

  1. #1
    Join Date
    Mar 2014
    Posts
    7

    Join Multiple Streams of Data

    I am new to Pentaho. In this job I am calculating multiple metrics by filtering the data into multiple streams.

    I have validated each stream individually, and the calculations work fine.

    Now I want to load the results into the target database. I tried the Multiway Merge Join step, though I wasn't sure it was the right component, and it is not yielding any records.

    Please suggest the appropriate steps to achieve this. I have enclosed the Kettle file here.

    Thanks!!
    Attached Files

  2. #2
    Join Date
    Apr 2008
    Posts
    1,771

    If you need to load into a database, why not use the Table Output step?
    -- Mick --

  3. #3
    Join Date
    Mar 2014
    Posts
    7

    Dim_CatgFaultSummary is the Table Output step.

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    All your "Filter rows" steps should use the "main" output hop instead of a "true" hop: either attach both the "true" and "false" hops, or attach "main" only.

    [Attached image: wronghop.png]
    So long, and thanks for all the fish.

  5. #5
    Join Date
    Mar 2014
    Posts
    7

    I use the "true" hop to pass only the matching records to the count logic in the "Group by" step. What is the impact if I change this to the "main" output? Will the multiway join further down the line then be fixed?

    Can you please explain the logic behind "main" here?

  6. #6
    Join Date
    Jun 2012
    Posts
    5,534

    Here is the explanation again:

    Originally posted by marabu:
        either "true" and "false" hops are both attached or "main" only.

    You should have got a warning from Spoon, though!

    If Multiway Merge Join doesn't work, try multiple Merge Joins instead.
    I remember MMJ behaving erratically at times, but I never felt inclined to track down the bug.
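
    The idea of replacing one multiway join with a chain of two-way joins can be sketched outside PDI. The Python below is only an illustration under assumptions: the metric field names (CashFaultCnt and so on) are hypothetical, the join keys AtmKey and FaultTime are taken from later posts in this thread, and the lookup is a hash join rather than the sorted merge that the Merge Join step performs.

```python
from functools import reduce

def merge_join(left, right, keys):
    """Inner-join two lists of dict rows on the given key fields."""
    # Index the right-hand stream by its key tuple.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**row, **match})
    return joined

def join_all(streams, keys):
    """Join any number of streams by chaining two-way joins left to right."""
    return reduce(lambda acc, stream: merge_join(acc, stream, keys), streams)

KEYS = ("AtmKey", "FaultTime")
# Hypothetical per-metric streams, one row each.
cash = [{"AtmKey": 1, "FaultTime": "2014-03-01", "CashFaultCnt": 2}]
card = [{"AtmKey": 1, "FaultTime": "2014-03-01", "CardFaultCnt": 5}]
printer = [{"AtmKey": 1, "FaultTime": "2014-03-01", "PrinterFaultCnt": 1}]

combined = join_all([cash, card, printer], KEYS)
```

    Because every join here is an inner join, a key that is missing from any one stream drops out of the combined result.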

  7. #7
    Join Date
    Apr 2008
    Posts
    4,696

    I think a change in logic is called for here...

    Instead of repeating all the groups and then doing the totals, why not total each fault type (for each AtmKey and FaultTime) and then flatten the totals down to rows?

    Also, if you can supply us with a sample dataset (it doesn't have to be real data, just real-looking), we can help you clean this up a bit.
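
    To make the "total first, then flatten" suggestion concrete, here is a minimal Python sketch of the logic, not a PDI configuration. It assumes that flattening means one output row per (AtmKey, FaultTime) with one count column per fault type, that the fault type lives in a field called FaultCategory (the name used later in this thread), and that the category values "Cash" and "Card" are made up for illustration.

```python
from collections import Counter, defaultdict

def count_then_flatten(rows):
    """Count faults per (AtmKey, FaultTime, FaultCategory), then pivot
    the counts into one row per (AtmKey, FaultTime)."""
    counts = Counter(
        (r["AtmKey"], r["FaultTime"], r["FaultCategory"]) for r in rows
    )
    flat = defaultdict(dict)
    for (atm, fault_time, category), n in counts.items():
        # Column name pattern "<Category>FaultCnt" is an assumption.
        flat[(atm, fault_time)][f"{category}FaultCnt"] = n
    return [
        {"AtmKey": atm, "FaultTime": fault_time, **cols}
        for (atm, fault_time), cols in sorted(flat.items())
    ]

# Hypothetical sample rows.
sample = [
    {"AtmKey": 101, "FaultTime": "2014-03-01", "FaultCategory": "Cash"},
    {"AtmKey": 101, "FaultTime": "2014-03-01", "FaultCategory": "Cash"},
    {"AtmKey": 101, "FaultTime": "2014-03-01", "FaultCategory": "Card"},
    {"AtmKey": 101, "FaultTime": "2014-03-02", "FaultCategory": "Card"},
]
flattened = count_then_flatten(sample)
```

    A single pass over one stream replaces the separate filter, group, and join branches, so there is nothing left to join before the output step.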

  8. #8
    Join Date
    Mar 2014
    Posts
    7

    Here's sample data for one ATM key. I'm not sure whether you mean creating arrays here and flattening them later.

    I have not used arrays for data processing before. Please let me know efficient steps to achieve this and I will try them out, along with an approach for loading the data into the destination.
    Attached Files

  9. #9
    Join Date
    Mar 2014
    Posts
    7

    The data is being filtered for true only.

  10. #10
    Join Date
    Apr 2008
    Posts
    4,696

    I don't usually do this, but based on your supplied file, try something like this...
    Attached Files

  11. #11
    Join Date
    Mar 2014
    Posts
    7

    Thanks!! It works very well.

  12. #12
    Join Date
    Mar 2014
    Posts
    7

    Apart from grouping on the field "FaultCategory", I also have to group on "Fault Desc", and those results go to a separate table.

    The tricky part here is that the structure of this table is different from the last one.

    I have the following target structure:

    ATMKEY FAULTTIME CompName SubCompName CompFaultCnt SubCompFaultCnt

    If I follow the same approach, I have to specify the exact text as it appears in FaultDesc. Is there a way to use a LIKE expression, as available in the Filter step, such as "%journal%", so that I get a grouped count of all fault descriptions containing this text?
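
    The matching being asked for can be written down as a small Python sketch. This shows only the intended logic, not how to configure any PDI step. It treats "%journal%" as a case-insensitive substring test, which is an assumption, and the sample descriptions are invented.

```python
from collections import Counter

def count_matching(rows, text):
    """Count rows per (ATMKEY, FAULTTIME) whose FaultDesc contains `text`,
    ignoring case (roughly LIKE '%text%')."""
    needle = text.lower()
    counts = Counter()
    for r in rows:
        if needle in r["FaultDesc"].lower():
            counts[(r["ATMKEY"], r["FAULTTIME"])] += 1
    return dict(counts)

# Hypothetical fault descriptions.
faults = [
    {"ATMKEY": 101, "FAULTTIME": "2014-03-01", "FaultDesc": "Journal printer paper low"},
    {"ATMKEY": 101, "FAULTTIME": "2014-03-01", "FaultDesc": "JOURNAL PRINTER jam"},
    {"ATMKEY": 101, "FAULTTIME": "2014-03-01", "FaultDesc": "Card reader failure"},
]
journal_counts = count_matching(faults, "journal")
```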

    Also, if I specify the exact text, I get a column called JournalPrinterFaultCnt. How would I map this to my target structure?

    CompName -- JournalPrinter
    CompFaultCnt -- JournalPrinterFaultCnt

    I am not sure whether I should follow the same approach and then use a normalize step further down the line. I tried that, but it is not working as expected. I have enclosed the updated Kettle file.
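
    The mapping described above, from one wide row with a count column per component to one row per component, can be sketched in Python as follows. This only illustrates the reshaping a normalize step is meant to perform. The "FaultCnt" suffix convention and the ReceiptPrinter column are assumptions, and SubCompName and SubCompFaultCnt are left out for brevity.

```python
def normalize_counts(row, key_fields=("ATMKEY", "FAULTTIME"), suffix="FaultCnt"):
    """Turn each '<Name>FaultCnt' column of a wide row into its own
    (CompName, CompFaultCnt) row, repeating the key fields."""
    keys = {k: row[k] for k in key_fields}
    out = []
    for field, value in row.items():
        if field in key_fields or not field.endswith(suffix):
            continue
        out.append({
            **keys,
            "CompName": field[: -len(suffix)],  # strip the suffix
            "CompFaultCnt": value,
        })
    return out

# Hypothetical wide row as produced by the grouped counts.
wide = {
    "ATMKEY": 101,
    "FAULTTIME": "2014-03-01",
    "JournalPrinterFaultCnt": 3,
    "ReceiptPrinterFaultCnt": 1,
}
long_rows = normalize_counts(wide)
```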

    I am also enclosing the old approach I was following, which creates separate streams and then groups them together, but it leaves me stuck at the mapping part as well.

    Please suggest.
    Attached Files


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.