Hitachi Vantara Pentaho Community Forums
Results 1 to 9 of 9

Thread: XLS input error checking

  1. #1
    Join Date
    Sep 2008
    Posts
    26

    Default XLS input error checking

    Hi,

    I have written a transform that uses an two XLS input transforms for a single Excel file. Each transform uses one of the two sheets in the file and then I merge the data based on a common key.

    I tested the sheet with a typo in the sheet name and also with the sheet missing and the problem I am having is that the transform does not fail. It does not fail because there is simply no output from the XLS Input step where there is a missing sheet.

    Question is:

    is there a way to stop the transform either based on something in XLS input or by checking how many rows were in the output? In this case the output is 0 rows

    I tried various combination of grouping, filter rows, etc, but the problem is that there is no data outputted from the XLS input.

    I have set the error checking on the sheet to "ignore error" and "skip error" lines because of the makeup of the data. In content I have "no empty rows"

    Thanks
    Des
    Attached Images Attached Images  

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    Copy the data from the xls to another stream... group by and the filter on - to an abort step... e.g.

    Regards,
    Sven

  3. #3
    Join Date
    Sep 2008
    Posts
    26

    Unhappy

    I tried the suggestion and a few others, like errors, logs, etc., but no luck. I think the problem is more fundamental........

    To illustrate the issue I created a test transform with a simple XLS sheet. (See attached)

    If the sheet name is "Sheet1" as expected the data is processed and the JS can check the name.

    If however the sheet name is changed to "Sheet2" the JS does not even execute and check the data.

    What this means is that there is no way to check the sheet name for data integrity! The same is true if the sheet is empty.

    Is there a way to do some integrity checking of the XLS Input?

    thanks
    Des
    Attached Files Attached Files

  4. #4
    Join Date
    Sep 2008
    Posts
    26

    Default

    Guys, I'd really appreciate some help on this issue when you have some time.

    If there is no way to check the data or sheet names in XLS Input, I cannot ensure the integrity of the data. I do not want to switch tools mid-stride here since I like Kettle, but I really need a solution.

    Could you please have a look at the attached files and see if there is a possible way to catch this error?

    Thanks
    Des

  5. #5
    Join Date
    May 2006
    Posts
    4,882

    Default

    In the current version (3.1) there's no easy way I see to differentiate between invalid sheet and just empty sheet.

    You can raise a JIRA, I do see the usecase for it... but that's not going to help you short short temr.

    Regards,
    Sven

  6. #6
    Join Date
    Sep 2008
    Posts
    26

    Default

    Sven

    The issue is more that the steps after the XLS Input are not processed if the XLS Input has zero resulting rows.

    The attached transform shows what I mean. If it was processed, then I could check for the condition.

    Is there a way to force the checking even if there are zero rows? In this way I could check and then branch.

    Thanks
    Des

  7. #7
    Join Date
    May 2006
    Posts
    4,882

    Default

    Use a Row generator to generate an extra line.... make a hop from the row generator to a dummy, make a link from the xls step to the same dummy ... got 1 row... you don't have real input... but granted it's an ugly way around it.

    The problem you have is that of course steps are only triggerd upon rows hitting them (the normal cases anyway), so no rows no actions.

    Regards,
    Sven

  8. #8
    Join Date
    Sep 2008
    Posts
    26

    Smile

    Thanks I'll give it a go. It is ugly, but I'd it triggers evaluation it's worth a go.

    Do you think this is an architecture flaw or was Kettle designed to work like this? Would there be value in adding error handling on sheets?

    Thanks
    Des

  9. #9
    Join Date
    Sep 2008
    Posts
    26

    Default

    ok, I tried it, and the generated row is always in the stream, which means the subsequent steps always get triggered. I put a Filter rows and Grouping to check if the original Sheet name count is bigger than 1, else abort.

    Would be good to consider a more robust error checking for sheet names. The file and data checking seems to be ok, but the sheet checking is not

    Thanks for the help
    Des

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.