Hitachi Vantara Pentaho Community Forums

Thread: Check if file is not empty

  1. #1
    Join Date
    Jun 2016
    Posts
    181

    Check if file is not empty

    How can I check that a file is not empty? According to the docs, the "Detect empty stream" step should do the job. But how do I use it?

    "This step will output one row if input stream is empty (ie when input stream does not contain any row). The output row will have the same fields as the input row, but all field values will be empty (null)."


    It produces a row with empty fields when the input is empty, but when the input is not empty... it produces an empty result. It does not work.

    In the following example, when the input file is empty the result will be just the header (as it is supposed to work).
    When the input file is not empty, it will produce nothing. It should simply copy the input.

    Is there a simple way to check whether a file is empty?
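
    For comparison, outside of PDI the check itself is small. The following is a minimal Python sketch, not a Pentaho step: `file_is_empty`, `has_data_rows` and `write_sample` are hypothetical helper names, and treating "empty" as either zero bytes or "no non-blank line after the header" is an assumption about what the question means.

```python
import os
import tempfile


def file_is_empty(path):
    """Return True when the file at `path` holds zero bytes."""
    return os.path.getsize(path) == 0


def has_data_rows(path, header_lines=0):
    """Return True when a non-blank line remains after skipping `header_lines` lines."""
    with open(path, "r", encoding="utf-8") as handle:
        for index, line in enumerate(handle):
            if index >= header_lines and line.strip():
                return True
    return False


def write_sample(text):
    """Hypothetical demo helper: write `text` to a temporary file and return its path."""
    handle = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8")
    with handle:
        handle.write(text)
    return handle.name
```

    A file that contains only a header line is not empty by size, but `has_data_rows(path, header_lines=1)` reports that it carries no data, which is probably the distinction that matters here.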

  2. #2
    Join Date
    Feb 2016
    Posts
    22

    In your example, you're distributing and not copying the rows. I believe that's why you're not getting the result you're expecting (please check the hops in the example that is available in the documentation).

    There is a component named "Get Files Row Count". I have never used it personally, but you can give it a try to see if that's what you need.

    Hope this helps

  3. #3
    Join Date
    Jun 2016
    Posts
    181

    How do I set them to "copy"?
    I tried "Get Files Row Count" but got an error every time. I do not know how to use it.

  4. #4
    Join Date
    Apr 2008
    Posts
    4,696

    Originally posted by Gosforth:
    I tried "Get Files Row Count" but got an error every time. I do not know how to use it.
    What did you try?
    Simply putting the file name in the box shows the number of rows in that file...

  5. #5
    Join Date
    Jun 2016
    Posts
    181

    Originally posted by gutlez:
    What did you try?
    Simply putting the file name in the box shows the number of rows in that file...
    If the file has only one row (no headers), "rowscount" will return... 0 (it should be 1).
    I do not see any option to make it work properly. Is there one?

  6. #6
    Join Date
    Apr 2008
    Posts
    4,696

    Not that I would have expected this...
    It's not an easy one...

    When I hover over "Perform smart count", the tooltip tells me that the step counts the number of line separators by default. If you check "Perform smart count", it is supposed to try to do something different. That might fix the issue, but it might hurt performance.
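
    To see why counting line separators reports 0 for a one-row file, here is a small Python sketch. It is only an illustration: `count_line_separators` mirrors the default behaviour as the tooltip describes it, while `count_lines` is merely one possible alternative and is not a claim about what "Perform smart count" does internally.

```python
def count_line_separators(data: bytes) -> int:
    """Count newline characters, as the tooltip describes the default behaviour."""
    return data.count(b"\n")


def count_lines(data: bytes) -> int:
    """Count lines, including a final line that has no trailing separator."""
    return len(data.splitlines())
```

    A file whose only row is not followed by a newline contains zero separators, so the first function returns 0 while the second returns 1.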

  7. #7
    Join Date
    Apr 2008
    Posts
    4,696

    Actually, you don't even need to do that.
    Just run it directly through the detect empty stream.

    When I ran your sample (I had to adapt it to use portable filenames...), it did exactly what is in the description... It created a row with F1 set to null. Otherwise, it wouldn't even create row 1.

    But I don't think it's going to do what you want as configured. There's a good chance that if you send it a file with one row, Row 1 will go to the file output step, and then "Detect empty stream" will detect that it got no input and send one row of its own. That results in TWO rows going to the output: one with data in Row 1, and one with null in Row 2. This can be avoided by running the rows directly through the "Detect empty stream" step.

    On changing the row movement... Right Click on the "Text File Input" step, go to Data Movement, and select "Copy" rather than round robin. BUT... That's DEFINITELY not what you want in this case.
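
    The two-row outcome described above can be reproduced with a toy Python model. The layout is an assumption, since the attached sample is not shown in the thread: the input step feeds the output step directly and also feeds "Detect empty stream", whose output joins the same output step. `detect_empty_stream` and `run` are hypothetical names, and this is a simplification of the row movement, not PDI code.

```python
from itertools import cycle


def detect_empty_stream(rows, fields):
    """Emit one all-null row when no rows arrive; otherwise emit nothing."""
    if list(rows):
        return []
    return [{name: None for name in fields}]


def run(rows, fields, mode):
    """Return the rows that reach the output step under the given data movement."""
    rows = list(rows)
    if mode == "copy":
        # Every row is sent to both targets.
        direct, via_detector = list(rows), list(rows)
    elif mode == "distribute":
        # Rows are handed out round robin, one target at a time.
        direct, via_detector = [], []
        targets = cycle([direct, via_detector])
        for row in rows:
            next(targets).append(row)
    else:
        raise ValueError("mode must be 'copy' or 'distribute'")
    return direct + detect_empty_stream(via_detector, fields)
```

    With a single input row, `run(..., "distribute")` returns the data row plus an all-null row, because the detector branch received nothing.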

  8. #8
    Join Date
    Jun 2016
    Posts
    181

    Originally posted by gutlez:
    Actually, you don't even need to do that.
    Just run it directly through the detect empty stream.

    When I ran your sample (I had to adapt it to use portable filenames...), it did exactly what is in the description... It created a row with F1 set to null. Otherwise, it wouldn't even create row 1.

    But I don't think it's going to do what you want as configured. There's a good chance that if you send it a file with one row, Row 1 will go to the file output step, and then "Detect empty stream" will detect that it got no input and send one row of its own. That results in TWO rows going to the output: one with data in Row 1, and one with null in Row 2. This can be avoided by running the rows directly through the "Detect empty stream" step.

    On changing the row movement... Right Click on the "Text File Input" step, go to Data Movement, and select "Copy" rather than round robin. BUT... That's DEFINITELY not what you want in this case.
    I do not get it at all. I would expect that the "Detect empty stream" step is used to... detect an empty stream.
    So, if the input file sends nothing, take some action. If the input file has a row, take another action.

    Forwarding the stream or forwarding an empty stream adds no value; I still have a problem to manage: I still have to check whether the stream is empty and take some action.

    Sorry, this seems a useless component to me :-)

    The only value is that it generates "null", so it is easier to check with "Filter rows" instead of writing some regex.

    Anyway, thanks a lot for help!

  9. #9
    Join Date
    Apr 2008
    Posts
    4,696

    Originally posted by Gosforth:
    I do not get it at all. I would expect that the "Detect empty stream" step is used to... detect an empty stream.
    So, if the input file sends nothing, take some action. If the input file has a row, take another action.

    Forwarding the stream or forwarding an empty stream adds no value; I still have a problem to manage: I still have to check whether the stream is empty and take some action.

    Sorry, this seems a useless component to me :-)

    The only value is that it generates "null", so it is easier to check with "Filter rows" instead of writing some regex.
    If you read the help on the step, that's exactly what it's supposed to do... If the stream has data, take no action. If the stream has no data, generate a compliant row of all nulls, and send that.

    Imagine a scenario where you have a table input doing incremental updates, and then you want to trigger a procedure to refresh your indexes. But you want the indexes updated regardless of whether there is new data. That's the use case for this step.
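
    The scenario can be sketched in Python under one simplifying assumption: a downstream step only does its work when at least one row reaches it. `with_placeholder` models the combined stream (the incremental rows, or the all-null row when there are none) rather than the step alone, and both function names are hypothetical.

```python
def with_placeholder(rows, fields):
    """Return the rows unchanged, or one all-null row when there are none."""
    rows = list(rows)
    return rows if rows else [dict.fromkeys(fields)]


def run_row_driven_step(rows, action):
    """Call `action` once if any row arrives and report whether it ran."""
    if not list(rows):
        return False
    action()
    return True
```

    Without the placeholder, an empty incremental load never reaches the refresh step; with it, `run_row_driven_step(with_placeholder([], fields), refresh)` still calls `refresh`.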

  10. #10
    Join Date
    Feb 2017
    Posts
    13

    You are "distributing" the data; change it to "copy" and it should work.

  11. #11
    Join Date
    Jun 2016
    Posts
    181

    Originally posted by rsdp:
    You are "distributing" the data; change it to "copy" and it should work.
    This will have no influence at all. Please read my previous answer about that.

  12. #12
    Join Date
    Feb 2017
    Posts
    13

    Originally posted by Gosforth:
    This will have no influence at all. Please read my previous answer about that.
    On the "Content" tab, uncheck "Header".
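
    The reply is terse, so the following reading is an assumption: it refers to the "Header" option of the text file input step. Under that reading, a one-line file with the header option enabled has its only line consumed as the header, leaving no data rows. A minimal Python illustration, with `read_data_rows` as a hypothetical name:

```python
def read_data_rows(text, header=True):
    """Split `text` into lines and drop the first one when `header` is set."""
    lines = text.splitlines()
    return lines[1:] if header else lines
```

    For the input "only row", the function returns an empty list with `header=True` and a single row with `header=False`.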

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.