Hitachi Vantara Pentaho Community Forums
Results 1 to 4 of 4

Thread: Log error if Sheet/Tab is missing in the input excel file

  1. #1

    Default Log error if Sheet/Tab is missing in the input excel file

    Hi All,

    How can I log error in a text file if in case my input excel file does not contain the required Sheet/Tab in it?
    I want to log error in a text file with a error message for example "SheetName is missing in your input excel file" or in Generic way "Some Sheets/tabs are missing in your input file". I am using excel input step to read the input file in Pentaho 7.1 community edition.

    Please help

    Thanks
    Ajinkya
    Last edited by Ajinkya; 09-24-2018 at 12:26 AM.

  2. #2
    Join Date
    Aug 2016
    Posts
    290

    Default

    If you want clean log messages like you describe here, you must abandon the entire log system of kettle because that will always produce large log chunks where you are not in complete control of what is written. You can do this by disabling standard kettle logs (or set to error only), then write logs manually yourself.

    If you want to log error based on missing tab/sheet in an excel file, you must first find a way to identify and list names of sheets/tabs in said excel file. After listing names, it's a simple matter of filtering and using "detect empty stream" to trigger a new row when the tab/sheet doesn't exist. Then send that to a step which produces the log message you want and finally write that to a specified log file.

    Unfortunately, I have no idea how to identify or list tabs/sheets in excel. Perhaps you could make an initial dummy transformation to read the excel file, and depending on how it is implemented, it may trigger an error if the tab/sheet specified doesn't exist?

  3. #3
    Join Date
    Nov 2009
    Posts
    688

    Default

    To make a list of sheetnames in Excel
    - Take Excel input. In Sheet tab leave Sheet name empty and in start row / start column value 0
    - In Fields tab create one record.
    - in additional output fields ad sheetname in the Sheetname field
    - in a select value step you can remove the column created in Field tab of Excel input
    Then you have all the sheetnames in the Excel file

  4. #4

    Default

    Quote Originally Posted by johanhammink View Post
    To make a list of sheetnames in Excel
    - Take Excel input. In Sheet tab leave Sheet name empty and in start row / start column value 0
    - In Fields tab create one record.
    - in additional output fields ad sheetname in the Sheetname field
    - in a select value step you can remove the column created in Field tab of Excel input
    Then you have all the sheetnames in the Excel file

    Thanks for your help.
    I tried above steps but it takes too long to execute around 3 minutes.
    I have large data in excel file. I am using excel input step in POI Streaming mode.
    As we have large data in 4 sheets in excel file, input step reads all data. Hence it takes too long to execute.
    Can we read only sheet names which will be faster? or any other faster way to do this?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.