Hitachi Vantara Pentaho Community Forums
Results 1 to 6 of 6

Thread: Excel Input Step

  1. #1

    Default Excel Input Step

    I have a need to provide business analysts an easy means by which to specify data to be loaded into tables. Given that they are already familiar with MS Excel, I figured that this would be an appropriate tool.

    Each Excel workbook contains a single sheet which corresponds to a table. Column header names correspond to table column names.

    While the 'Excel Input' step provides a means by which to load data from MS Excel, it appears that the field names need to be defined at build time. Is there a way to have these names determined based of column name at runtime? If not, it appears that I will have to create a separate transformation for loading of each table. Are there some alternatives ?

    Thanks.

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    No alternatives ... and from experience allowing people to enter their data in Excel and uploading that is a maintenance nightmare.

    Regards,
    Sven

  3. #3
    Join Date
    Mar 2006
    Posts
    26

    Lightbulb

    Quote Originally Posted by virafbankwalla View Post
    I have a need to provide business analysts an easy means by which to specify data to be loaded into tables. Given that they are already familiar with MS Excel, I figured that this would be an appropriate tool.

    Each Excel workbook contains a single sheet which corresponds to a table. Column header names correspond to table column names.

    While the 'Excel Input' step provides a means by which to load data from MS Excel, it appears that the field names need to be defined at build time. Is there a way to have these names determined based of column name at runtime? If not, it appears that I will have to create a separate transformation for loading of each table. Are there some alternatives ?

    Thanks.
    Extending virafbankwalla Question

    Is it possible that the columns on a file can come in any order. Example Let's say we have an Excel File with DATE, NAME, VALUEA

    And we create a ETL process that uses that and later someone adds a new Columns to your input File DATE, NAME, DESCRIPTION, VALUEA

    this makes the ETL process to crach and the ETL needs to be Updated.

    Is it possible tha Once we define DATE, NAME, VALUEA and we are saying that Excel has header row to determine the COLUMN to read based on the header name?

    Thanks

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    I have to side with Sven again. If you really have to read XLS files, lock the structure down so that users can't mess about with it anymore.
    If you allow anything to be placed in the XLS file, then the next problem will not be the column order, but the names, the data types, etc.
    Again, speaking from experience as well, the nightmare just won't end until you define exactly what is allowed in the XLS, how it should look and you also verify this during load.

    That was one of the reasons why I created the "Data Validator" step this week: http://wiki.pentaho.org/display/EAI/Data+Validator

    Matt

  5. #5
    Join Date
    Mar 2006
    Posts
    26

    Smile

    Quote Originally Posted by MattCasters View Post
    I have to side with Sven again. If you really have to read XLS files, lock the structure down so that users can't mess about with it anymore.
    If you allow anything to be placed in the XLS file, then the next problem will not be the column order, but the names, the data types, etc.
    Again, speaking from experience as well, the nightmare just won't end until you define exactly what is allowed in the XLS, how it should look and you also verify this during load.

    That was one of the reasons why I created the "Data Validator" step this week: http://wiki.pentaho.org/display/EAI/Data+Validator

    Matt
    Excellent Matt. The validates was an excellent idea. I will check it out a bit more. because one of my main concerns is that if a file comes in with a change in the Column order or additional data in the middle, then the Kettle process will try to process it and give errors on the Number fields vs Text fields etc etc. So it is good to send a File structure mismatch error.

    I will check your validator however for what you wrote on the Wiki. it will solve some of the cases... others will fail directly on the Excel Input step.

    mmmm...!!!! maybe a File structure validator...... that would cause a lot of coding for each tipe of input... so well as I said. Let me check it out....

    Thanks Matt

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Then again Alexander,

    I'm strongly in favor of the idea of data ownership. If someone f***s up the XLS file, it's their responsibility. It should end right there. Just ship it back to where it came from and let them fix it.

    Don't ever be tempted to try and fix these things. It's one of the pitfalls that can cause a data warehouse projects and even complete BI projects to fail.

    Matt

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.