Hitachi Vantara Pentaho Community Forums
Results 1 to 25 of 25

Thread: Variable number of columns in Excel input

  1. #1
    Join Date
    Feb 2007
    Posts
    230

    Default Variable number of columns in Excel input

    Hi,

    I have the following problem: I am downloading an Excel file in a job and the run a transformation that should populate the database. The number of columns/fields in Excel input is not known in advance. The Excel table is pivot table so I need to normalize it but I can't if I don't know the number (and the names) of columns. BTW, the columns are dates like 2.1.2008, 3.1.2008 (d.m.yyyy format)

    Any help?

    thanks very much

    regards, dejan
    Last edited by dgambin; 09-18-2008 at 04:36 AM.

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    I don't think you can use Kettle for this. A transformation needs to be deterministic in nature.
    You could think about writing a tool that analyses the XLS, determines the layout, columns etc and then generates the matching Kettle transformation.
    However, we're talking about Java code in that case.

    Matt

  3. #3
    Join Date
    Feb 2007
    Posts
    230

    Default

    Thanks Matt,

    If you know for any tool/code that could do something like this, please let me know.

    Unfortunatelly I am not familiar with Java programming and I need to automate the whole process, from downloading the file to insert/update MySQL database.

    thanks very much

    regards, dejan

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Well, since the data is pivoted and doesn't have a specific order of occurrence, I would think that there isn't a tool out there that can do it.

  5. #5
    Join Date
    Feb 2007
    Posts
    230

    Default

    Thanks Matt,

    If you just can spend a minute and take a look on the example data, I would be very grateful :-)

    Specifically,it is the euribor rate historical data, updated every day (except weekends and holidays) with columns that represent dates and the values in rows. Except this, everything is pretty ordered and clear so maybe there is hope :-)

    The example is in the attachment

    thanks very much
    regards, dejan
    Attached Files Attached Files
    Last edited by dgambin; 09-18-2008 at 02:57 PM.

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Well, suppose you would read all data in as Strings, you might be able to create something.
    The trick will be to apply the first row of data in a data set to all the following rows.

  7. #7
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Just for fun (!) I tried to create some JavaScript that would be able to parse the 366 columns.
    I generated some code from a shell script to save me from typing.
    In the end this is what I got:

    Code:
    Couldn't compile javascript: Program too complex: too big jump offset
    So perhaps it's best to attack this differently. Personally I think it's going to be a tough one.

    Matt

  8. #8
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    OK, so I found some sort of a solution anyway...

    Euribor parser.ktr

    Name:  euribor.jpg
Views: 505
Size:  13.6 KB

    It involved some creative XLS copy/paste use to populate the dialogs.

  9. #9
    Join Date
    Feb 2007
    Posts
    230

    Default

    This looks so great and so complicated to me :-) Thanks very much on this effort. I am getting an error in last step (Convert to numeric) like:

    2008/09/19 09:38:06 - Convert to numeric.0 - ERROR (version 3.0.0, build 500 from 2007/11/14 14:59:11) : Unexpected error :
    2008/09/19 09:38:06 - Convert to numeric.0 - ERROR (version 3.0.0, build 500 from 2007/11/14 14:59:11) : org.pentaho.di.core.exception.KettleValueException:
    2008/09/19 09:38:06 - Convert to numeric.0 - ERROR (version 3.0.0, build 500 from 2007/11/14 14:59:11) : data String : couldn't convert String to number
    2008/09/19 09:38:06 - Convert to numeric.0 - ERROR (version 3.0.0, build 500 from 2007/11/14 14:59:11) : Unparseable number: " 4,124"

    If you can see, the number is "4,124" (not 4.124) because I modified the first step and set the decimal to "," and grouping to "." (comma is decimal separator here). But even without this modification I get the same error (only 4.124 instead of 4,124) :-(

    regards, dejan

  10. #10
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Version 3.0.0 is too old, update to 3.0.4.
    If you change the first step, you should also change the last step.
    That is where the data is converted to Number.

  11. #11
    Join Date
    Feb 2007
    Posts
    230

    Default

    I have installed 3.0.4 but still have this issue. I have modified the Excel input and "Convert to numeric" step (comma for decimal, dot for grouping) but I still get this error.

    I have noticed the string that needs to be converted has a blank in front of it (" 4,124") but I don't know why. I have tried to use trim option in Excel input but it didn't help. This "blank" is present at the very beginning (Filter rows step). Can it be the reason for conversion error?

    What is the most interesting - I have added the last step to insert/update the MySQL table and it worked in a certain moment but the values were wrong (4.000). After that I can't get it work in anyway :-(

    regards, dejan

  12. #12
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Since the XLS is mixing data types (Dates and numbers) you want STRINGS from excel, not numbers. You don't care how they are formatted.
    Just convert them to numbers in the last Select Values step.
    The transformation I attached works without change in version 3.0.4.

    Matt

  13. #13
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    And I just checked, the mask is " #.#", not "#.#" in my transformation.

  14. #14
    Join Date
    Feb 2007
    Posts
    230

    Default

    Oh yes, that is where I made a mistake once I modified the last step and the "#.#" mask. I didn't see the blank before the "#.#". Now it is ok.

    Thank you very much, I have the data in MySQL table and this is great :-)

    best regards, dejan

  15. #15
    Join Date
    Apr 2007
    Posts
    8

    Default Variable number of columns in Excel input

    Hello,

    I think maybe i'm having the same problem like dgambin.
    I make a transformation for my excel file but now I need to excecute this transformation for more table.

    I need to know if the other table have the same layout how can I proceed?

    And on the other hand if this table have some difference between them like, one of them doesn't have one field that the other have. How can I proceed?

    To MattCasters : you were talking about "tool that analyses the XLS, determines the layout, columns etc and then generates the matching Kettle transformation.However, we're talking about Java code in that case." i need to know more about it. thank you very much.

    regards,
    ani05

  16. #16
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    There's not much to tell Ani. You need to have some understanding of the Excel file you're reading. If you don't have any understanding you could possibly write Java code that analyses the spreadsheet and generates the PDI transformation for you dynamically.

    Matt


  17. #17
    Join Date
    Apr 2007
    Posts
    8

    Default

    Thank you Matt,

    I need to know one more thing. Let suppose that I'm going to modify this tables, so I will determine the same layout. My answer is: How can I execute the same transformation to all of them. I think I might create a job but I don't know how. thank you very mutch!

    best regards
    ani

    Appelloluglio2007.xls

    Voti_BDE_estrazione.ktr

    Name:  Immagine.jpg
Views: 257
Size:  14.7 KB

    Quote Originally Posted by MattCasters View Post
    There's not much to tell Ani. You need to have some understanding of the Excel file you're reading. If you don't have any understanding you could possibly write Java code that analyses the spreadsheet and generates the PDI transformation for you dynamically.


    Matt

  18. #18
    Join Date
    Apr 2007
    Posts
    8

    Default

    Hello Matt,

    maybe I didn't explain very good what i really have needed. anyway now all my table in excel have the same number of columns and the same name of fileds. I want to know how is possible to write a Java code and recall the PDI transformation that I created before. can you explain it in a few words? Thank you

    ani

    Quote Originally Posted by MattCasters View Post
    There's not much to tell Ani. You need to have some understanding of the Excel file you're reading. If you don't have any understanding you could possibly write Java code that analyses the spreadsheet and generates the PDI transformation for you dynamically.


    Matt

  19. #19

    Default

    Hi ani05,

    if I understood well you want to reuse the same transformation for many .xls files having the same column layout, right? if yes, one solution can be to use the steps "Copy rows to result" and "Get rows from result". You should put "Get rows from result" as input step for the transformation you want to reuse (let's call it transformation_out); then you have to create one transformation for each .xls file, each containing just one .xls file as input and "Copy rows to result" as output (let's call it transformationN, where N changes with each .xls input file). Then create a job that executes the transformations like this:

    START --> transformation1 --> transformation_out --> transformation2 --> transformation_out --> transformation3 --> transformation_out etc.

    Otherwise, just put all of the .xls files as input in transformation_out pointing to a "Do nothing" step. You'll have like a "union" this way. but I don't know what's your aim, so I'm not sure if it's what you need.

    HTH,

    Iuri

  20. #20
    Join Date
    Apr 2007
    Posts
    8

    Default

    Thank you very much Iuri,

    my purpose is to get in input a variable number of tables, let suppose 6 tables and then after applying the transformation I showed before to all of these table, in output I want to have the same number of tables(6 tables). the only thing that i need is to normalise these tables. maybe the right way is to use the first solution you suggested to me.

    regards,
    ani

    Quote Originally Posted by biuri View Post
    Hi ani05,

    if I understood well you want to reuse the same transformation for many .xls files having the same column layout, right? if yes, one solution can be to use the steps "Copy rows to result" and "Get rows from result". You should put "Get rows from result" as input step for the transformation you want to reuse (let's call it transformation_out); then you have to create one transformation for each .xls file, each containing just one .xls file as input and "Copy rows to result" as output (let's call it transformationN, where N changes with each .xls input file). Then create a job that executes the transformations like this:

    START --> transformation1 --> transformation_out --> transformation2 --> transformation_out --> transformation3 --> transformation_out etc.

    Otherwise, just put all of the .xls files as input in transformation_out pointing to a "Do nothing" step. You'll have like a "union" this way. but I don't know what's your aim, so I'm not sure if it's what you need.

    HTH,

    Iuri

  21. #21
    Join Date
    Apr 2007
    Posts
    8

    Default

    hi,

    I want to know one more thing. My answer is: if there is any way to create an interface for the kettle transformation or how can I call a transformation from a programming language (like JAVA)?

    Thank you

    ani

  22. #22
    Join Date
    Nov 1999
    Posts
    9,729

  23. #23
    Join Date
    Apr 2007
    Posts
    8

    Default

    hi,

    I'm using the Java code to execute my transformation and it works. My transformation get an excel file in input. Now I want to pass to my program in java one by one a list of excel file. For example if I have 6 file in input, the program should get this file one by one then recall the transformation, so it passes each file to the same transformation and in the end of this transformation it should put the new file in a new place. They have the same layout. I really don't know very well how to do this in Java. Thank you.

    Ani
    Attached Files Attached Files

  24. #24
    Join Date
    Apr 2007
    Posts
    8

    Default

    Sorry Mr.Casters,

    maybe I'm bothering you. I have a last question. I'm having some excel file that have the same name of the header but the order of the columns is not the same. I can explain it better using an example:
    Lets have 2 table in excel with the respective name of columns

    Tab1:
    Column1
    Column2
    Column3

    Tab2:
    Column3
    Column1
    Column2

    when I recall the transformation function i only give in input the name of the table. So let suppose I insert Tab1 and then I specify the name of the sheet and in the fields tab, in Get fields from header row I should specify every time the order of fields of the respective Excel files. I can't do it every time. I read about variables but I didn't understand the use of them in my case. Could you give me a simple explanation. I will atach my ktr file.I want to apologize because my english isn't very good.

    Regardless
    Ani
    Attached Files Attached Files

  25. #25
    Join Date
    Aug 2008
    Posts
    563

    Default Tutorial based on this example available

    As the original question was quite interesting, I thought it would a good idea to base a tutorial on it. You can find it here.
    Best regards,
    Diddy

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.