US and Worldwide: +1 (866) 660-7555
+ Reply to Thread
Results 1 to 9 of 9

Thread: Read multiple Excel Files without header

  1. #1
    Join Date
    Jan 2010
    Posts
    10

    Default Read multiple Excel Files without header

    Hello !

    I just starting to use Kettle and I want to automate the transfer of 250 excelfiles into a mysql database but I have multiple problems in doing it.

    I try several days to accomplish this task.

    First I read the colums amd rows, but because there is a macro behind it (the colums represent strings and decimals (integers)). First question what rises how can I transform them when it is a F,V, or a A without transforming the decimals (integers)?

    When reading the colums and rows with header on I have to fill out the empty spaces between the colums. Even when I start a new sequence on line 2 (in the tab list of sheets to read). Can I bypass this ?

    I want to read in a day planning of a employee. This works fine accept on the point mentioned above. Anoother question is There is a Column in which the name and department is listed. How can I get these fields in a row so I can get them in a mysql database. Where to look ??

    I will upload the excel sheet (Tab basisbestand) so everone can see what I mean. Next year I will make a website on which every employee can fill in his palnning but this is a very nice test to see how ETL works. (WIth hopefully some help of this forum)

    Thanks in advance.

    Jannes..
    Attached Files

  2. #2
    Join Date
    Sep 2007
    Posts
    625

    Default

    Hi, Jannes, please see my comments below,
    mc

    Quote Originally Posted by Warlock View Post
    First question what rises how can I transform them when it is a F,V, or a A without transforming the decimals (integers)?
    you should read the columns as String. Then, check the value and change the metadata to integer if the value is not F,V or A
    Quote Originally Posted by Warlock View Post
    When reading the colums and rows with header on I have to fill out the empty spaces between the colums. Even when I start a new sequence on line 2 (in the tab list of sheets to read). Can I bypass this ?
    Do you mean skiping the columns where you have no data? No, you can't, unless you read the sheet with more than one Excel input step.
    Quote Originally Posted by Warlock View Post
    There is a Column in which the name and department is listed. How can I get these fields in a row so I can get them in a mysql database. Where to look ??
    If you know exactly the (row, column) where each value is, you can use the Start row / Start column values in the Sheet tab. If not, maybe you have a rule for discovering where the data is. In that case the procedure will depend on the rule.

    good luck!

  3. #3
    Join Date
    Jan 2010
    Posts
    10

    Unhappy

    Maria, Thank you for your quick reply.

    Quote Originally Posted by Maria Roldan View Post
    Hi, Jannes, please see my comments below,
    mc


    you should read the columns as String. Then, check the value and change the metadata to integer if the value is not F,V or A
    Ok, I have got this done with an Null if and it works fine ....

    Do you mean skiping the columns where you have no data? No, you can't, unless you read the sheet with more than one Excel input step.
    Ok, I have more than one (excel) input steps. But what then to do ?
    I have 2 tables one with one entry and one with 52 entries (weeks of the year). I need one Table. I have tried to use clone rows and then merge them together but this seems not to work.
    ANy suggestions where to look ??

    If you know exactly the (row, column) where each value is, you can use the Start row / Start column values in the Sheet tab. If not, maybe you have a rule for discovering where the data is. In that case the procedure will depend on the rule.
    Ok, I have done this and I got the data. Now I want to put them together in one table. How ? See my last question.....

    Thanks in advance....

  4. #4
    Join Date
    Nov 2008
    Posts
    114

    Lightbulb This is what I would do

    Quote Originally Posted by Warlock View Post
    I have 2 tables one with one entry and one with 52 entries (weeks of the year). I need one Table. I have tried to use clone rows and then merge them together but this seems not to work.
    ANy suggestions where to look ??

    Ok, I have done this and I got the data. Now I want to put them together in one table. How ? See my last question.....

    Thanks in advance....
    This is what I would do: keep the two tables separate! It is an order of normalization of the database which will save storage space and, most importantly, prevent you from having to duplicate the employee's name in every record. The tables can then be joined with a query at runtime.

    The first table would have two columns:

    PK EmployeeID INT
    EmployeeName TINYTEXT

    and the second table would reference the employee by ID and contain the columns for the week data for each year:

    PK EmployeeID INT
    PK YearNumber INT
    PK WeekNumber INT
    MondayHours DECIMAL(10,1) [you could use negative numbers to flag the F, V, A conditions]
    TuesdayHours DECIMAL(10,1)
    ... (for the rest of the days of the week)

    Also, the EmployeeID in the second table MUST be declared as a foreign key that references the EmployeeID of the first table. That will keep the two tables in sync (referential integrity). You should set the foreign key to 'Cascade On Delete' so if an employee name is deleted, their entire calendar will be deleted as well. No orphans, please.

    When you want to retrieve the data for a particular person for a particular year, you can join the two tables into one table with a SQL Join query:

    SELECT * FROM table_1 t1
    JOIN table_2 t2 USING (EmployeeID)
    WHERE EmployeeName = ? AND YearNumber = ?
    ORDER BY WeekNumber;
    Last edited by darrell.nelson; 01-19-2010 at 07:42 PM.
    pdi-ce-3.2.0-stable
    bi-server-3.5.0.stable
    MySQL 5.1
    Windows XP

  5. #5
    Join Date
    Jan 2010
    Posts
    10

    Default

    Hello Darell!

    You have absolutely right, but thats not the problem I have. I the output I have made there is no unique key and this I want to add. In this case the name is the best shot I had. So I have a table with data of planning which could be from everyone and I have an Excel input with the unique name and these I have to merge... But how ???

    Do I add a dummy column and copy the excel data (which is in fact one name) into the table with the 52 rows of weekdata ??

    Thanks,

    Jannes

  6. #6
    Join Date
    Jan 2010
    Posts
    10

    Talking Solved !

    Solved!

    Quite simple if you know. Added Join rows (Cart. product) and it worked!
    For the newbies : Look in the sample directory!!!

    I will try to make a job that automates the input of 250 excel forms!

    Everyone thanks for the help.

    Jannes

  7. #7
    Join Date
    Nov 2008
    Posts
    114

    Thumbs up

    Quote Originally Posted by Warlock View Post
    ...there is no unique key and this I want to add. In this case the name is the best shot I had.
    Quote Originally Posted by Warlock View Post
    Added Join rows (Cart. product) and it worked!

    That will work but proceed with caution. For each person, you have now duplicated their name for every week in the database. A potential problem is that names change from time-to-time. The storage space is increased as well since each name is stored 51 more times for each year in the database.

    To generate the unique ID you were looking for, you could insert the first table with a Table Output step set to return back the auto-generated primary key (that field must be set up as 'auto increment' in the database). You then use the returned key in the Table Output step for the second table.

    The other option is to initially query the first table for the highest EmployeeID and then increment it for each spreadsheet you read. Using that method you would not have to use the auto increment feature in the database (works very well with MySQL but may not work if you switch to another database platform) or capture the auto-generated value when writing the first table.

    Good Luck! I've captured data from thousands of spreadsheets with Kettle and it works very well.
    pdi-ce-3.2.0-stable
    bi-server-3.5.0.stable
    MySQL 5.1
    Windows XP

  8. #8
    Join Date
    Jan 2010
    Posts
    10

    Default

    You're right. I will implement this after I solved the problem of syncing my 3 inputsteps with each other. I have now the problem that the 'employeename' is not retrieved form the same inputfile as the departmentdata nor the planning data.

    I have to use accept filenames of the previous step, but this is generating errors.

    Jannes

  9. #9
    Join Date
    Nov 2008
    Posts
    114

    Default

    I understand the problem you are facing. The "Excel input" step is not designed to selectively read certain cells or ranges of cells. It is basically designed to read columns of data with an optional header row.

    In cases like yours, I have actually gone to using a shell script (Windows Powershell) to extract the cells I want and then write them to xml files. Then I use Kettle to read the xml files using xpath expressions in the "Get data from XML" step.
    Last edited by darrell.nelson; 01-21-2010 at 07:19 PM.
    pdi-ce-3.2.0-stable
    bi-server-3.5.0.stable
    MySQL 5.1
    Windows XP

+ Reply to Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts