Hitachi Vantara Pentaho Community Forums
Results 1 to 22 of 22

Thread: Importing Excel 2007/2010 Files. How?

  1. #1
    Join Date
    Oct 2008
    Posts
    23

    Default Importing Excel 2007/2010 Files. How?

    Hi Forum,

    at the moment apparently Kettle (4.0.0) only supports file types up to Office 2000. That is leading us to a problem as the company is switching to 2007 (xlsx, xlsm).

    Is anybody facing similar problems and developed a workaround? Any suggestions?

    Can somebody tell when Kettle will be updated regarding this topic?
    (Or maybe does the Enterprise Edition already?)

    Many thanks in advance!
    Roman

  2. #2
    Join Date
    Dec 2009
    Posts
    609

    Default

    Hi Roman,

    check out JIRA: http://jira.pentaho.com/browse/PDI-2793

    Seems like XLSX will be supported in PDI 4.1.0

    Best regards,

    Tom

  3. #3
    Join Date
    Oct 2008
    Posts
    23

    Default

    Hi Tom!

    Thanks a lot for your help! :-)

    Roman

  4. #4
    Join Date
    Mar 2010
    Posts
    28

    Default

    We are also facing similar issues. One workaround to this may be to save .xlsx to .xls (backward compatible format using save as option) file and then use the .xls file. Not sure whether this will be helpful in your case.

    kettle supporting it would be really helpful.

  5. #5
    Join Date
    Dec 2009
    Posts
    609

    Default

    Hi again

    I just had a quick google-look and found this one:
    http://www.zdnet.de/windows_tabellen...5-248109-1.htm

    According to the description, it has "batch-capabilities" for converting XLSX to XLS...
    I think, calling this program before using a PDI Transformation and then accessing the newly converted XLS File via PDI might be an option (dirty workaround,
    but who cares if it works until 4.1.0 comes up? )

    HTH,

    Tom

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Perhaps, since xlsx is an XML file, it should be possible to read the content directly as XML.

  7. #7
    Join Date
    Oct 2008
    Posts
    23

    Default

    @Matt:
    Matt, thanks for your reply! That´s a good idea... I had something similar in mind... I will try this and let you know about the outcome.
    Can you confirm that this feature will be available in 4.1.0? Or is it just planned?

    @Tom:
    Thanks for the hint! Looks interesting on the first glimpse... will take this into consideration!

    Well, the files have to be in the new format... so using the compatibility mode provided by office is not an option. The only thing I can think of is a makro in the xlsx-files which makes a "save as xls" in the background on close of the worksheet...

  8. #8
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    I'm working on an abstraction layer for spreadsheet files in 4.1.0 so that we can support xls, xls and odt (and what else comes next) with the same UI.

  9. #9
    Join Date
    Oct 2008
    Posts
    23

    Default

    Sounds good :-)

    Until then: My intended solution (for other who might find this post and have a similar problem):

    With the Compatibility Pack of Microsoft Office 2007 comes everything you need. In my case (xlsm to xls) you use excelcnv.exe for conversion... Luckily this is batch capable - so you can use it in command line:
    C:\> excelcnv.exe -nme -oice "C:\testtest.xlsx" "C:\testoutput2.xls"
    Now you just have to wirte a scheduled batch job which keeps the xlsx files mirrored to a folder in xls-format. Then Kettle can access these files. Voila tout! :-)
    There are similar files in the Program Files\Microsoft Office\Office12 for word and other file types...

    Thanks and Greetings,
    Roman

  10. #10
    Join Date
    Nov 2008
    Posts
    777

    Default

    Quote Originally Posted by MattCasters View Post
    Perhaps, since xlsx is an XML file, it should be possible to read the content directly as XML.
    I believe an xlsx file is actually an archive file (zip file) with several xml files in it.
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  11. #11
    Join Date
    Mar 2010
    Posts
    159

    Default

    Yeah that's what I thought also... And Excel's XML might be terrifying

  12. #12
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    If it's a zip file, Kettle can access the xml with the VFS driver. Something like this:

    Code:
    zip:file:///home/bar/file.xlsx!sometab.xml

  13. #13
    Join Date
    Sep 2009
    Posts
    810

    Default Sample Transformation

    Hi all,

    I provided a short sample transformation that might help to get started. It reads a simple list of top10 movies from an xlsx file
    Unzip the xlsx file and check out the files when looking at the XPATH expressions. It will make things easier to understand.
    Pay special attention to the fact that Excel "interns" strings, making them shared values stored in another xml file. The sample shows how to look them up.

    Since I am not really familiar with the guarantees the Excel XML format makes I'd suggest to find other formats to read the data, as long as there's no official xlsx support. But if you want to experiment around, go ahead

    Cheers

    Slawo
    Attached Files Attached Files

  14. #14
    Join Date
    Oct 2008
    Posts
    23

    Default

    Hi Slawo,

    thanks for the example :-) Pretty nice actually... however in my case the files are just to complicated for that approach (hard to explain why).

    Thanks!
    Roman

  15. #15

    Default

    There also command-line based tools that use OpenOffice.org on any platform to convert documents before reading them.

    http://www.artofsolving.com/opensource/jodconverter
    http://dag.wieers.com/home-made/unoconv/

    You should be able to use those in a Kettle job to convert your original document into an XLS.

  16. #16
    Join Date
    Oct 2008
    Posts
    23

    Default

    balazsb

    @balazsb
    Thanks for the hint - however it looks like they´re just support xls (and doc) but not xlsx (Excel 2007) or docx...

  17. #17

    Default

    Hi!
    I tried both and they successfully converted an XLSX to XLS. They support whatever file formats your OpenOffice.org installation supports and recent OOo versions can read XLSX quite well.

  18. #18
    Join Date
    Oct 2008
    Posts
    23

    Default

    Ah, okay! Sounds good! I´ll give it a try... Worked a little bit with the tools from Office compatibility pack - however I keep having problems regarding the OLE-Filetype...
    But yours sound promising :-)
    Thanks a lot!

    Roman

    PS: Do you have a sample or something you could share? :-D

  19. #19

    Default

    Hello,

    I've installed both programs on Debian Linux from the normal repositories.

    The command lines are easy:

    jodconverter infile.xlsx outfile.xls

    (The file formats are deduced from the file extension)

    unoconv -v -d spreadsheet -f xls infile.xlsx

    (Unoconv by default creates a PDF from the file, so you must tell it to created an XLS using "-f xls")

    Both need a headless OpenOffice.org instance in the background which you can start with:

    soffice -headless -accept="socket,port=8100;urp;"

    See the documentation of the utilites.

  20. #20
    Join Date
    Oct 2008
    Posts
    23

    Default

    Gosh! :-) Many thanks... you probably saved me a lot of time :-)

  21. #21
    Join Date
    Jul 2011
    Posts
    1

    Default

    Quote Originally Posted by RomanGermany View Post
    Sounds good :-)
    Until then: My intended solution (for other who might find this post and have a similar problem):
    With the Compatibility Pack of Microsoft Office 2007 comes everything you need. In my case (xlsm to xls) you use excelcnv.exe for conversion... Luckily this is batch capable - so you can use it in command line:
    C:\> excelcnv.exe -nme -oice "C:\testtest.xlsx" "C:\testoutput2.xls"
    I realize this post is nearly a year old, but hopefully you'll get notified of a response and maybe can help. :-) Were you able to successfully convert an xlsx file to xls? I have recently been trying the exact command you have above. While an xls file is output, it seems like it's really just a renamed version of the xlsx one. The new file does not open in older versions of Excel, and it seems as if the conversion didn't really take place. The compatibility pack works fine through the front end - if I just open the file through Excel, it does the conversion. It is just through the command line that I am having issues. Any help/insight you can provide would be greatly appreciated. Thanks!

  22. #22
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    FYI, Kettle 4.1 or later can read XLSX formats directly without any need for conversion.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.