Hitachi Vantara Pentaho Community Forums

Thread: HTML XLS input

  1. #1

    HTML XLS input

    Hi all,

    I have a problem importing an Oracle output file that shows the inventory per x dimensions. Oracle writes it as an HTML file with an .xls extension, and Excel also warns on opening that it is not in Excel format. When I try to load it into Spoon, I get the error below. Is there a way to have Pentaho convert it to real XLS before using it? Or is there another way to avoid manually re-saving the file before feeding it into Spoon?


    2016/12/29 09:55:30 - Microsoft Excel Input.0 - ERROR (version 7.0.0.0-25, build 1 from 2016-11-05 15.35.36 by buildguy) : Error processing row from Excel file [C:\Users\ddejong\Downloads\LBTSOHQRPT_140392924_1.xls] : org.pentaho.di.core.exception.KettleException:
    2016/12/29 09:55:30 - Microsoft Excel Input.0 - jxl.read.biff.BiffException: Unable to recognize OLE stream
    2016/12/29 09:55:30 - Microsoft Excel Input.0 - Unable to recognize OLE stream
    2016/12/29 09:55:30 - Microsoft Excel Input.0 - ERROR (version 7.0.0.0-25, build 1 from 2016-11-05 15.35.36 by buildguy) : org.pentaho.di.core.exception.KettleException:
    2016/12/29 09:55:30 - Microsoft Excel Input.0 - jxl.read.biff.BiffException: Unable to recognize OLE stream
    2016/12/29 09:55:30 - Microsoft Excel Input.0 - Unable to recognize OLE stream

  2. #2
    Join Date
    Apr 2016
    Posts
    156

    I'm not aware of any pre-built step that handles this directly.

    You could chain multiple PDI steps:
    1. Convert the HTML-style output into XML (e.g. with the plugin https://github.com/mattyb149/pdi-html-to-xml-plugin).
    2. Read the XML with the built-in XML steps (you will need to pre-configure things such as the fields you're looking for).

    You could use outside scripting:
    1. Use an external script (in a language suited to your host OS) to convert the file to CSV / standard XLS / XLSX.
    2. Run the script from a PDI job that grabs the input, then feed the result to downstream PDI transformations.
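
    For the scripting route, here is a minimal sketch in Python (standard library only) that pulls the table rows out of the HTML-disguised .xls and writes them as CSV. It assumes the Oracle export is a plain HTML table with properly closed <tr>, <td> and <th> tags, and that every table in the file should end up in the same CSV; I haven't seen the actual file, so treat the names (TableRows, html_to_rows, convert) and the default encoding as illustrative and adjust as needed.

```python
import csv
import sys
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_to_rows(html_text):
    """Return the table rows found in html_text as lists of strings."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    return parser.rows


def convert(src_path, dst_path, encoding="utf-8"):
    """Read the HTML file at src_path and write its rows to dst_path as CSV."""
    with open(src_path, encoding=encoding, errors="replace") as src:
        rows = html_to_rows(src.read())
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Usage: python html_xls_to_csv.py input.xls output.csv
    if len(sys.argv) == 3:
        convert(sys.argv[1], sys.argv[2])
```

    Saved under a name of your choice (html_xls_to_csv.py above is just a placeholder), the script can be called from a PDI job before the transformation runs, and the transformation then reads the CSV instead of the original file.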

    You could also eliminate the bottleneck at the source: work with the data provider to choose an alternative data transfer method (this may be as simple as requesting a CSV export, or showing them how to produce one).
    My runtime environment: MacOS, JDK 1.8u121, PDI 7.0

  3. #3
    Join Date
    Apr 2012
    Posts
    253

    You could output multiple CSV files, which is probably the easiest solution. You could also set up a Mondrian server as an intermediary; it should be able to pull from Oracle to produce the cube, which you can then import into PDI.
