Hitachi Vantara Pentaho Community Forums
Results 1 to 6 of 6

Thread: How to use PDI lookup step to extract data from factory display kit?

  1. #1

    Default How to use PDI lookup step to extract data from factory display kit?

    One of our factories has obtained OEE measuring equipment on a trial basis. For reference, the exact kit can be seen on this link:

    This comes with some decent report options but they require the user to log-on to a web interface which isn't always convenient. The manual states that "SQL access to the XL device is exposed via a web services API accessed using a key=value pair POST request to the path /". The manual continues to detail that the two parameters required are response-type (application/json or text/xml) and sql_statement (the SQL statement relating to the query to be processed).

    I intend to use PDI issue burst reports via email but first I need to work out how I can extract data using the above API from PDI.

    As a test I can call the following URL to retrieve a sample data file (I've blanked out our internal IP address): select sequence_number, start_time from timeline_stream order by sequence_number desc limit 5;

    Calling this from a web browser returns a file named which contains data in the following format:

    [ { 'data': [ [ '13212','3529744930.0' ],[ '13211','3529744706.0' ],[ '13210','3529744423.0' ],[ '13209','3529744300.0' ],[ '13208','3529743581.0' ] ],'error': '','statement_time': 0.00472999999999999980,'total_time': 0.00515900000000000040 } ]

    Therefore the returned file is in a JSON format with fields named 'data', 'error', 'statement_time' and 'total_time'.

    I am unable to work out how to configure an appropriate lookup step in PDI to get this data returned to a point where I can feed it into a JSON input step. I've tried the HTTP Post, HTTP Client, Web Services Lookup and REST Client but with no success. I don't have any experience in using web services so I don't know what I'm doing wrong.

    I assume I need to parameterise my request to the appropriate lookup step so that it can determine the data source, the application type and the sql statement to be processed. I've tried every way I can think of to parse this with the above mentioned steps but all I can achieve is a variety of error messages!

    The closest I can get is to put the following into the URL field of a Web Services lookup:;

    When I press the Load button against this URL I receive an error message which displays the JSON content listed above along with an error message stating "Content is not allowed in prolog".

    Is there anyone out there who can point me in the right direction with this please? I appreciate that more information may be required but I thought it best not to flood this post with lots of unnecessary detail if someone can work out how I should configure a PDI lookup step from the URL I've posted above.

    Thanks for looking.


    PS. Forgot to mention I'm testing this using PDI 4.2.0-RC1 at present.
    Last edited by riffmeister; 11-08-2011 at 09:34 AM.

  2. #2
    Join Date
    Sep 2009


    Hey Dan,

    if you'd be able to provide some URL I can test with, I'd provide a sample. Preferably some public one, so we can post the solution here as well


  3. #3
    Join Date
    Jun 2007


    I'd be very worried that you can supply a SQL query on the URL. That is a very dangerous feature. Nonetheless, if access is restricted to internal users, some risk is managed.

    You should be using the http client. Make sure you query that the status is 200. In the output fields of the Http Client step, supply fieldnames for status and result fields.

    Then query the status field. Anything other than 200 is an error. if status is 200, then the 'result' field has the JSON response.

  4. #4


    Hi guys

    Thanks very much for your responses. Since posting I actually managed to make some progress using the HTTP Post step.

    My first mistake was trying to parse the whole URL within the general tab, including the content of the response_type and sql_statement.

    Once I realised that I needed to include the key-value pairs within the Fields tab I subsequently got myself in another fine mess as I had misunderstood the use of the Name and Parameter fields.
    I had been trying to name the fields as 'response_type' and 'sql_statement' with the parameter column set to the values 'application/json' and 'select ...' respectively. I.e. I had mistakenly thought I could set the value of the fields using this tab.

    I've now broken the whole task down so that I set the value of the response_type and sql_statement variables using 'Set Variables' step in one transformation and then collect these using a 'Get Variables' step in a second transformation. I have a master job to control the hop from the first transformation to the second and doing this has enabled me to elicit a properly formatted JSON response from the OEE equipment, albeit the value currently being retrieved in the result variable is 'LESQL Syntax Error'!

    crafter - I appreciate your concern about placing a SQL query on the URL. According to the documentation provided the API only exposes a limited subset of SQL (LESQL) which renders the device database readonly. I have of course tested this using update statements within a browser URL, all of which have had no effect so this seems robust. If the kit is made live then it will be on an internal network behind a firewall so at this point I'm not overly concerned about it.

    On your other point I can confirm that the status code is 200 so this indicates that things are heading in the right direction. My problem now is that it appears that the variable values that are being passed to the second transformation are being truncated to 13 characters so I must be missing a setting somewhere. I'm going to do some more digging and will either post back with a working solution or another request for help!

    Thanks again.


  5. #5
    Join Date
    Apr 2008


    If you decide to post back, try posting all your KTR and KJB files for this setup in a single ZIP file.
    The other forum members are **REALLY** good at looking through files, and providing feedback on other ways to do the same thing, or improving what you've got.

  6. #6


    Me again!

    I've fixed the truncation error I mentioned in my earlier post but have merely succeeded in moving on to another problem. I've therefore done as gutlez has suggested and exported my job and transformations to a Zip file which I've attached to this post in the hope that someone in the Pentaho world can help.

    The response I receive with my latest attempt is 'unrecognized token' instead of the JSON format result-set I need. I believe that the problem is related to how I should format the sql_statement I'm sending to the kit but I've run out of ideas as to how I can resolve this.

    Please note that the text file output steps in each transformation are there purely to help with de-bugging and will be removed once I manage to successfully retrieve a response from the factory kit via the sql statement. As you may see by looking at the example oeexl.txt file attached I can get a response from the API exposed by this kit but it doesn't like the statement I'm sending.

    Please note that the sql_statement being used in this example works if I access the URL via a web browser so I'm as sure as I can be the syntax is correct.

    Any help will be much appreciated.



Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.