Hitachi Vantara Pentaho Community Forums
Results 1 to 2 of 2

Thread: Multi level REST requests + nested JSONs

  1. #1
    Join Date
    Mar 2017
    Posts
    1

    Question Multi level REST requests + nested JSONs

    Hi Guys,

    First post and totally new in PDI so be gently if my questions are stupid and sorry for the long text

    I working for a travel agency and in the last years we got special offers via mail with Excel Attachements.
    In the last two years more and more our partners offering APIs to get those updates/offers via REST in JSON form.
    Since the last year we use a simple System of UIPATH and Powershell scripts + SSIS to get those data and transform from JSO for the MS SQL load. The solution is stable and worked fine over around an year. Since six weeks ago our primary partner announced a changed of there data distribution to REST and JSON like most of the others. We take a look at the preview api and unfortunately the necessary REST steps and the JSON data structure are not so easy as most the others our aPOS and the JSONS answers are heavily nested. So our actually solution reachs the limit of functionality.
    I checked in the last two weeks the ETL market to evaluate a open source solution that can handle the complete process of multi level REST requests and extract/transform heavily nested JSON files.
    Actually my preffered solution would be PDI since it is easy to handle and offers the necessary Connections & transformations.

    I checked the tutorials and the samples and tried to build the necessary Jobs/Transformation in PDI but runs in several Problems.

    Necessary logical process steps:
    1. SEND Post Request to the API
    2. LOOP the POST Request to the API to get the complete list (one request only deliver 400 entries, the total list have around 8000 entries)
    3. Extract from the JSON answers the Travel IDs
    4. Send GET request to the API with the extracted Travel ID as URL Segment to get the details
    5. LOOP the GET requests to get the Travel Details
    6. Extract the informations from the GET Request JSON answers transform it
    7. load the data into our MS SQL database

    Step 1,3,7 are done and working.

    Step 2: in UIPATH we use an array of API adresses that we defines for each runs. I looked for an similar solution @ PDI but found no Array option or Loop function. So i checked the forum and a suggestion was to use the generate row function and add the complete API URL as value and set the adress in the REST Client from the URL field Name. My problem here: i can only select one URL field Name but in the row Generation each row have to get an unique name. How i can definies there that each row will be checked?

    Step 4: I tested the GET (a fixed one) request with the Rest Client. That works but to get there API pathes dynamicly i searched for an URL Segment function like in UI Path. So i think i have to build the API links with the Travel ids and load it completly (like step2) to the rest Client. I actually found no way to do that. Any tips?

    Step 5: So far the same question as step 2: How i can loop rest requests?

    Step 6: I build an json Input from an answer from step 4. That works for the first Level on the json fine. I tried to directly extracted the deeper Level and got an error the answers have to be the same number for each json path (Logical answer since i tried to extract json objects and arrays in one step). I found the json nested sample where the solution for nested json is to connect two json Inputs and extract the first Level in the json Input 1 and extract the json Arrays in connected json Input 2. i tried that but if i connect the json Input 1 with the json Input 2 i got an unexpected error in Spoon: null
    java.lang.NullPointerException
    at org.pentaho.di.ui.spoon.trans.TransGraph.addHop(TransGraph.java:1610)
    at org.pentaho.di.ui.spoon.trans.TransGraph$8.widgetSelected(TransGraph.java:1563)
    at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
    at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
    at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1359)
    at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7990)
    at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9290)
    at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:685)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.pentaho.commons.launcher.Launcher.main(Launcher.java:92)

    Any tips?

    Thanks for your Feedback.

    Regards
    Timo

  2. #2
    Join Date
    Aug 2011
    Posts
    236

    Default

    Hi Timo,

    That's a big ask! I can give you some more pointers.

    1) With a good API, when there is more results than the max allowed in one request, there is usually and page/record pointer that allows you to setup subsequent call give the start point of the next set of results. I would look at the API documentation.
    2) If you only need one field from the calls (travelid), use JSON Input to extract that field and 'Copy rows to results'.

    In the next transformation, set the advanced properties to 'process for each input row' and parameter (tab) to the field you copied to results. In this transformation , form the URL needed using the parameter passed and call the URL via GET. Extract the fields you need and write to db.

    For the JSON Input steps, make sure you check the 'Ignore missing path' box.

    You need to be careful about the looping because sometimes it can cause javaheap errors. You might have to have staging file/tables to store results/ids. I don't know how complicated your situation is. Here's the basic logic :-

    1) Get first POST results
    2) Get travelid(s) from results
    3) For each travelid, get details (via url), parse JSON, store results to db
    4) Get next POST results using paging parameter
    5) If not finished, go to 2)

    HTH
    Last edited by tnewman; 03-08-2017 at 05:27 PM.
    PDI 8.0.0
    MySQL - 5.6.27
    Redshift - 1.0.1485
    PostgreSQL 8.0.2
    OS - Ubuntu 10.04.2

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.