Hitachi Vantara Pentaho Community Forums
Results 1 to 8 of 8

Thread: Getting Array elements from MongoDB

  1. #1
    Join Date
    Jun 2011
    Posts
    20

    Arrow Getting Array elements from MongoDB

    Hi All,

    I want to migrate data from mongodb to a relational database.

    In mongo, we have a json with keys, arrays, sub documents and arrays in subdocument.

    We are trying to flatten the structure to a csv format for each array, each subdocument etc., My immediate requirement is to convert each array object
    to a separate record.

    {
    department : _id
    depart_name : string
    employee : array
    employee_id : number
    employee_name: string
    phonenumber : string
    }


    I would like to create two files one containing the following fields

    department_id,
    depart_name

    and another with the following structure

    department_id
    employee_array_id
    employee_id
    employee_name
    phonenumber

    The first file is pretty straightforward. Can you suggest the best way to extract all the element objects of an array into a separate flat record?
    We can assume the json structure is fixed and is uniform across all documents.

    Sample json format attached.


    PDI 4.4 in linux version,

    MongoDB 2.5.4,

    Vertica 6.x.
    Attached Files Attached Files
    siva.

  2. #2

    Default

    How about...

    Extract the array (everything inside the []) into it's own field using either the Regex step or using Regex in Javascript in the Javascript step. Then use the "Split Field to Rows" step to break the field into a row for each employee, using the comma as a delimiter.

    Good luck!

    Mark

  3. #3
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    It's the right idea, but regex and scripting makes me cringe
    Attached Files Attached Files
    So long, and thanks for all the fish.

  4. #4
    Join Date
    Jun 2011
    Posts
    20

    Default

    Hi Marabu,

    Thanks a lot for your very insightful and quick reply.
    I have gone throug the Ktr. Really good stuff.

    Lets say we have to retrive data from one more level.
    I would like to create two more files with the below structure from the attached JSON structure.
    Please guide.

    Structure 1:

    Department_id,
    Employee_id,
    Employee_array_id,
    Phone_array_id,
    Telephone

    Structure 2 :

    Department_id,
    Employee_id,
    Employee_array_id,
    Phone_array_id,
    MobileArray_id,
    Office,
    Personal


    Thanks in advance.
    Attached Files Attached Files
    siva.

  5. #5
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    It's always the same technique:


    • Chained JSON Input steps allow you to process nested arrays
    • Forked JSON Input steps will overcome missing values


    Name:  demo-2.png
Views: 801
Size:  15.9 KB
    So long, and thanks for all the fish.

  6. #6

    Default

    Nice.

    I've run into performance issues with the json input step so I've had to resort to regex. And yes, it makes me cringe too! (And have the scars to show for it.) I'm anxious to upgrade to 5.0 to get the better parsing in the MongoDB input step. That won't give everything needed by marabu's example, but will probably take care of the first step.

  7. #7
    Join Date
    Jun 2011
    Posts
    20

    Default

    First of all, thank you very much for the reply, Marabu!

    We have tried the same approach using nested Json, But we are facing issues when giving "path" for level2 array elements.

    we have tried with paths "$.employees[*].phone[*].mobile[].office" and "$.employees[*].phone[*].mobile[].personal" for getting office and personal fields
    respectively. But this syntax is not working.

    Could you please suggest me the best way for giving "path".
    Also what can be done, if we need to go one level deeper. Meaning, another sub-array in office/personal.

    Thanks in Advance!
    siva.

  8. #8
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Quote Originally Posted by siva View Post
    We have tried the same approach using nested Json
    Apparently not - you wouldn't have level 2 arrays after proper chaining.

    Using multiple wildcards in your path expressions seems to be a bad idea.
    There are special requirements your input must meet to qualify for such expressions.
    The error messages should tell you.

    Stick with these types of JSONPath expressions:

    For field processing:
    $.fieldname
    $.[*].fieldname

    For array processing:
    $.arrayname[]
    $.[*].arrayname[]
    So long, and thanks for all the fish.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.