Hitachi Vantara Pentaho Community Forums
Results 1 to 11 of 11

Thread: JSON input with nested structure with same property name at different levels

  1. #1

    Default JSON input with nested structure with same property name at different levels

    Hi All
    I'm wondering if I found a bug in parsing JSON data. Here is my sample
    Code:
    [
        {
            "productId": 329,
            "name": "Multi Image Enhanced",
            "productGroupId": 65,
            "productGroupName": "Prints",
            "SKU": "110000",
            "SKU_Category": "SHEETPRODUCTS",
            "finishingPackageId": 315,
            "packageTypeGlobalId": "2306a5bd-1008-4241-a114-74c5020d9d48",
            "finishingPackageGlobalId": "5ebaa4c2-cc6b-4f95-a358-81957106b313",
            "stylePackages": [{
                "stylePackageId": 2727,
                "stylePackageGlobalId": "91cc7186-a927-45c0-8552-0fa2c97cdfbf",
                "name": "duet black"
            },
            {
                "stylePackageId": 2728,
                "stylePackageGlobalId": "d2d91a44-3496-4891-825e-ed94cc73bd68",
                "name": "duet white"
            }]
        },
        {
            "productId": 338,
            "name": "Multi Image Enhanced",
            "productGroupId": 65,
            "productGroupName": "Prints",
            "SKU": "80000",
            "SKU_Category": "SHEETPRODUCTS",
            "finishingPackageId": 324,
            "packageTypeGlobalId": "2306a5bd-1008-4241-a114-74c5020d9d48",
            "finishingPackageGlobalId": "e6a4b2c3-b676-4ef0-9edb-155851c6bced",
            "stylePackages": [{
                "stylePackageId": 2727,
                "stylePackageGlobalId": "91cc7186-a927-45c0-8552-0fa2c97cdfbf",
                "name": "duet black"
            },
            {
                "stylePackageId": 2728,
                "stylePackageGlobalId": "d2d91a44-3496-4891-825e-ed94cc73bd68",
                "name": "duet white"
            },
            {
                "stylePackageId": 2729,
                "stylePackageGlobalId": "31bf8147-c44d-4804-a45b-c1ff80ff999a",
                "name": "slated black"
            },
            {
                "stylePackageId": 2730,
                "stylePackageGlobalId": "e8468e13-9ded-496b-a927-c34d38044e0b",
                "name": "slated white"
            }]
        }]
    It has a nested structure and one of the propertiees is named "name"
    Here is how I'm reading it and I get error when trying to preview the data. Once "productName" line is removed evrything is back to normal. I know PDI does not hanlde nested JSON data, but I thought if $.. is specified the parser would know which level "name" it is.
    Name:  Json.jpg
Views: 1622
Size:  28.8 KB
    Any help would be appreciated.

    Thanks

  2. #2

    Default

    Hi

    Perhaps this does not necessarily qualify it as a bug. The JSON INPUT in PDI requires that JSON fields be uniquely named. I have managed to recreate your problem and understood it. The only way to fix this is to rename, at the source file, the second field "name" to product_name.

    See sample attached herein.

    Good luck and have fun!
    Happy Christmas!

    ingrinberg_json.tar.gz

  3. #3
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Quote Originally Posted by lgrinberg View Post
    but I thought if $.. is specified the parser would know which level "name" it is.
    According to JSONPath, $..name means search the entire JSON and find any entry that matches.
    So $..name matches for $[*].name and for $[*].stylepackages[*].name and so now you have an inconsistent number of values to put into a field.

    I strongly recommend that you NOT use $.. in your JSONPath, as it will cause errors (as you demonstrated), but also because it will be slower in giving results.

  4. #4

    Default

    Thank you for reply. I thought about it more and was reading all JSON specs. I ran it throw multiple JSON validators and they all came back as a valid JSON structure. I now think it is a bug, because the field names are at different levels and I should be able to access values accordingly. It would be same comparison to XML structure where different elements may have the same property names at different level.

    I don't know where you found a PDI requirement for field names in JSON input have to be unique. Could you point me to it?

    Quote Originally Posted by dellux View Post
    Hi

    Perhaps this does not necessarily qualify it as a bug. The JSON INPUT in PDI requires that JSON fields be uniquely named. I have managed to recreate your problem and understood it. The only way to fix this is to rename, at the source file, the second field "name" to product_name.

    See sample attached herein.

    Good luck and have fun!
    Happy Christmas!

    ingrinberg_json.tar.gz

  5. #5

    Default

    I thought that "$" means root level and every "." means next level down. What should my code look than? Thanks

    Quote Originally Posted by gutlez View Post
    According to JSONPath, $..name means search the entire JSON and find any entry that matches.
    So $..name matches for $[*].name and for $[*].stylepackages[*].name and so now you have an inconsistent number of values to put into a field.

    I strongly recommend that you NOT use $.. in your JSONPath, as it will cause errors (as you demonstrated), but also because it will be slower in giving results.

  6. #6
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    The data structure is not the same inside the resource! We found 8 values for json path [$.[*].name], which is different that the number returned for path [$.*.productId] (2 values). We MUST have the same number of values for all paths.
    From http://goessner.net/articles/JsonPath/
    .. recursive descent. JSONPath borrows this syntax from E4X.

    I can process your JSON with two JSON Input steps chained together.
    1st one is as follows (you should be able to copy this and paste it in the first JSON input fields tab):
    Code:
    Name Path Type Format Length Precision Currency Decimal Group Trim type Repeat
    ProductID $.*.productId String       none Y
    name $.*.name String        Y
    ProdGroup $.*.productGroupId String        Y
    ProdGroupName $.*.productGroupName String        Y
    SKU $.*.SKU String        Y
    SKUCat $.*.SKU_Category String        Y
    FinPackID $.*.finishingPackageId String        Y
    PackTypeGID $.*.packageTypeGlobalId String        Y
    FinPackGID $.*.finishingPackageGlobalId String        Y
    StylePack $.*.stylePackages String
    StylePack now contains another valid JSON, which you can process.

    My final output:
    Code:
    ProductID name ProdGroup ProdGroupName SKU SKUCat FinPackID PackTypeGID FinPackGID StylePackID StylePackGID StyleName
    329 Multi Image Enhanced 65 Prints 110000 SHEETPRODUCTS 315 2306a5bd-1008-4241-a114-74c5020d9d48 5ebaa4c2-cc6b-4f95-a358-81957106b313 2727 91cc7186-a927-45c0-8552-0fa2c97cdfbf duet black
    329 Multi Image Enhanced 65 Prints 110000 SHEETPRODUCTS 315 2306a5bd-1008-4241-a114-74c5020d9d48 5ebaa4c2-cc6b-4f95-a358-81957106b313 2728 d2d91a44-3496-4891-825e-ed94cc73bd68 duet white
    338 Multi Image Enhanced 65 Prints 80000 SHEETPRODUCTS 324 2306a5bd-1008-4241-a114-74c5020d9d48 e6a4b2c3-b676-4ef0-9edb-155851c6bced 2727 91cc7186-a927-45c0-8552-0fa2c97cdfbf duet black
    338 Multi Image Enhanced 65 Prints 80000 SHEETPRODUCTS 324 2306a5bd-1008-4241-a114-74c5020d9d48 e6a4b2c3-b676-4ef0-9edb-155851c6bced 2728 d2d91a44-3496-4891-825e-ed94cc73bd68 duet white
    338 Multi Image Enhanced 65 Prints 80000 SHEETPRODUCTS 324 2306a5bd-1008-4241-a114-74c5020d9d48 e6a4b2c3-b676-4ef0-9edb-155851c6bced 2729 31bf8147-c44d-4804-a45b-c1ff80ff999a slated black
    338 Multi Image Enhanced 65 Prints 80000 SHEETPRODUCTS 324 2306a5bd-1008-4241-a114-74c5020d9d48 e6a4b2c3-b676-4ef0-9edb-155851c6bced 2730 e8468e13-9ded-496b-a927-c34d38044e0b slated white
    Last edited by gutlez; 12-24-2015 at 01:38 PM.

  7. #7

    Default

    Thank you. That worked. I was on the same path to get styles on the second step.


    Quote Originally Posted by gutlez View Post
    From http://goessner.net/articles/JsonPath/
    .. recursive descent. JSONPath borrows this syntax from E4X.

    I can process your JSON with two JSON Input steps chained together.
    1st one is as follows (you should be able to copy this and paste it in the first JSON input fields tab):
    Code:
    Name Path Type Format Length Precision Currency Decimal Group Trim type Repeat
    ProductID $.*.productId String       none Y
    name $.*.name String        Y
    ProdGroup $.*.productGroupId String        Y
    ProdGroupName $.*.productGroupName String        Y
    SKU $.*.SKU String        Y
    SKUCat $.*.SKU_Category String        Y
    FinPackID $.*.finishingPackageId String        Y
    PackTypeGID $.*.packageTypeGlobalId String        Y
    FinPackGID $.*.finishingPackageGlobalId String        Y
    StylePack $.*.stylePackages String
    StylePack now contains another valid JSON, which you can process.

    My final output:
    Code:
    ProductID name ProdGroup ProdGroupName SKU SKUCat FinPackID PackTypeGID FinPackGID StylePackID StylePackGID StyleName
    329 Multi Image Enhanced 65 Prints 110000 SHEETPRODUCTS 315 2306a5bd-1008-4241-a114-74c5020d9d48 5ebaa4c2-cc6b-4f95-a358-81957106b313 2727 91cc7186-a927-45c0-8552-0fa2c97cdfbf duet black
    329 Multi Image Enhanced 65 Prints 110000 SHEETPRODUCTS 315 2306a5bd-1008-4241-a114-74c5020d9d48 5ebaa4c2-cc6b-4f95-a358-81957106b313 2728 d2d91a44-3496-4891-825e-ed94cc73bd68 duet white
    338 Multi Image Enhanced 65 Prints 80000 SHEETPRODUCTS 324 2306a5bd-1008-4241-a114-74c5020d9d48 e6a4b2c3-b676-4ef0-9edb-155851c6bced 2727 91cc7186-a927-45c0-8552-0fa2c97cdfbf duet black
    338 Multi Image Enhanced 65 Prints 80000 SHEETPRODUCTS 324 2306a5bd-1008-4241-a114-74c5020d9d48 e6a4b2c3-b676-4ef0-9edb-155851c6bced 2728 d2d91a44-3496-4891-825e-ed94cc73bd68 duet white
    338 Multi Image Enhanced 65 Prints 80000 SHEETPRODUCTS 324 2306a5bd-1008-4241-a114-74c5020d9d48 e6a4b2c3-b676-4ef0-9edb-155851c6bced 2729 31bf8147-c44d-4804-a45b-c1ff80ff999a slated black
    338 Multi Image Enhanced 65 Prints 80000 SHEETPRODUCTS 324 2306a5bd-1008-4241-a114-74c5020d9d48 e6a4b2c3-b676-4ef0-9edb-155851c6bced 2730 e8468e13-9ded-496b-a927-c34d38044e0b slated white
    Last edited by lgrinberg; 12-24-2015 at 02:44 PM.

  8. #8

    Default

    Quote Originally Posted by lgrinberg View Post
    Thank you. That worked. I was on the same path to get styles on the second step.
    can u share the transformation, what do u mean to chain the two json input

  9. #9
    Join Date
    Apr 2008
    Posts
    1,771

    Default

    I suppose that he proposes to use 2 Json Input Steps one after the other?
    -- Mick --

  10. #10
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Quote Originally Posted by lakshmi.bhogadi View Post
    what do u mean to chain the two json input
    Feed the output of one JSON Input step into another JSON Input step as its input.

  11. #11
    Join Date
    Jul 2016
    Posts
    17

    Thumbs up Simple and elegant

    Works great, is simple and effective. Thanks !

    Quote Originally Posted by gutlez View Post
    From http://goessner.net/articles/JsonPath/
    .. recursive descent. JSONPath borrows this syntax from E4X.

    I can process your JSON with two JSON Input steps chained together.
    1st one is as follows (you should be able to copy this and paste it in the first JSON input fields tab):
    Code:
    Name Path Type Format Length Precision Currency Decimal Group Trim type Repeat
    ProductID $.*.productId String       none Y
    name $.*.name String        Y
    ProdGroup $.*.productGroupId String        Y
    ProdGroupName $.*.productGroupName String        Y
    SKU $.*.SKU String        Y
    SKUCat $.*.SKU_Category String        Y
    FinPackID $.*.finishingPackageId String        Y
    PackTypeGID $.*.packageTypeGlobalId String        Y
    FinPackGID $.*.finishingPackageGlobalId String        Y
    StylePack $.*.stylePackages String
    StylePack now contains another valid JSON, which you can process.

    My final output:
    Code:
    ProductID name ProdGroup ProdGroupName SKU SKUCat FinPackID PackTypeGID FinPackGID StylePackID StylePackGID StyleName
    329 Multi Image Enhanced 65 Prints 110000 SHEETPRODUCTS 315 2306a5bd-1008-4241-a114-74c5020d9d48 5ebaa4c2-cc6b-4f95-a358-81957106b313 2727 91cc7186-a927-45c0-8552-0fa2c97cdfbf duet black
    329 Multi Image Enhanced 65 Prints 110000 SHEETPRODUCTS 315 2306a5bd-1008-4241-a114-74c5020d9d48 5ebaa4c2-cc6b-4f95-a358-81957106b313 2728 d2d91a44-3496-4891-825e-ed94cc73bd68 duet white
    338 Multi Image Enhanced 65 Prints 80000 SHEETPRODUCTS 324 2306a5bd-1008-4241-a114-74c5020d9d48 e6a4b2c3-b676-4ef0-9edb-155851c6bced 2727 91cc7186-a927-45c0-8552-0fa2c97cdfbf duet black
    338 Multi Image Enhanced 65 Prints 80000 SHEETPRODUCTS 324 2306a5bd-1008-4241-a114-74c5020d9d48 e6a4b2c3-b676-4ef0-9edb-155851c6bced 2728 d2d91a44-3496-4891-825e-ed94cc73bd68 duet white
    338 Multi Image Enhanced 65 Prints 80000 SHEETPRODUCTS 324 2306a5bd-1008-4241-a114-74c5020d9d48 e6a4b2c3-b676-4ef0-9edb-155851c6bced 2729 31bf8147-c44d-4804-a45b-c1ff80ff999a slated black
    338 Multi Image Enhanced 65 Prints 80000 SHEETPRODUCTS 324 2306a5bd-1008-4241-a114-74c5020d9d48 e6a4b2c3-b676-4ef0-9edb-155851c6bced 2730 e8468e13-9ded-496b-a927-c34d38044e0b slated white

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.