Hitachi Vantara Pentaho Community Forums

Thread: Converting xml to csv

  1. #1
    Join Date
    Mar 2014
    Posts
    16

    Converting XML to CSV

    I am trying to convert an XML file to a CSV file, but I am unable to create the CSV file properly. I tried the XML Input Stream step.
    It parses the file, but the output is not what I expected.

    I expected a header row followed by rows of the corresponding values, separated by the delimiter.

    In the XML Input Stream step I checked the following two items:

    1. XML Data name
    2. XML Data Value

    The generated file has those two fields as the header, and each XML tag and value appears as an individual row.
    I need the XML element names as the header, with their values in the rows beneath.


    My use case is to receive JMS messages and convert them to a CSV file before loading them into a database.
    Inserting/updating the JMS messages into the database directly works fine, but I am now trying to put the XML messages, converted to CSV, into Hadoop and then load that Hadoop file into the database.

    Any suggestions?
    Last edited by princeus; 04-04-2014 at 01:58 PM.

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    If you enable option "Rownumber in output" you will be able to tell different messages apart.
    Using "XML Input Stream", you will need a "Row Denormalizer" step to convert from columnar to tabular data.
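
    To illustrate what that denormalizing step does conceptually, here is a minimal Python sketch (not Pentaho code). It pivots name/value rows into one record per group and writes them as CSV. The field names `record_id`, `xml_data_name`, and `xml_data_value`, as well as the sample rows, are hypothetical; the grouping key stands in for whatever field identifies which record a row belongs to.

```python
import csv
import io


def denormalize(rows, key_field="record_id",
                name_field="xml_data_name", value_field="xml_data_value"):
    """Pivot (key, name, value) rows into one dict per key.

    Returns the column names in first-seen order and the list of records.
    """
    records = {}   # grouping key -> {element name: value}
    headers = []   # element names in the order they first appear
    for row in rows:
        record = records.setdefault(row[key_field], {})
        name = row[name_field]
        record[name] = row[value_field]
        if name not in headers:
            headers.append(name)
    return headers, list(records.values())


def to_csv(headers, records, delimiter=","):
    """Render the records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=headers,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing fields are written as empty strings
    return buf.getvalue()


# Hypothetical name/value rows, shaped like the output described in post #1.
rows = [
    {"record_id": 1, "xml_data_name": "id", "xml_data_value": "100"},
    {"record_id": 1, "xml_data_name": "status", "xml_data_value": "open"},
    {"record_id": 1, "xml_data_name": "type", "xml_data_value": "P1"},
    {"record_id": 2, "xml_data_name": "id", "xml_data_value": "101"},
    {"record_id": 2, "xml_data_name": "status", "xml_data_value": "closed"},
    {"record_id": 2, "xml_data_name": "type", "xml_data_value": "P2"},
]

headers, records = denormalize(rows)
print(to_csv(headers, records))
```

    The important part is the grouping key: without something that says which rows belong together, the name/value rows cannot be folded back into one line per record.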

    BTW: "Get Data From XML" will produce a tabular format immediately.
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Mar 2014
    Posts
    16

    I have now tried Get Data from XML. Since my XML structure has the same elements nested inside each other, I am still not getting the expected tabular format.

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    You're on your own as long as you don't provide sample data - what goes in, what's supposed to come out.
    So long, and thanks for all the fish.

  5. #5
    Join Date
    Mar 2014
    Posts
    16

    Sure, below is the structure of the document.
    P1->P2->P3->P4->P5.
    P1 to P5 are complex data types, and the structure recurses at each level.
    The XML document can be retrieved in place of the above hierarchy, i.e., I can replace the P5
    1. The XML document has a hierarchy:
    <root>
      <parents>
        <parent>
          <node> - this could be P1, P2, P3, or P4.
            <id>
            <status>
            <type>
            <node> - depending on the higher-level node (the parent of this node in the hierarchy), there will be one or more nested nodes, recursively.

    So basically I need to parse the XML from root->parents->parent->node all the way down to the last node of the hierarchy.
    The final goal is to create a CSV file with the attributes of the node elements as headers and each node as a separate row.



    I can't provide the actual sample data since it is confidential.
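
    As a rough illustration of the goal described above, here is a minimal Python sketch (outside Pentaho) that walks the recursive node hierarchy and emits one CSV row per node. The sample document is invented to match the structure in this post; it assumes `id`, `status`, and `type` are child elements of each `node`, and the extra `parent_id` column is a hypothetical addition so the hierarchy is not lost when flattening.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample following the root->parents->parent->node structure.
SAMPLE_XML = """
<root>
  <parents>
    <parent>
      <node>
        <id>1</id><status>active</status><type>P1</type>
        <node>
          <id>2</id><status>active</status><type>P2</type>
          <node>
            <id>3</id><status>closed</status><type>P3</type>
          </node>
        </node>
      </node>
    </parent>
  </parents>
</root>
"""

FIELDS = ["id", "status", "type"]  # assumed child elements of each node


def flatten_nodes(xml_text):
    """Return one dict per <node>, at any nesting depth, in document order."""
    root = ET.fromstring(xml_text.strip())
    rows = []

    def walk(node, parent_id):
        # findtext() only looks at direct children, so a nested node's
        # <id> is never mistaken for this node's <id>.
        row = {f: (node.findtext(f) or "").strip() for f in FIELDS}
        row["parent_id"] = parent_id
        rows.append(row)
        for child in node.findall("node"):
            walk(child, row["id"])

    for top in root.findall("./parents/parent/node"):
        walk(top, "")
    return rows


def rows_to_csv(rows, delimiter=","):
    """Render the flattened nodes as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS + ["parent_id"],
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(rows_to_csv(flatten_nodes(SAMPLE_XML)))
```

    The recursion is what handles the "same elements nested inside each other" problem: each node is read on its own terms, and only then are its nested nodes visited.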


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.