I'm trying to process an xml data file. A snippet is below:

<master_information>
<product_xref>
<xref type="SKU" type_id="1">148JC8</xref>
<xref type="LOCID" type_id="2">USWASE</xref>
</product_xref>
</master_information>

When I parse with the StAX "XML Input Stream" step, I get the following:

xml_data_type_description xml_path xml_parent_path xml_data_name xml_data_value
1. START_ELEMENT /master_information/product_xref /product/master_information product_xref <null>
2. START_ELEMENT /master_information/product_xref/xref /product/master_information/product_xref xref <null>
3. ATTRIBUTE /master_information/product_xref/xref /product/master_information/product_xref type SKU
4. ATTRIBUTE /master_information/product_xref/xref /product/master_information/product_xref type_id 1
5. CHARACTERS /master_information/product_xref/xref /product/master_information/product_xref xref 148JC8
6. END_ELEMENT /master_information/product_xref/xref /product/master_information/product_xref xref <null>
7. START_ELEMENT /master_information/product_xref/xref /product/master_information/product_xref xref <null>
8. ATTRIBUTE /master_information/product_xref/xref /product/master_information/product_xref type LOCID
9. ATTRIBUTE /master_information/product_xref/xref /product/master_information/product_xref type_id 2
10. CHARACTERS /master_information/product_xref/xref /product/master_information/product_xref xref USWASE
11. END_ELEMENT /master_information/product_xref/xref /product/master_information/product_xref xref <null>
12. END_ELEMENT /master_information/product_xref /product/master_information product_xref <null>

(row numbers added for clarity)

So far so good. But here's where I start having problems. Each "xref" row in the xml corresponds to a START_ELEMENT / END_ELEMENT block in the parsed data. For e.g. rows 2 through 6 is the first row, 7 through 11 is the second row. (there can be more rows, but you get the idea)

Each xref row also corresponds to a row in my database table. So, I need to process the data to get following result set:

SKU 1 148JC8
LOCID 2 USWASE

How can I do this in pdi? I guess I could write a java class, but is there a more direct way that I'm missing?

The solution needs to be as flexible as possible, including being able to handle complex, nested START_ELEMENT / END_ELEMENT blocks.




Thanks in advance.