View Full Version : XSLT Transformation

04-23-2006, 04:46 AM
I need to transform one complex XML file into another using an XSLT transformation. Could you tell me how I can do this in Kettle?

04-24-2006, 12:12 AM
I can't really tell you as I don't know what the XSLT is doing and I don't know a lot about XSLT Transformations anyway.

Kettle might or might not be useful for such a thing, as "complex XML" can mean just about anything.

A lot of people seem to think that referring to the term XML somehow makes everything clear. In fact, XML just means that information is structured according to a certain standard. It doesn't say anything about the content or the structure itself. In essence, XML is just an interface, nothing more, nothing less.

All the best,


XML Quote: :-)
There was this fellow in Hell
Said, things are going too well
I'll put a stop
to work in that shop
They'll worship their new XML

08-12-2007, 03:25 PM
Sorry to revive this old thread;

Just wanted to know if anything has been done regarding this issue. I have to do lots of XML transformations, and an action that allows an XSLT transformation would be nice.

I have a lot of projects (EDI) where different XML formats need to be parsed into DB tables (and the opposite). Correct me if I'm wrong, but it's still a bit difficult to do this kind of mapping (multilevel XML <-> RDBMS) in Kettle, am I correct? If not, where can I look for further info?


08-12-2007, 05:37 PM
I don't know if it will help you, but you can use the XSLT job entry in PDI.
It outputs a file from an XML file and an XSL stylesheet.
I am working on the equivalent step (one that produces a result stream).



08-12-2007, 09:38 PM
Ok, so let me see if I got it (I'm new to Kettle)

I want to do 2 things: A -> read files of the type order-1234.xml from a directory and save them to an RDBMS, let's say into order_header and order_lines tables, and B -> read invoices from that same RDBMS (invoice_header and invoice_lines) and generate invoice-1234.xml

What would be my approach in kettle? After reading some docs, looking at some examples, I would guess something like:

A -

1 -> define a transformation that reads order.xml and stores it into the RDBMS
2 -> define a job that watches the incoming directory for all order XML files and deletes each source file after a successful transformation

I guess that's all I need...
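
Outside Kettle, the logic of step 1 in plan A can be sketched in a few lines of Python. This is only an illustration of the header/lines mapping, not a Kettle transformation: the XML layout, the column names and the `load_order` helper are all assumptions, since the thread never shows the real order file or table definitions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical order document; the real order-1234.xml layout is not shown in the thread.
ORDER_XML = """<order id="1234">
  <header><customer>ACME</customer><date>2007-08-12</date></header>
  <lines>
    <line><sku>A-1</sku><qty>2</qty></line>
    <line><sku>B-7</sku><qty>5</qty></line>
  </lines>
</order>"""


def load_order(conn, xml_text):
    """Parse one order document and insert it into the header and line tables."""
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    header = root.find("header")
    with conn:  # both inserts are committed together, or rolled back on error
        conn.execute(
            "INSERT INTO order_header (order_id, customer, order_date) VALUES (?, ?, ?)",
            (order_id, header.findtext("customer"), header.findtext("date")),
        )
        conn.executemany(
            "INSERT INTO order_lines (order_id, line_no, sku, qty) VALUES (?, ?, ?, ?)",
            [
                (order_id, n, line.findtext("sku"), int(line.findtext("qty")))
                for n, line in enumerate(root.iterfind("lines/line"), start=1)
            ],
        )
    return order_id


# Assumed table layout, created in an in-memory database for the sake of the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_header (order_id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT)"
)
conn.execute(
    "CREATE TABLE order_lines (order_id INTEGER, line_no INTEGER, sku TEXT, qty INTEGER)"
)
load_order(conn, ORDER_XML)
```

Wrapping both inserts in one transaction means a malformed file leaves no half-loaded order behind, which matters if the source file is deleted only after success.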

for B -

1 - define a transformation that uses a query that returns all the info I need from the tables invoice_header and invoice_lines based on an invoice id
2 - write it to xml as is
3 - define a job that finds all unprocessed invoice ids
4 - run the transformation for all those ids
5 - for each output, run an XSLT transformation

I don't know if there is any way for step 2 to output the info in the form <header>....</header><lines><line>...</line></lines>, just to make XSL development easier, but I would guess it needs some kind of grouping step. XML handling samples are really few.
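
To make the target shape concrete, here is a rough Python sketch that produces the <header>/<lines> layout for one invoice id from two queries. It is not a Kettle step; the `invoice_to_xml` helper, the wrapping <invoice> element and the column names are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def invoice_to_xml(conn, invoice_id):
    """Return one invoice as <invoice><header/><lines><line/>...</lines></invoice>."""
    head = conn.execute(
        "SELECT customer, invoice_date FROM invoice_header WHERE invoice_id = ?",
        (invoice_id,),
    ).fetchone()
    if head is None:
        raise LookupError(f"no invoice with id {invoice_id}")

    invoice = ET.Element("invoice", id=str(invoice_id))
    header = ET.SubElement(invoice, "header")
    ET.SubElement(header, "customer").text = head[0]
    ET.SubElement(header, "date").text = head[1]

    # One <line> element per row of the second query.
    lines = ET.SubElement(invoice, "lines")
    for sku, qty in conn.execute(
        "SELECT sku, qty FROM invoice_lines WHERE invoice_id = ? ORDER BY line_no",
        (invoice_id,),
    ):
        line = ET.SubElement(lines, "line")
        ET.SubElement(line, "sku").text = sku
        ET.SubElement(line, "qty").text = str(qty)
    return ET.tostring(invoice, encoding="unicode")


# Assumed table layout and sample data, in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice_header (invoice_id INTEGER PRIMARY KEY, customer TEXT, invoice_date TEXT)"
)
conn.execute(
    "CREATE TABLE invoice_lines (invoice_id INTEGER, line_no INTEGER, sku TEXT, qty INTEGER)"
)
conn.execute("INSERT INTO invoice_header VALUES (1234, 'ACME', '2007-08-12')")
conn.executemany(
    "INSERT INTO invoice_lines VALUES (?, ?, ?, ?)",
    [(1234, 1, "A-1", 2), (1234, 2, "B-7", 5)],
)
xml_text = invoice_to_xml(conn, 1234)
print(xml_text)
```

An intermediate document in this shape keeps the header values in one place, so the later XSLT only has to rename and rearrange elements rather than regroup rows.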

Thanks a lot for the help, every tip appreciated

08-13-2007, 01:51 AM
For the second part you can look at one of the XML examples (under the sample directory) "XML Add - creating multi-level XML files.ktr" where multi-level XML is being created... but it's not for the faint-hearted and I would consider it very high-maintenance ;-)

ETL tools are better for row-based stuff. Think of it this way... rows arrive at the XML step one by one... it needs to decide what to do based on the current row (or possibly a history of past rows), and building multi-level XML that way is hard to do. None of the ETL tools I have worked with has a good solution for it; all of them limit the output possibilities to something like what Kettle offers.
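
The "history of past rows" problem can be illustrated with a small Python sketch: nesting a flat, joined row stream means noticing when the parent key changes from one row to the next. Here `itertools.groupby` does that bookkeeping. The row layout and the `rows_to_xml` helper are hypothetical, and the approach assumes the rows arrive sorted by invoice id.

```python
from itertools import groupby
from operator import itemgetter
import xml.etree.ElementTree as ET

# Flat joined rows, as a row-based tool would see them (header columns repeat per line).
# groupby only merges adjacent rows, so the stream must be sorted by invoice_id.
ROWS = [
    {"invoice_id": 1, "customer": "ACME", "sku": "A-1", "qty": 2},
    {"invoice_id": 1, "customer": "ACME", "sku": "B-7", "qty": 5},
    {"invoice_id": 2, "customer": "Initech", "sku": "C-3", "qty": 1},
]


def rows_to_xml(rows):
    """Fold a sorted flat row stream into nested <invoice>/<lines>/<line> elements."""
    root = ET.Element("invoices")
    for invoice_id, group in groupby(rows, key=itemgetter("invoice_id")):
        group = list(group)  # all rows that share this invoice id
        invoice = ET.SubElement(root, "invoice", id=str(invoice_id))
        # Header values are taken from the first row of the group.
        ET.SubElement(invoice, "customer").text = group[0]["customer"]
        lines = ET.SubElement(invoice, "lines")
        for row in group:
            ET.SubElement(lines, "line", sku=row["sku"], qty=str(row["qty"]))
    return root


root = rows_to_xml(ROWS)
print(ET.tostring(root, encoding="unicode"))
```

With two levels this stays manageable; each additional nesting level needs another key to track, which is where the maintenance cost mentioned above comes from.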


08-13-2007, 03:53 AM
I opened that example and after a quick look I closed it even faster :D

well, I just remembered that for the second part I can use a string concatenation of the 2 outputs and get something of the form:

It's very easy to get from this to a multilevel XML like the one in the example with XSLT. I think it must be a cleaner approach.

The thing I don't quite get is how to guarantee that I will only process one invoice only one time... maybe by checking the files I've generated?
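
One common alternative to checking the generated files is to record the state in the database itself. The Python sketch below assumes a hypothetical `processed` flag column on invoice_header, which the thread does not mention, and a caller-supplied `export` function that writes the file for one invoice id.

```python
import sqlite3


def process_pending(conn, export):
    """Export every invoice not yet flagged, flagging each one after it succeeds."""
    pending = [
        row[0]
        for row in conn.execute(
            "SELECT invoice_id FROM invoice_header WHERE processed = 0 ORDER BY invoice_id"
        ).fetchall()
    ]
    done = []
    for invoice_id in pending:
        export(invoice_id)  # e.g. write the XML file; expected to raise on failure
        with conn:  # commit the flag only after the export succeeded
            conn.execute(
                "UPDATE invoice_header SET processed = 1 WHERE invoice_id = ?",
                (invoice_id,),
            )
        done.append(invoice_id)
    return done


# Assumed table layout with the hypothetical flag column, in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice_header (invoice_id INTEGER PRIMARY KEY, processed INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO invoice_header (invoice_id) VALUES (?)", [(1,), (2,)])

exported = []
first_run = process_pending(conn, exported.append)
second_run = process_pending(conn, exported.append)
```

A crash between the export and the update would cause that one invoice to be exported again on the next run, so this is harmless only if re-exporting overwrites the same file name.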

08-13-2007, 04:04 AM
The thing I don't quite get is how to guarantee that I will only process one invoice only one time... maybe by checking the files I've generated?

I don't know what you mean by this.

What other people have also successfully gotten away with is using E4X in a JavaScript step to generate XML snippets. But that requires upgrading the js.jar in your installation and adding some other jars.