Hitachi Vantara Pentaho Community Forums

Thread: Creating XML “Metadata”

  1. #1
    Join Date
    Oct 2007
    Posts
    16

    Let me explain my question with an example.

    Say we have the following XML file:

    <?xml version="1.0"?>
    <STUDENT>
      <PERSONAL_INFORMATION>
        <ID>100001</ID>
        <NAME>Name of Student 1</NAME>
        <SURNAME>SurName of Student 1</SURNAME>
        <MAIDENNAME>Maiden Name of Student 1</MAIDENNAME>
        <DOB>19.12.1983</DOB>
      </PERSONAL_INFORMATION>

      <ADDRESSES>
        <ADDRESS>
          <TYPE>Permanent</TYPE>
          <LINE1>Address Line 1</LINE1>
          <LINE2>Address Line 2</LINE2>
          <LINE3>Address Line 3</LINE3>
          <CITY>Liverpool</CITY>
          <COUNTRY>UK</COUNTRY>
        </ADDRESS>
        <ADDRESS>
          <TYPE>Mailing</TYPE>
          <LINE1>Address Line 1</LINE1>
          <LINE2>Address Line 2</LINE2>
          <CITY>London</CITY>
          <COUNTRY>UK</COUNTRY>
        </ADDRESS>
      </ADDRESSES>

      <GRADEBOOK>
        <MODULE>
          <ID>SE-13</ID>
          <STARTDATE>12.09.2004</STARTDATE>
          <ENDDATE>12.11.2004</ENDDATE>
          <GRADE>A</GRADE>
        </MODULE>
        <MODULE>
          <ID>CS-21</ID>
          <STARTDATE>12.09.2005</STARTDATE>
          <ENDDATE>12.11.2005</ENDDATE>
          <GRADE>C</GRADE>
        </MODULE>
        <MODULE>
          <ID>PM-14</ID>
          <STARTDATE>12.09.2007</STARTDATE>
        </MODULE>
      </GRADEBOOK>
    </STUDENT>


    The fields that I get from this XML file are:

    Student1
    Student1Personal_information1
    Student1Personal_information1Id1
    Student1Personal_information1Name1
    Student1Personal_information1Surname1
    Student1Personal_information1Maidenname1
    Student1Personal_information1Dob1
    Student1Addresses1
    Student1Addresses1Address1
    Student1Addresses1Address1Type1
    Student1Addresses1Address1Line11

    Etc…

    What I would like to do is get this information from the XML file and store it in a database table. In other words, I am interested in the metadata of the file, not the data. Can I do this with Spoon?
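For illustration only, here is a minimal sketch of how such a field list could be produced outside Spoon, assuming Python and its standard `xml.etree.ElementTree` module. The naming rule (capitalized tag followed by the element's position among same-named siblings) is inferred from the list above and may not match what Spoon generates in every case; `walk` and `field_names` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def walk(element, position, prefix):
    """Yield a positional name for this element and all of its descendants."""
    # Assumed rule: capitalized tag + 1-based position among same-tag siblings.
    name = f"{prefix}{element.tag.capitalize()}{position}"
    yield name
    seen = Counter()
    for child in element:
        seen[child.tag] += 1
        yield from walk(child, seen[child.tag], name)


def field_names(xml_text):
    """Return the positional names of every element in document order."""
    root = ET.fromstring(xml_text)
    return list(walk(root, 1, ""))
```

The resulting list of names could then be written to a database table with whatever loader is already in use.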



    Thanks

    Magi

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    No... but why would you want to do that? Isn't the metadata going to be the same every time?

    Regards,
    Sven

  3. #3
    Join Date
    Oct 2007
    Posts
    16

    Not really.
    I have hundreds of thousands of files that I need to load. The metadata is similar, but not the same. Let’s say I have created a transformation and used it to load 100000 files. Let’s say that in that transformation the maximum number of grades was 10, so I only mapped 10 of the fields to go to the grades table. What if the next 100000 files include a student with 11 grades, and I want to use the same transformation to load all the students? I need a mechanism to check whether my transformation needs modifications.
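One possible shape for such a check, sketched in Python under stated assumptions: each file is scanned for the largest number of same-named siblings under any parent, and that count is compared with the number of occurrences the transformation was mapped for. The function names, the path notation (`/STUDENT/GRADEBOOK/MODULE`), and the `mapped_limits` dictionary are all hypothetical, not part of Kettle.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def max_repeats(root, path="", result=None):
    """Map each element path to its largest sibling count under one parent."""
    if result is None:
        result = {}
    here = f"{path}/{root.tag}"
    counts = Counter(child.tag for child in root)
    for tag, n in counts.items():
        key = f"{here}/{tag}"
        result[key] = max(result.get(key, 0), n)
    for child in root:
        max_repeats(child, here, result)
    return result


def files_exceeding(documents, mapped_limits):
    """Return (name, {path: count}) for documents that exceed a mapped limit.

    documents: iterable of (name, xml_text) pairs.
    mapped_limits: {path: occurrences the transformation was built for}.
    Paths missing from mapped_limits are not checked.
    """
    offenders = []
    for name, text in documents:
        repeats = max_repeats(ET.fromstring(text))
        over = {p: n for p, n in repeats.items() if n > mapped_limits.get(p, n)}
        if over:
            offenders.append((name, over))
    return offenders
```

With a limit such as `{"/STUDENT/GRADEBOOK/MODULE": 10}`, any file reported by `files_exceeding` would signal that the existing mapping no longer covers the batch.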

    Thanks

    Magi

  4. #4
    Join Date
    May 2006
    Posts
    4,882

    I would consider that to be a nightmare... But no, the metadata can't be exported right now as meta-metadata.

    Regards,
    Sven

  5. #5
    Join Date
    Oct 2007
    Posts
    16

    Yes, it is a nightmare. OK, another try. Let's put it this way: what if I had a DTD and all the XML files conformed to it? Could I use the DTD information to create a generic transformation?

    Thanks

    Magi

  6. #6
    Join Date
    May 2006
    Posts
    4,882

    Uh... no. The "problem" is in variable processing: how do you know what to do with an element if no one told Kettle what to do with it at design time?

    Even if you could extract variable data from XML (which I'm sure would be possible) you would be stuck afterwards anyway. The rest of the steps also don't support variable parts.

    Regards,
    Sven

  7. #7
    Join Date
    Oct 2007
    Posts
    16

    So, this means that the solution to my problem would be very manual.

    1. Copy all the 100000+ files into one folder (because they are all saved in their own folders, and Spoon only reads files in the root and not in the subfolders)

    2. Pass the directory name to an XML Input and manually map every field found to the outputs

    3. Repeat this for every new batch of files ....
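As a side note on step 1, the copy might be avoidable if the file list is built outside Spoon. A minimal sketch, assuming Python and that the files all end in `.xml` (the function name `find_xml_files` is hypothetical):

```python
from pathlib import Path


def find_xml_files(root_dir):
    """Return every *.xml file under root_dir, including subfolders, sorted."""
    # rglob walks the directory tree recursively; the match is case-sensitive
    # on case-sensitive file systems, so "*.XML" files would need their own pattern.
    return sorted(p for p in Path(root_dir).rglob("*.xml") if p.is_file())
```

The returned paths could be written to a text file, one per line, and fed to whatever step consumes file names.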


    Thanks

    Magi

  8. #8
    Join Date
    May 2006
    Posts
    4,882

    Not manually, no, but it's not possible with Kettle right now.

    I would write something outside of Kettle. It is possible to process this kind of file automatically, but not with Kettle, because of the variable parts in your input data.

    Regards,
    Sven

  9. #9
    Join Date
    Oct 2007
    Posts
    16

    So, you do not recommend processing this kind of data with Kettle?

    I am still not giving up, though. I really need an ETL tool. I already wrote something with Java and stored procedures, but driving it myself was too manual for me.

    I got my filenames with JavaScript and I am now trying to pass them to an XML Input. I have set the filename to ${FILENAME}, set the root element, and manually entered the fields that I know exist in all files, with positions and all. Do you think that this could work?

  10. #10
    Join Date
    May 2006
    Posts
    4,882

    Originally posted by magi:
    I got my filenames with javascript and I am now trying to pass them to an XML Input, so I have set the filename to ${FILENAME}, the root element and I have manually entered the fields that I know exist in all files with positions and all. Do you think that this could work?

    Sure, if you only pick the non-variable parts. And be sure to set the variable in a different transformation from the one you're using it in.

    By the way, I did your kind of XML processing in the past, and I always fell back on something in Perl or Java/e4x for pre-processing. If you can get your XML data into CSV format, you're good. If you can't, you will have problems with the ETL tool anyway.
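To make the XML-to-CSV idea concrete, here is a minimal sketch in Python rather than Perl or Java/e4x. It assumes the STUDENT layout from the first post and emits one CSV row per MODULE, so a student with any number of modules produces rows instead of extra columns. The function name, the `MODULE_FIELDS` list, and the CSV column names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed set of child elements a MODULE may contain (taken from the sample file).
MODULE_FIELDS = ["ID", "STARTDATE", "ENDDATE", "GRADE"]


def modules_to_csv(xml_text):
    """Flatten one STUDENT document into CSV text, one row per MODULE."""
    root = ET.fromstring(xml_text)
    student_id = root.findtext("PERSONAL_INFORMATION/ID", default="")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["STUDENT_ID"] + ["MODULE_" + f for f in MODULE_FIELDS])
    for module in root.iterfind("GRADEBOOK/MODULE"):
        # Missing optional elements (e.g. GRADE for a running module) become "".
        row = [module.findtext(f, default="") for f in MODULE_FIELDS]
        writer.writerow([student_id] + row)
    return out.getvalue()
```

Because the variable part (the number of modules) turns into a variable number of rows with a fixed set of columns, the output has the kind of fixed layout that a CSV input step expects.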

    Regards,
    Sven
    Last edited by sboden; 11-02-2007 at 09:32 AM.
