Hitachi Vantara Pentaho Community Forums

Thread: Export to XML; massive data duplication (eg. connections)

  1. #1
    Join Date: Feb 2012 | Posts: 1

    Hello,

    If I export my entire database-based repository via Spoon (or Pan), it creates one large XML file that I can then import into a new, empty file-based repository. The process ultimately succeeds, but it is not perfect: the exported XML file contains an enormous number of duplicated objects. Specifically, the connection objects are listed over and over in every job and transformation definition (in my case, thousands of times).

    Is there any way to prevent this from happening? For example, is there a way to indicate that a given connection object should be exported only once, to the root of the repository (in the form of a .kdb file, perhaps)? That's just one idea, of course; I'd be happy to hear about any other realistic solutions.

    Thank you.

    --
    dan.

  2. #2

    Hi eco2dan,
    You wrote this post in 2012. Three years have passed, but the song remains the same. When you export a repository (a database-based one, for example) and then import the XML output file into a file-system-based repository, you get one .ktr file for every transformation and one .kjb file for every job, but you do not get a .kdb file for each connection defined in the source repository. Instead, EVERY connection you used in ANY transformation is written inside EVERY .ktr file.

    I'm now trying to understand the logic PDI follows when it creates .kdb files for a connection. My impression is that PDI creates these files only when you create the connection from the repository explorer tab.
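
To measure how bad the duplication is in a given export, a small script can count how often each connection definition appears. This is only a rough sketch: it assumes that each connection definition is a `<connection>` element with a `<name>` child, and that steps refer to a connection with a text-only `<connection>` element. The sample XML and the names in it (`warehouse`, `staging`, `load_customers`, `load_orders`) are made up for illustration and are not taken from a real export, so check the element names against your own file first.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily trimmed stand-in for an exported repository file.
# Real exports contain many more elements; only the assumed shape matters here.
SAMPLE_EXPORT = """<repository>
  <transformations>
    <transformation>
      <info><name>load_customers</name></info>
      <connection><name>warehouse</name><type>MYSQL</type></connection>
      <connection><name>staging</name><type>MYSQL</type></connection>
      <step><name>read</name><connection>staging</connection></step>
    </transformation>
    <transformation>
      <info><name>load_orders</name></info>
      <connection><name>warehouse</name><type>MYSQL</type></connection>
      <connection><name>staging</name><type>MYSQL</type></connection>
    </transformation>
  </transformations>
</repository>"""


def count_connection_definitions(xml_text):
    """Count how many times each named connection is defined in the XML."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for element in root.iter("connection"):
        # Assumption: a definition has a <name> child, while a step's
        # reference to a connection is a text-only <connection> element.
        name = element.findtext("name")
        if name is not None:
            counts[name.strip()] += 1
    return counts


counts = count_connection_definitions(SAMPLE_EXPORT)
for name, n in sorted(counts.items()):
    print(f"{name}: defined {n} time(s)")
```

For a real export, read the file contents and pass them to `count_connection_definitions` in place of `SAMPLE_EXPORT`. Any count above 1 is a connection that was written out more than once.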
