Hitachi Vantara Pentaho Community Forums

Thread: Loading from different source system to same target

  1. #1

    Loading from different source system to same target

    Hi

    We have a requirement to read CSV files from different servers and load the data into the same table.
    One approach is to create a separate ETL for each source, but that would be difficult to manage. Since the target system and the ETL logic are the same in every case, we would like to create a single ETL and select the source through parameters, i.e. schedule the ETL to fetch data from each server, with the server details supplied by a parameter file or some other form of soft-coding.

    Please let me know how this can be done.

    Regards

  2. #2

    If your sources all have the same structure, you could simply add them all to your transformation. Then you can easily combine them into one stream with Append Streams.

    Is this what you mean?

    Gr.

    Rick
    Last edited by rickonline; 07-20-2009 at 06:09 AM. Reason: Typo

  3. #3

    Hi,

    Thanks for your reply.

    I need to read files available on different servers and load them into a target table. Since the files are generated daily and are named file_<Date>, we need to read only the file generated on the previous day. There are two possible approaches for the ETL:
    a) A single ETL to which we pass the server and file name dynamically. This ETL would be executed once per source, with different parameters each time.
    b) As you said, a single ETL containing all the file sources, with the data appended. Is it possible to specify the file name at run time so that the ETL picks up only the previous day's file?
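
    Approach (a) could be sketched as a small wrapper that computes the previous day's file name and runs the same transformation once per server. This is only an illustration under assumptions that are not from the thread: the server names, the transformation file load_csv.ktr, the parameter names SERVER and FILE_NAME, the yyyymmdd date format and the .csv extension are all hypothetical. It also assumes a PDI version whose Pan launcher accepts named parameters in the -param:NAME=value form, with the parameters declared in the transformation settings and referenced there as ${SERVER} and ${FILE_NAME}.

```python
from datetime import date, timedelta


def previous_day_file(today, prefix="file_", fmt="%Y%m%d", ext=".csv"):
    """Build the name of the file generated on the day before `today`.

    The date format and extension are assumptions; the thread only says
    the files are named file_<Date>.
    """
    yesterday = today - timedelta(days=1)
    return f"{prefix}{yesterday.strftime(fmt)}{ext}"


def pan_command(ktr_path, server, today):
    """Return the argument list for one run of the shared transformation."""
    # Hypothetical parameter names; they must match the named parameters
    # declared in the transformation itself.
    params = {"SERVER": server, "FILE_NAME": previous_day_file(today)}
    args = ["pan.sh", f"-file={ktr_path}"]
    args += [f"-param:{name}={value}" for name, value in params.items()]
    return args


# Hypothetical server list; in practice this could come from a parameter file.
SERVERS = ["server-a", "server-b"]

for server in SERVERS:
    # Print the command instead of executing it, so the sketch is safe to run.
    print(" ".join(pan_command("load_csv.ktr", server, date.today())))
```

    Each printed line is one scheduled run. Keeping the server list outside the transformation means a new source only needs a new list entry, not a new ETL.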


    Regards

  4. #4

    Please note you do NOT need "Append Streams". Any step will do.
    "Append Streams" specifically appends streams in a particular order.

  5. #5

    Hi

    We want to load data in parallel from multiple sources into the same target table. If we create a separate transformation for each source, the target table will be locked.

    If we create a single transformation with multiple inputs and a single target as output, it will process each source sequentially and load it to the target, so no parallel loading takes place.

    So how can we ensure parallelism?


    Regards

  6. #6

    "We want to load data in parallel from multiple sources but into the same target table."
    Whether or not that is possible depends heavily on the database.
    Most databases flat out don't support it in the general case. That includes databases like Oracle, MySQL, PostgreSQL, etc.

    If you think about it, writing to a file in parallel is also not possible. You need very advanced locking and caching algorithms to do it and even then it's still just writing in sequence.

    However, in PDI it's very simple to launch multiple copies of a step to write in parallel. So give it a try and see if or where your database topples over.
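
    Outside PDI, the same experiment can be sketched in a few lines: split the rows across several concurrent writers that all insert into one table, then check that everything arrived. This is not how PDI implements step copies. SQLite is only a stand-in for the target database, and the table name, column names and row data are made up for the illustration. SQLite allows one writer at a time, so the writers here queue behind each other, which is the "still just writing in sequence" behaviour described above.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def load_in_parallel(db_path, rows, copies=3):
    """Insert `rows` into one table from `copies` concurrent writers."""
    # Hand rows out round-robin, one share per writer.
    shares = [rows[i::copies] for i in range(copies)]

    def write(share):
        # Each writer opens its own connection; the timeout makes it wait
        # for the table lock instead of failing immediately.
        conn = sqlite3.connect(db_path, timeout=30)
        try:
            with conn:  # commits on success, rolls back on error
                conn.executemany(
                    "INSERT INTO target (source, value) VALUES (?, ?)", share
                )
        finally:
            conn.close()
        return len(share)

    with ThreadPoolExecutor(max_workers=copies) as pool:
        # pool.map re-raises any writer exception when results are consumed.
        return sum(pool.map(write, shares))


def demo():
    """Load 1000 made-up rows and return (rows written, rows in table)."""
    rows = [(f"server-{i % 2}", i) for i in range(1000)]
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "target.db")
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE target (source TEXT, value INTEGER)")
        conn.commit()
        conn.close()

        written = load_in_parallel(db_path, rows)

        conn = sqlite3.connect(db_path)
        count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
        conn.close()
    return written, count


print(demo())
```

    Raising copies and timing the run against a real target is a cheap way to see where the database stops scaling or starts reporting lock errors.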

    Matt
