    Loading from different source systems to the same target


    We have a requirement to read CSV files from several different servers and load the data into the same table.
    One approach is to create a separate ETL for each source, but that would be difficult to manage. Since the target system and the ETL logic are the same in every case, we would like to create a single ETL and load the data from each source based on parameters, i.e. schedule the ETL to fetch data from the servers, with the server details supplied by a parameter file, soft coding, etc.

    Please let me know how this can be done.


    If your sources all have the same structure, you could simply add them all to your transformation. You can then easily combine them into one stream with Append Streams.

    Is this what you mean?


    Thanks for your reply.

    I need to read files that are available on different servers and load them into a target table. Since the files are generated daily and are named file_<Date>, we need to read only the file generated on the previous day. There are two possible approaches for the ETL:
    a) A single ETL to which we pass the server and file name dynamically. This ETL would be executed with different parameters.
    b) As you said, a single ETL containing all the file sources, with the data appended. Is it possible to specify the file name at run time so that the ETL picks up only the previous day's file?
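
    To make approach (a) concrete, here is a minimal sketch of how the per-run parameters could be built outside the ETL. It is not PDI code. The server names, the `SERVER`/`FILENAME` parameter names, the `.csv` extension and the `YYYYMMDD` date format are all assumptions, since the thread only says the files are named file_<Date>.

```python
from datetime import date, timedelta

# Hypothetical server list; in practice this could come from a parameter file.
SERVERS = ["server-a", "server-b"]


def previous_day_filename(today, prefix="file_", fmt="%Y%m%d", ext=".csv"):
    """Return the name of the file generated the day before `today`.

    The date format and extension are assumptions, not taken from the thread.
    """
    return f"{prefix}{(today - timedelta(days=1)).strftime(fmt)}{ext}"


def run_parameters(today, servers=SERVERS):
    """Build one parameter set per server, all pointing at yesterday's file."""
    filename = previous_day_filename(today)
    return [{"SERVER": server, "FILENAME": filename} for server in servers]


if __name__ == "__main__":
    for params in run_parameters(date.today()):
        print(params)
```

    Each parameter set could then be handed to one run of the same transformation, for example as named parameters on the command line, with the file input step referring to them as variables. How exactly they are passed depends on your scheduler and PDI version.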




    Please note you do NOT need "Append Streams". Any step will do.
    "Append Streams" specifically appends streams in a particular order.

    We want to load data in parallel from multiple sources into the same target table. If we create a separate transformation for each source, the target table will be locked.

    If we create a single transformation with multiple inputs and a single target as output, it will process each source sequentially and load it to the target, so no parallel loading takes place.

    So how can we ensure parallelism?




    We want to load data in parallel from multiple sources into the same target table.
    Whether or not that is possible depends heavily on the database.
    Most databases flat out don't support it in the general case. That includes databases like Oracle, MySQL, PostgreSQL, etc.

    If you think about it, writing to a file in parallel is also not possible. You need very advanced locking and caching algorithms to do it and even then it's still just writing in sequence.

    However, in PDI it's very simple to launch multiple copies of a step to write in parallel. So give it a try and see if or where your database topples over.
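
    As a rough illustration of the point about parallel writers, the sketch below starts one writer thread per source against a single table. It is plain Python, not PDI, and an in-memory SQLite database stands in for the real target, which is an assumption made purely so the example is self-contained. The function and table names are hypothetical. The writers run concurrently, but each insert still has to take its turn at the table.

```python
import sqlite3
import threading


def load_in_parallel(sources):
    """Load each source from its own thread into one shared table.

    `sources` maps a source name to an iterable of integer values.
    """
    # One shared connection; check_same_thread=False lets other threads use it.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE target (source TEXT, value INTEGER)")
    write_lock = threading.Lock()  # writes to the table happen one at a time

    def writer(name, rows):
        for value in rows:
            with write_lock:  # each writer waits here for its turn
                conn.execute("INSERT INTO target VALUES (?, ?)", (name, value))

    threads = [
        threading.Thread(target=writer, args=(name, rows))
        for name, rows in sources.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    conn.commit()
    return conn


if __name__ == "__main__":
    db = load_in_parallel({"server-a": range(100), "server-b": range(50)})
    print(db.execute("SELECT COUNT(*) FROM target").fetchone()[0])
```

    A real database server handles this contention with its own locking rather than an explicit lock in the client, so the only way to know how well it copes is to test against it.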


