Hitachi Vantara Pentaho Community Forums
Results 1 to 7 of 7

Thread: About Data Integration

  1. #1
    Join Date
    Oct 2008
    Posts
    25

    Default About Data Integration

    Hello, I'm evaluating Data Integration for a project with Oracle and I have a couple of questions about it...

    1. Does Data Integration support OAS (Oracle Application Server) 10g?

    2. The source database is very large... If an error occurs during extraction, what happens to the process? Does it restart from the beginning, or continue from the point of failure?

    3. What kinds of auditing would be possible with Data Integration?

    4. Finally, has anybody else done something with Data Integration and Oracle? Our problem is that we are not allowed to use database links with Oracle, so we are considering Kettle to transport the data and do some transformations. But we have a lot of data, so we are thinking of first exporting to flat files and then importing them with Kettle in order to run the ETL into the target database. What do you think?

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    IMHO
    Quote Originally Posted by joshid View Post
    1. Does Data Integration support OAS (Oracle Application Server) 10g?
    ??? OAS is not a database... you can probably fetch web pages from it with the HTTP step, but for the rest...

    Quote Originally Posted by joshid View Post
    2. The source database is very large... If an error occurs during extraction, what happens to the process? Does it restart from the beginning, or continue from the point of failure?
    You have to do it yourself, as is the case in most ETL tools. But having to build it yourself also means you can choose how to do it... anything from a simple erase-and-restart to incremental loads.

    Quote Originally Posted by joshid View Post
    3. What kinds of auditing would be possible with Data Integration?
    You can log the transformations/jobs to a database table, for example.

    Quote Originally Posted by joshid View Post
    4. Finally, has anybody else done something with Data Integration and Oracle? Our problem is that we are not allowed to use database links with Oracle, so we are considering Kettle to transport the data and do some transformations. But we have a lot of data, so we are thinking of first exporting to flat files and then importing them with Kettle in order to run the ETL into the target database. What do you think?
    Sure, why not... but be sure to test the export files, especially regarding escaping.
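    The escaping concern above is easy to check mechanically. A minimal sketch (in Python, just to illustrate the idea; Kettle's text-file steps have their own enclosure/escape settings) is to round-trip rows containing the characters that typically break naive flat-file exports:

    ```python
    import csv
    import io

    # Rows containing the characters that typically break naive flat-file
    # exports: embedded delimiters, quotes, and newlines.
    rows = [
        ["1", 'O\'Brien, "Pat"', "line1\nline2"],
        ["2", "plain", "ok"],
    ]

    # Write with full quoting so every field is escaped explicitly.
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerows(rows)

    # Read back and verify the round trip is lossless.
    parsed = list(csv.reader(io.StringIO(buf.getvalue())))
    assert parsed == rows
    ```

    If the round trip is not lossless for your delimiter and enclosure choices, the import side will silently shift or merge columns.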

    Regards,
    Sven

  3. #3
    Join Date
    Oct 2008
    Posts
    25

    Default

    Thanks a lot for the quickest answer I've had in years...

    I'll definitely take Pentaho (Reporting, Data Integration) because of the great forum support.

    Best Regards

  4. #4
    Join Date
    Apr 2007
    Posts
    2,010

    Default

    That's a good question about incremental loads, though. Going forward, it sounds like this is something it would be nice for Kettle to handle!

  5. #5
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    We already do. Typically you work with dates, IDs, or the Merge step to do incremental loads.
    • Dates: the source table contains a last-changed date (updated by the application or a trigger). In this case you can set up transformation logging to a logging table, then grab the incremental date range (calculated automatically) from the "Get System Info" step.
    • IDs: this is easier: you just get the maximum ID from the target table (select max(id) as max) and grab "ID > max" from the source system. It obviously only works if the IDs always increment and rows are never updated.
    • Merge rows: this step compares two input streams (table/file/...) to detect which rows have been changed, inserted, or deleted.
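    The ID-based approach above can be sketched in a few lines. This is an illustration only (Python with in-memory SQLite standing in for the Oracle source and target; table and column names are made up), not Kettle itself:

    ```python
    import sqlite3

    # In-memory databases standing in for the source and target systems.
    src = sqlite3.connect(":memory:")
    tgt = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    tgt.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    src.executemany("INSERT INTO orders VALUES (?, ?)",
                    [(1, 10.0), (2, 20.0), (3, 30.0)])

    def incremental_load():
        # Grab the high-water mark from the target (COALESCE covers the first run).
        (max_id,) = tgt.execute(
            "SELECT COALESCE(MAX(id), 0) FROM orders").fetchone()
        # Extract only rows above the marker: the "ID > max" approach.
        new_rows = src.execute(
            "SELECT id, amount FROM orders WHERE id > ?", (max_id,)).fetchall()
        tgt.executemany("INSERT INTO orders VALUES (?, ?)", new_rows)
        tgt.commit()
        return len(new_rows)

    assert incremental_load() == 3   # first run copies everything
    assert incremental_load() == 0   # second run finds nothing new
    src.execute("INSERT INTO orders VALUES (4, 40.0)")
    assert incremental_load() == 1   # only the new row is transferred
    ```

    As noted above, this only works when IDs strictly increase and existing rows are never updated; otherwise the date or Merge-rows approaches are needed.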

  6. #6
    Join Date
    Oct 2008
    Posts
    25

    Default

    I'd like to know: if you're loading/extracting to/from a huge database and the power is suddenly cut off, do you have to check every table to find the point the load/extract reached, or does Kettle do that automatically?

  7. #7
    Join Date
    May 2006
    Posts
    4,882

    Default

    It's not automagical... but you can of course build in your own DIY restart mechanism (see Matt's response).

    And it's not automagical because there are a lot of different restart implementations out there, so we can't build them all in. This is no different from other ETL tools, which have the same "problem".

    The usual approach is to get a marker from the main data before you start the ETL (an ID, a date, ...) and store it in a separate table. Process the table starting from your marker upwards. When you restart, you just repeat from scratch, and all data already processed will be skipped because it falls below the "new marker".
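    The marker-table pattern can be sketched as follows (an illustration in Python with in-memory SQLite; the `etl_marker` table and the other names are made up for the example, not part of Kettle):

    ```python
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, payload TEXT)")
    db.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, payload TEXT)")
    # Separate table holding the restart marker, as described above.
    db.execute("CREATE TABLE etl_marker (last_id INTEGER)")
    db.execute("INSERT INTO etl_marker VALUES (0)")
    db.executemany("INSERT INTO source VALUES (?, ?)",
                   [(i, f"row{i}") for i in range(1, 6)])

    def run_etl():
        # Read the marker saved by the previous (possibly interrupted) run.
        (marker,) = db.execute("SELECT last_id FROM etl_marker").fetchone()
        rows = db.execute(
            "SELECT id, payload FROM source WHERE id > ? ORDER BY id",
            (marker,)).fetchall()
        for rid, payload in rows:
            # INSERT OR IGNORE makes a rerun after a crash harmless: rows
            # copied before the marker was advanced are simply skipped.
            db.execute("INSERT OR IGNORE INTO target VALUES (?, ?)",
                       (rid, payload))
            db.execute("UPDATE etl_marker SET last_id = ?", (rid,))
        db.commit()
        return len(rows)

    assert run_etl() == 5
    assert run_etl() == 0  # a restart repeats from the marker and skips everything
    ```

    The point of the separate marker table is exactly what Sven describes: after a power cut you don't inspect every table, you just rerun and let the marker decide where processing resumes.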

    Regards,
    Sven
    Last edited by sboden; 11-04-2008 at 05:26 AM.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.