Hitachi Vantara Pentaho Community Forums

Thread: Newbie - ETL Design opinion

  1. #1
    Join Date
    Jun 2009
    Posts
    23

    Newbie - ETL Design opinion

    I am trying to load fact table data from two local staging tables.
    I want to populate the dimension keys with lookups based on the data in the staging tables.

    Is it better to perform these lookups on the Table input side, in the SQL query that extracts the data, or to perform each DB lookup individually as part of the ETL stream?

    For context, I'm looking at large volumes, say 100M rows in staging.

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729


    A relational database is designed to do joins, so it's usually better to join inside the database. The same goes for sorting rows.

    CPU-intensive tasks, parallel work, scripting, in-memory lookups and so on are what an ETL tool like PDI is good at.

    With that in mind, you should find the balance based on the actual task at hand.
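
    As a rough illustration of the two options being weighed, here is a minimal sketch in plain Python with SQLite rather than PDI itself. The table and column names (stg_sales, dim_customer, customer_code, customer_key) and the -1 "unknown" key are assumptions made up for the example, not anything from the original setup. The first function resolves the dimension key with a join in the extraction query; the second reads the staging rows unchanged and resolves the key from an in-memory cache, roughly in the spirit of a cached lookup in the stream.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_code TEXT UNIQUE);
    CREATE TABLE stg_sales (customer_code TEXT, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'C001'), (2, 'C002');
    INSERT INTO stg_sales VALUES ('C001', 10.0), ('C002', 25.5), ('C999', 7.0);
""")


def lookup_in_database(conn):
    """Option 1: resolve the dimension key inside the extraction query."""
    sql = """
        SELECT COALESCE(d.customer_key, -1) AS customer_key, s.amount
        FROM stg_sales AS s
        LEFT JOIN dim_customer AS d ON d.customer_code = s.customer_code
        ORDER BY s.rowid
    """
    return conn.execute(sql).fetchall()


def lookup_in_stream(conn):
    """Option 2: read staging rows as-is and look each key up in memory."""
    # Load the dimension once into a dict keyed by the natural key.
    cache = dict(conn.execute("SELECT customer_code, customer_key FROM dim_customer"))
    rows = conn.execute("SELECT customer_code, amount FROM stg_sales ORDER BY rowid")
    # Unmatched codes fall back to -1, mirroring the COALESCE in option 1.
    return [(cache.get(code, -1), amount) for code, amount in rows]


joined = lookup_in_database(conn)
streamed = lookup_in_stream(conn)
```

    Both functions return the same rows here. Which one is faster at 100M staging rows depends on the database, the indexes and the size of the dimension, so it is worth measuring both on the real data.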
