Hitachi Vantara Pentaho Community Forums
Results 1 to 6 of 6

Thread: Using Dimension lookup/update step for Data Vault Satellite insert

  1. #1

    Default Using Dimension lookup/update step for Data Vault Satellite insert

    In [PDI-329|http://jira.pentaho.com/browse/PDI-3209] someone asked to provide an example how to use the Dimension Lookup/update step for inserting data into a satellite.

    I'm sure it is not the performantest way to insert data into a satellite. So how did I use this step - here an example how to fill an address satellite (sorry, german field names):
    * the transformation has access to a variable ${load_to} which contains the timestamp of the load process. This variable is global to the ETL. ${load_to} is used as stream field 'load_dts'.
    * run the Dimension Lookup/Update step in "Update the dimension" mode
    * the satellite has a technical key called 'id'. A sequence 'postadresse_id_seq' is used to generate values for the technical key.
    * the hub id (postadresse_id) is used as lookup value
    * we added a 'version' field to the satellite because it is needed by the Dimension Lookup/update step
    * the stream field 'load_dts' is used as 'Stream Datefield', 'Date range start field' and as 'Use altenative start date' --> the first entry in the satellite will have the current 'load_dts' as 'load_dts' (not a value in the past like 1900-01-01)
    * use 'load_end_dts'

    Advantage using Dimensional Lookup/update step:
    * reducing steps/complexity for a satellite

    Disadvantage:

    * Dimension Lookup/Update step inserts a 0 entry in the satellite (all NULL values). Which makes sense for a dimension, but not for a satellite --> NOT NULL constraints not possible
    * It is not possible to create an unique index on 'hub id' and 'load_end_dts' because the insert of the new value is done in the following order
    ** insert new value (so we have 2 rows with the same 'hub id' and 'load_end_dts' = NULL)
    ** update load_end_dts of previous version
    * 'version' field is just here because the Dim. Lookup/Step needs it.

    To be honest i haven't looked into the source code of the Dim. Lookup/Update step. We probably have to tweak the index based on how the step works.


    kind regards
    Max

    hmm - somehow i can't find a way to upload an attachment .. any hints how to attach a file?

  2. #2
    Join Date
    Nov 2008
    Posts
    143

    Default

    Quote Originally Posted by maxh View Post
    hmm - somehow i can't find a way to upload an attachment .. any hints how to attach a file?
    "Go Advanced"

  3. #3

    Default Example file attached

    OK found it. Here the example.

    Be aware that it is not a good idea to use multiple copies of the Dimension Lookup/update step if you are not sure about uniqueness of incoming stream data.

    In the example we are sure the incoming rows are unique --> we can use multiple copies.

    kind regards
    Max
    Attached Files Attached Files

  4. #4
    Join Date
    Oct 2006
    Posts
    7

    Default Using Combined and Dimension lookup/update for DV loading

    Hi,

    Thanks for the example. After that I did some home work to

    I have used the example that Edwin posted in http://forums.pentaho.com/showthread...-Jira-PDI-3209 .

    The load_sakila_dv_hub_city transformation uses the 'combined lookup/update' for loading the hub. In the 'combined lookup/update' I can't add an audit field for the 'record_source_id'. If this would be added to the step it is a good step to load hubs. In the example I added a blocking + update step to fill the 'record_source_id' afterwards.

    The load_sakila_dv_sat_city is for the first part identical to the hub transformation. We have to lookup the hub id's and if not found we've to add them. After this You can see that it loads to two satellites with the 'dimension lookup/update' step. The sat_city satellite is from the demo dv set. The sat_city1 satellite is the one I created. The sql is attached to create it, and load a default record in advance.

    The 'dimension lookup/update' loading the sat_city uses the same setup as Max uses. The only thing I've had to add to the satellite is a version field.

    The 'dimension lookup/update' loading the sat_city1 uses a little different setup. I've removed the technical sequence key as we don't use in DV land. By setting the 'dimension lookup/update' to use autoincrement in the DB and leaving the technical field blank this works (is this a bug?)

    My conclusion:
    1 - add an optional audit field in the 'combined lookup/update' and we can use it for hubs and links
    2 - make the technical key and version optional in the 'dimension lookup/update'
    3 - add an option to load fields in the dimension/sat that are not compared e.g. 'no history'. This is a more generic case of an audit field.

    So we can add these options to the existing steps or we derive 2 new steps based on these existing steps.

    Regards,
    JJ.


    load_sakila_dv_sat_city.ktr
    load_sakila_dv_hub_city.ktr
    sat_city1.sql.txt
    Last edited by delostilos; 02-15-2011 at 08:32 AM.

  5. #5

    Default

    If i may extedn your list of needed changes:
    4. make insertion of 0 id (with all NULL values) optional
    5. it should be possible to use NULL for load end DTS

    I have a feeling the list is going to grow. Which implies (IMHO) that we should go for a separate step. Otherweise the Dimension Lokup/update step is going to be too complicate to use.

  6. #6
    Join Date
    Oct 2006
    Posts
    7

    Default

    Hi,

    Yes, I think we should work towards a satellite insert step, that can handle multiple changes per business key. I've experimented with building a 'hub/link insert/update' plugin based on the 'combination lookup/update'.
    Now I'm planning to make a 'combination lookup/update (audit/plus)' step plugin, so we can use this for hub/link managing. Then I'll look into the 'Dimension lookup/update' step, to see if I can use it as a starting point for a 'satellite update' step plugin.

    Regards,
    JJ.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.