Hitachi Vantara Pentaho Community Forums

Thread: Dimension Lookup/Update and Merge Rows best practices

  1. #1

    Dimension Lookup/Update and Merge Rows best practices

    Hello to Matt and to all:

    I want to sample a slowly changing dimension and create new versions of rows, only for those rows that have changed. Some columns could be unimportant timestamps, which are always different and should not generate a new version, but most columns are important data which should cause a new version to be created.

    Should the unimportant columns be marked "Update" under "Type of dimension update" and the important columns be marked "Insert"? Or do I need to use some sort of "Merge Rows" step to filter the identical rows?

  2. #2

    The "dimension update" step typically handles both the insert and the update scenario.
    The trick is usually to limit the volume of rows you send to the step.

    If only 1% of the rows in your input system change, you can use the "Merge Rows" step to compute the difference between two snapshots of a table (or of any other data source).
    Filtering out the unchanged rows saves you from looking up the other 99% of the rows, which would be discarded later anyway.
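
    To illustrate the idea outside of Kettle, here is a minimal Python sketch of a snapshot diff. It is not how the "Merge Rows" step is implemented; the function name diff_snapshots, the sample columns (id, name, city, loaded_at) and the choice to use a dictionary lookup instead of sorted inputs are all assumptions made for the example. The flag values mirror the ones the step is generally described as producing (identical, changed, new, deleted). Leaving the timestamp column out of the compared fields is what keeps it from triggering a new version.

```python
from typing import Dict, Hashable, Iterable, Iterator, List, Sequence, Tuple

Row = Dict[str, object]


def diff_snapshots(
    reference: Iterable[Row],
    compare: Iterable[Row],
    key: str,
    compare_fields: Sequence[str],
) -> Iterator[Tuple[str, Row]]:
    """Yield (flag, row) pairs describing how `compare` differs from `reference`.

    Hypothetical helper: only the columns in `compare_fields` decide whether
    a row counts as changed, so volatile timestamps can simply be left out.
    """
    ref_by_key: Dict[Hashable, Row] = {row[key]: row for row in reference}
    seen = set()
    for row in compare:
        k = row[key]
        seen.add(k)
        old = ref_by_key.get(k)
        if old is None:
            yield "new", row
        elif any(old[f] != row[f] for f in compare_fields):
            yield "changed", row
        else:
            yield "identical", row
    # Keys present in the old snapshot but missing from the new one.
    for k, old in ref_by_key.items():
        if k not in seen:
            yield "deleted", old


# Made-up sample data: loaded_at differs on every row but is not compared.
yesterday: List[Row] = [
    {"id": 1, "name": "Ann", "city": "Oslo", "loaded_at": "2019-01-01"},
    {"id": 2, "name": "Bob", "city": "Rome", "loaded_at": "2019-01-01"},
    {"id": 3, "name": "Cy", "city": "Lima", "loaded_at": "2019-01-01"},
]
today: List[Row] = [
    {"id": 1, "name": "Ann", "city": "Oslo", "loaded_at": "2019-01-02"},
    {"id": 2, "name": "Bob", "city": "Paris", "loaded_at": "2019-01-02"},
    {"id": 4, "name": "Dee", "city": "Kyiv", "loaded_at": "2019-01-02"},
]

flagged = list(diff_snapshots(yesterday, today, "id", ["name", "city"]))

# Only new and changed rows need to reach the dimension step.
to_dimension = [row for flag, row in flagged if flag in ("new", "changed")]
```

    In this sample only the rows with id 2 and id 4 would be passed on; the row with id 1 is dropped even though its timestamp differs. How deleted rows should be handled depends on the dimension design and is left out here.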

    Matt
