Hitachi Vantara Pentaho Community Forums
Results 1 to 7 of 7

Thread: Dimension lookup/update or Combination Lookup/update?

  1. #1

    Default Dimension lookup/update or Combination Lookup/update?

    This must be again a very naive question, but believe me, I made a deep research, though still see nothing clearly.

    I read this tens of times : http://wiki.pentaho.com/display/EAI/...+lookup-update, but still not sure what's the difference with his brother dimension lookup/update. Please let me know exactly where the difference.

    A brief description of my case:

    We're researching pages requesting to a server, so the requests could come from many sites (site_id) and pages/url(page_id, url) on these sites, and what they doing on our server (other fields,like clicks, time spent on server, etc).

    Remark: url and the page_id is the same, but it could reach to a very high number like million theoretically.

    Our final purpose is to know the behaviour on our server by each site's requests, and when clients clicking the site name, we're requested to list the site requests(url) in the specified time.

    So my second question is whether I could use the url as the dimension, where site_id is used as a hirachy level since this dimension could be so large.

    And I have the third one, is that since I've got a page_id in the source table, should I still need to create a technique key, which seems avoidless in the node dimension lookup/update and its brother node. If I could avoid it, can I use the page_id as the technique key, how to do that.

    It's a long post, but your replies should help me have a clear understanding on how to create dimension tables, though the wiki explains a lot.

  2. #2

    Default

    Need your helps on this.

    I'm building a data warehouse, but I'm blocked by this step.

  3. #3
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    They're different in the sense that one updates a slowly changing dimension and the other a junk dimension.

  4. #4
    DEinspanjer Guest

    Default

    Dimension lookup/update has more control over whether to update, punch through, or ignore certain fields. It also requires the dimension to be a slowly changing dimension with the extra fields version, from date, to date.

    Combination lookup/update is much more basic. It just looks for an existing combination of the supplied fields in the table. if they don't exist, a new entry is created for them.

    Both steps support caching and using a hashkey to speed up the comparisons.

    All this information is given in the documentation wiki.

  5. #5

    Default

    Thanks for your helps.
    It also requires the dimension to be a slowly changing dimension with the extra fields version, from date, to date.
    I cannot catch up this sentence like version, from and to date. Do you mean I need to specify a version field, from_date, to_date field in the original data source so that this node would interpret them as a different dimension levels.

    If you could list an example, which could be much helpful for those whose English is not a native language.

    Your replies are much helpful, and I think my case could use the basic node.

    Thanks again.

  6. #6

    Default

    Hello,

    Quote Originally Posted by xiawinter View Post
    Do you mean I need to specify a version field, from_date, to_date field in the original data source so that this node would interpret them as a different dimension levels.
    No, you need them in the target table.

    You will be reading the original data using an ID field like Customer ID. The customer data (Name, Address, etc.) may change but the ID will be the same in your transaction system.

    Kettle, using the Dimension lookup/update step, will create new records for changed customers (or update them in the target table if you want) and maintain the version, from_date and to_date fields for you. You can then e.g. join your fact tables using the source system customer ID and the from_date, to_date range to the current version of the customer record.

  7. #7

    Default

    Thanks very much.
    I think I fully understood version, from and. To fields, which are used to maintain versions.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.