Hitachi Vantara Pentaho Community Forums
Results 1 to 8 of 8

Thread: Database lookup and SCD issues

  1. #1
    Join Date
    Oct 2011
    Posts
    4

    Default Database lookup and SCD issues

    Hi all,

    I am trying to get data into a SCD. After setting up the data flow I recognized that there are NULL-values in the first row of my SCD. Has anybody an idea to fix that? That would be nice.

    Additional I have a foreign key in some dimensions which I am looking up with a database lookup node, due to that reason described above I can't insert new records, because the first row has a NULL-value.

    Are there any solutions. Thanks in advance!

    Greetz

  2. #2
    Join Date
    Apr 2008
    Posts
    1,771

    Default

    Hi.
    You can use the step:
    http://wiki.pentaho.com/display/EAI/...+value+is+null
    and replace null with another value.

    Or you can use a Filter step to remove those rows altogether.

    Mick

  3. #3
    Join Date
    Oct 2011
    Posts
    4

    Default

    Hi Mick,

    thanks for your reply but it doesnt solve my problem. here is a szenario:

    E.g. I have a dimension which has a foreign key and a surrogate key, that dimension is filled via a SCD Insert /Update node. The first row of this dimension has just null-values! With an fk-constraint it wont work. For sure I can remove such constraints but it makes no sense to me. Why Pentaho inserting a row with null-values?
    I dont have null values in my staging Area?

    Thanks in advance..

    Greetz

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    You need it to ensure a 1:N relationship between a fact table and the slowly changing dimension. Without the "unknown row" you can not achieve this.
    PDI inserts the row because you forgot to do so.

    There are other conventions where you would for example use -1 for "Natural key not found", -2 for "Invalid natural key" but for now we simply use 0 (or 1) if the lookup fails for any reason..

  5. #5
    Join Date
    Oct 2011
    Posts
    4

    Default

    Hi Matt,

    thanks for reply, I ensure referency integrity and also use 1:N relationships between my dims and fact tables. The picture below illustrates my issue.
    We have different clients in our system. These clients may use same customer id's, thats the reason why I have to use FK's as well as constraints in one of the dimensions.
    Name:  PentahoIssue_w_FKs.jpg
Views: 45
Size:  19.5 KB

    Do you have another idea? Everything works fine, but I am wondering about the null-values which were generated by SCD Insert/Update - Node.

    Thanks in advance.

    Greetz

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    You can put any kind of value in place of the null values, but there is no way out of the unknown record if you want to have a slowly changing dimension.
    In your case you would run into trouble when you have an unknown hospital, customer or client while populating the fact table.
    Usually the unknowns happen because of data quality problems. That is why you need to put an unknown record into a SCD.

    Matt

    P.S. There is no need to put foreign keys on the fact table if the ETL process runs correctly. It generally slows loading down for large data volumes. As such you typically don't do it or only during development.
    P.P.S. Joining from the hospital dimension to the client dimension is somewhat suspicious. Are you sure the model is OK?

  7. #7
    Join Date
    Oct 2011
    Posts
    4

    Default

    Hi Matt,

    thank you very much. I am sure that the model is ok. Imagine, different clients possibly using the same natural keys but different descriptions of there floors oder rooms or something like that. Thats the reason why I am using the fk in hospital dimension. Additionaly clients could also use same PatIds but indentify differend patients. It very complex... the picture was just an example..

    You know what I mean? So thank you very much I will try to increase our data quality..

    Thank you..

  8. #8
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    In my experience relationships between 2 dimensions are always best expressed through a fact table. If you try do it differently you'll get into trouble sooner or later.

    In any case, it looks like you're enjoying yourself. Try not to have too much fun!

    Matt

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.