Hitachi Vantara Pentaho Community Forums
Results 1 to 11 of 11

Thread: Dimension Lookup/update to always return the technical Key!

  1. #1
    Join Date
    Jul 2010
    Posts
    25

    Default Dimension Lookup/update to always return the technical Key!

    Hello,

    I would like to know please if it is possible to configure the Dimension Look up / Update step to always return the technical key regardless of whether it did an update, insert or anything else. Is this possible?

    Please let me know. I don't get to work with Kettle often but I really would like to get the step to behave as above. I had a previous post before about this http://forums.pentaho.com/showthread.php?77865-Dimension-Lookup-update-returns-NULL-technical-Key!&highlight= but I never got a final answer.

    Regards,
    Rasheedik

  2. #2
    Join Date
    Sep 2009
    Posts
    810

    Default

    Hi there,
    The dimension step "does" return the technical key and puts it to the stream. Always. There is usually no need to look up the current record again. Why would you try to look it up in your context?
    Looking at the previous post I believe the problem might be that you are looking up the dimension record after having created/updated it. The dimension step does not commit single rows immediately (commit size). Therefore you cannot expect an entry to be available on the db just after passing the dimension update step. The lookup might fail, thus returning a null entry for the key.

    I guess I may be missing something. Maybe you'd like to clarify again where exactly the problem is?

    Cheers

    Slawo

  3. #3
    Join Date
    Jul 2010
    Posts
    25

    Default

    Hi Slawo,

    Thank you very much for your reply.

    Ok...I guess my problem then is as you stated...I am expecting the technical key inserted/updated by the SCD step to be immediately available in the next step. I would like to use the technical key of the dimension to insert into a fact table, for example, and I am getting null in the fact table sometimes. In another example, I have a parent-child relationship in the dimension and I would like the parent key generated in one step to be used in the next step as a parent of the child.

    What would be the best way to use the SCD step for the above usage?

    Thank you very much again for your help.

    Regards,
    Khaled

  4. #4
    Join Date
    Sep 2009
    Posts
    810

    Default

    Hi Khaled,

    well, you *can* use the key, as it is returned to you immediately. The Dimension step puts it into the row stream for you, even though the record with this key does not exist in the db (yet). You must specify the technical key field in the Dimension step. Right click the Dimension step and choose "Show output fields". The key field should be there. Click preview and you should see that every record comes with a proper id. Just use this id and put it into your fact table or use it as a parent id. After the transformation completes, all the records will be there

    Cheers

    Slawo

  5. #5
    Join Date
    Jul 2010
    Posts
    25

    Default

    Hi Slawo,

    Thank you for your quick response.

    Sorry for being slow...but I still do not know what I am doing wrong. Obviously I did expect the above behavior and I don't really need it in the database if the proper key is inserted into the stream so I can use it. But this is not happening with me....I am getting null or an existing key sometimes which seems to indicate that I am doing something wrong. Right clicking is helpful (thanks for the tip) but this does not help me in real time as I am transferring many records.

    Also...how do other people use this step? I can't believe that I am doing something unique. I mean you want to load dimensions and then you want to load facts...right? BTW...the Combination Lookup/Update works as above....I always get the technical key and things work well. The problem that I face is only when I use this step for SCD dimensions.

    I read in other posts that perhaps the way to use this is to load up the dimensions in one transformation, copy rows to another transformation and load facts. Is this a good approach?

    Thank you very much,
    Khaled

  6. #6
    Join Date
    Sep 2009
    Posts
    810

    Default

    Hi Khaled,

    oh, ok Good to hear that the Combination Lookup/update step works for you. Now, may I inquire as to how you generate the key for your SCD records? max+1, sequence or auto increment field? It is usually safest to use a sequence or auto increment field (pretty much eliminates accidental concurrency anomalies). Maybe you'd like to vary the key generation method and see if things improve?

    Cheers

    Slawo

    PS: which version of Kettle are you using? The Dimension lookup/update step has been continuously worked on and a lof of bugfixes were applied over time...
    Last edited by slawomir.chodnicki; 08-25-2010 at 03:44 PM.

  7. #7
    Join Date
    Jul 2010
    Posts
    25

    Default

    Hi Slawo,

    Thanks again.

    Yes...the Combination Lookup / update works well for me. It is only the Dimension Lookup / Update step that I am having problems with. Unfortunately this is the feature that attracted me to Kettle as I have SCD dimensions in my DW and this seems to be exactly what the doctor ordered (if it works).

    Having said that, I am using SQL Server 2008 as my DW database because i would like to use SSAS which seems to work well for me. I am using Pentaho PDI 4.0 and started using it about a month ago. I really like everything about it but this issue has been quite confusing and a source of headache for me as I am unable to determine how it works. Since I am using SQL Server 2008, the two available options are: max + 1 or Auto. I have been using auto ...it seems more natural in SQL Server. The sequence is not available unfortunately against the SQL Server database.

    Thanks,
    Khaled

  8. #8
    Join Date
    Jul 2010
    Posts
    25

    Default

    Hi Slawo,

    I did configure the Dimension step to use max + 1 and it does work better for me. At least I am not seeing NULL entries in my fact table any more. So thank you very much...I hope this is the answer.

    However I would like to ask you please if the way I am using this step is the correct way. Should I be considering load the dimensions, copy rows and load the fact? Is this the preferred way of doing things with this step? How do other people use it?

    Thank you for all your responses.

    Regards,
    Khaled

  9. #9
    Join Date
    Sep 2009
    Posts
    810

    Default

    Hi Khaled,

    I see. Would you try using the max+1 generation method? My suspicion is that the auto increment field support is somehow flawed in 4.0 for MS SQL. An auto-generated field should never yield a null value. If it does, Kettle should complain. Would you like to create an issue against Kettle 4.0 in JIRA?

    Maybe you'd like to try the latest build to see whether the issue got fixed already? This post explains where to get it:
    http://forums.pentaho.com/showthread...napshot-Builds

    Kind Regards

    Slawo

  10. #10
    Join Date
    Sep 2009
    Posts
    810

    Default

    Hi Khaled,

    The max+1 method is an option if you have no concurrent inserts on the table (from other processes for example). I hope you can use it as a workaround for the time being.

    and no worries, you are using the step correctly

    Cheers

    Slawo

  11. #11
    Join Date
    Jul 2010
    Posts
    25

    Default

    Hi Slawo,

    Thank you or everything. It has been great getting your answers (and knowledge) quickly.

    I will try the latest and see if the issue has been fixed.

    Regards,
    Khaled

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.