Hitachi Vantara Pentaho Community Forums
Results 1 to 3 of 3

Thread: dimension lookup/update inserts duplicate rows

  1. #1
    Join Date
    May 2013
    Posts
    8

    Default dimension lookup/update inserts duplicate rows

    Hi
    I'm struggling because it keeps inserting duplicates.
    All my strings are trimmed. All my numbers are ints. Looking at old postings I found this:
    But I don't understand the response. Rows out of order? Can somebody explain please?
    Thanks
    Tim




    • Dimension Lookup-Update : Duplicate rows

      I am currently using Dimension Lookup-Update step.
      I have the "Update the dimension" check box enabled.
      In the Fields tab,
      When the "Type of the dimension update" is "Insert"
      I think it is supposed to insert a new dimension record version only if One or more attributes were different.
      But, in my case,
      eventhough I have the same set of data, a new row is being inserted.

      Am I missing something? Correct me if my understanding is wrong.

      Thanks,
      Om
      ----------------------
      Spoon version -4.3.0
      DB2 v9.5.0.808
      Windows XP


    • 08-10-2010, 01:43 PM
      MattCasters
      Chief Data Integration


      Join DateNov 1999Posts9,535




      Well, make sure to never offer rows of data out of order. That is NOT supported.



  2. #2
    Join Date
    May 2013
    Posts
    8

    Default

    I've created a small test example which does the same thing.
    I'm using postgres 9.2

    The table is defined like so

    DROP SEQUENCE test.cite_id_seq cascade;


    CREATE SEQUENCE test.cite_id_seq
    INCREMENT 1
    MINVALUE 0
    MAXVALUE 9223372036854775807
    START 0
    CACHE 1;
    ALTER TABLE test.cite_id_seq
    OWNER TO postgres;


    drop table test.cite;


    CREATE TABLE test.cite
    (
    id int NOT NULL default nextval('test.cite_id_seq'),
    client character varying(20) NOT NULL,
    citecode character varying(40) NOT NULL,
    x integer ,
    y integer ,
    version integer,
    date_from timestamp with time zone,
    date_to timestamp with time zone,
    dummydate timestamp with time zone,
    CONSTRAINT pk_cite PRIMARY KEY (id)
    )
    WITH (
    OIDS=FALSE
    );
    ALTER TABLE test.cite
    OWNER TO postgres;


    Now because the field client is part of the "business key" and therefor set to non null I initialise with this

    -- this is used to initialise the dummy entry in the database
    insert into test.cite (client, citecode, x, y) values ('dummy', 'dummy', 0, 0);

    this is the contents of the lookup step (full file attached) from the ktr file

    <step>
    <name>Dimension lookup/update</name>
    <type>DimensionLookup</type>
    <description/>
    <distribute>Y</distribute>
    <copies>1</copies>
    <partitioning>
    <method>none</method>
    <schema_name/>
    </partitioning>
    <schema>test</schema>
    <table>cite</table>
    <connection>testdb</connection>
    <commit>100</commit>
    <update>Y</update>
    <fields>
    <key>
    <name>client</name>
    <lookup>client</lookup>
    </key>
    <key>
    <name>citecode</name>
    <lookup>citecode</lookup>
    </key>
    <date>
    <name>dummydate</name>
    <from>date_from</from>
    <to>date_to</to>
    </date>
    <field>
    <name>x</name>
    <lookup>x</lookup>
    <update>Insert</update>
    </field>
    <field>
    <name>y</name>
    <lookup>y</lookup>
    <update>Insert</update>
    </field>
    <field>
    <name>dummydate</name>
    <lookup>dummydate</lookup>
    <update>Insert</update>
    </field>
    <return>
    <name>id</name>
    <rename/>
    <creation_method>sequence</creation_method>
    <use_autoinc>N</use_autoinc>
    <version>version</version>
    </return>
    </fields>
    <sequence>cite_id_seq</sequence>
    <min_year>1900</min_year>
    <max_year>2199</max_year>
    <cache_size>5000</cache_size>
    <preload_cache>N</preload_cache>
    <use_start_date_alternative>N</use_start_date_alternative>
    <start_date_alternative>none</start_date_alternative>
    <start_date_field_name/>
    <cluster_schema/>
    <remotesteps> <input> </input> <output> </output> </remotesteps> <GUI>
    <xloc>541</xloc>
    <yloc>160</yloc>
    <draw>Y</draw>
    </GUI>
    </step>
    Attached Files Attached Files

  3. #3
    Join Date
    May 2013
    Posts
    8

    Default

    just realised my "test" case is generating random business keys for 2nd iteration. Doh!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.