Hitachi Vantara Pentaho Community Forums

Thread: Get duplicates

  1. #1
    Join Date
    Nov 2007

    Get duplicates


    I want to use Kettle to check XLS files edited by humans for errors. The plan is to configure several checks and, when a check fails, add a message to the corresponding row in the data stream. The idea is to send a second Excel file back to the author of the original; it contains a new column indicating whether each row is correct or, if not, what the problem is (e.g. "must be a number", "is not unique", "is not a valid value").
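    A rough sketch of such per-row checks in plain JavaScript, outside Kettle. The field names `amount` and `status`, the list of allowed values, and the function `checkRow` are made up for illustration, and the code does not use Kettle's scripting API:

```javascript
// Hypothetical allowed values for the assumed "status" column.
const ALLOWED_STATUS = ["open", "closed"];

// Returns "OK" or a semicolon-separated list of problems for one row.
// The row is a plain object; the field names are assumptions.
function checkRow(row) {
  const problems = [];

  // "must be a number": empty or non-numeric text fails the check.
  const text = row.amount == null ? "" : String(row.amount).trim();
  if (text === "" || Number.isNaN(Number(text))) {
    problems.push("amount: must be a number");
  }

  // "is not a valid value": the value must be in the allowed list.
  if (ALLOWED_STATUS.indexOf(row.status) === -1) {
    problems.push("status: is not a valid value");
  }

  return problems.length === 0 ? "OK" : problems.join("; ");
}
```

    The returned string is what would end up in the new message column of the file sent back to the author.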

    I decided to use the JavaScript feature for most of the checks. I am struggling with the requirement to check whether all values in a column are unique. The "Unique rows" step removes the non-unique rows from the data stream, but I want them marked as non-unique instead. Does this require a plugin that runs some SQL queries, or is there a way to do it with the standard functionality?

    Thanks for your help!

  2. #2
    Join Date
    Nov 1999


    If you use the option "Add counter to output" in the "Unique rows" step, you'll get the number of occurrences.
    If you filter on that figure > 1 you have all the rows that had duplicates.
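
    To illustrate the idea of counting occurrences and then testing for a count above 1, here is a minimal sketch in plain JavaScript. It is not how the "Unique rows" step is implemented; `markDuplicates` and the `count` and `message` fields are hypothetical names. Unlike a filter, it keeps every input row and attaches the marker asked for in post #1:

```javascript
// Counts how often each value of `key` occurs, then marks every row.
// Rows are plain objects; `key` is the name of the column to check.
function markDuplicates(rows, key) {
  // First pass: number of occurrences per value.
  const counts = new Map();
  for (const row of rows) {
    counts.set(row[key], (counts.get(row[key]) || 0) + 1);
  }

  // Second pass: copy each row and add the count and a message.
  return rows.map((row) => {
    const n = counts.get(row[key]);
    return Object.assign({}, row, {
      count: n,
      message: n > 1 ? "is not unique" : "OK",
    });
  });
}
```

    Calling `markDuplicates(rows, "id")` on rows with ids A, B, A would mark both A rows as "is not unique" and the B row as "OK".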

    All the best,


  3. #3
    Join Date
    Nov 2007


    Matt, first of all thank you for the hint. :-)

    Just to make sure that I got it right: I use the "Filter rows" step to filter the data, right?

    I connected the "Unique rows" step to the "Filter rows" step. Afterwards I could no longer run the transformation; the error message is:

    Transformation was unable to open [ImportListOfReports]
    An error occured reading a transformation from the repository
    Unexpected error reading step information from the repository
    Error loading condition from the repository (id_condition=2)
    Unable to load Value from repository with id_value=1
    Unexpected conversion error while converting value [constant String] to an Integer
    constant String : couldn't convert String to Integer
    Unparseable number: "1"

    When I close Spoon and open it again, it can no longer read this transformation. The problem is reproducible. I am using Spoon 3.0.0 Build 500, and the repository is stored in a MySQL 5.0 database.

    Where is my mistake?



Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.