Hitachi Vantara Pentaho Community Forums

Thread: Name Parsing

  1. #1
    Join Date
    Sep 2013

    Name Parsing

    Hello -

    I'm very familiar with tools that perform ETL/ELT, DI, DQ, DG, etc. but I'm just recently getting acclimated to PDI/Kettle. I'm trying to parse a first name field that includes middle name/initial. Here's an example of what the names look like (we're using additional identifiers to determine they are the same person):

    Fname Mname Lname
    Billie Joe Joe Smith
    Billie J. Smith
    B Joe Smith Davis

    I'd like to do the following:
    • Parse 'Mname' into a separate column (I'm already doing this using the "Split Fields" transformation)
    • Only keep records where the last names are different

    1. After the "Split Fields" transform/hop, which transform(s) would I use to only move the records where the last names are different (I'm using "MS Excel Input" and "MS Excel Output" transforms for testing)?
    2. For the last row, how would I replace the initial 'B' with the full first name 'Billie'? Could I make two versions of the data set, transform one of them, then use the additional identifiers to join/merge the transformed data set with the original (similar to a self-join, or subquery factoring with the "WITH" clause in Oracle SQL)?

    Attached is a screenshot of the workflow.

    Thanks in advance for any guidance, and I look forward to learning more about PDI.
    Last edited by bijeff09; 03-30-2014 at 10:51 PM. Reason: Figured out an issue in my original post

  2. #2
    Join Date
    Jun 2012


    It's always better to attach your zipped workflow, omitting output steps and replacing input steps with Data Grids or text files.
    That being said, your workflow looks capable of handling all the requirements from your bullet list, and item 1 from the numbered list, too.
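    For readers without the screenshot, here is a rough sketch of that logic outside PDI, in Python. It is not the workflow from the attachment. The field names `person_id`, `fname`, and `lname` are assumptions made for illustration, with `person_id` standing in for the "additional identifiers" mentioned in the question, and "last names are different" is read here as "the same person appears with more than one last name".

```python
from collections import defaultdict


def split_first_middle(fname):
    """Split a combined first-name field such as 'Billie Joe' into
    (first, middle). The middle part is '' when there is none."""
    first, _, middle = fname.strip().partition(" ")
    return first, middle.strip()


def keep_differing_last_names(records):
    """Keep only records whose identity group (hypothetical 'person_id')
    contains more than one distinct last name."""
    last_names = defaultdict(set)
    for rec in records:
        # Compare case-insensitively so 'SMITH' and 'Smith' count as equal.
        last_names[rec["person_id"]].add(rec["lname"].strip().lower())
    return [rec for rec in records if len(last_names[rec["person_id"]]) > 1]


# Names taken from the question; the person_id values are assumed.
rows = [
    {"person_id": 1, "fname": "Billie Joe", "lname": "Smith"},
    {"person_id": 1, "fname": "Billie J.", "lname": "Smith"},
    {"person_id": 1, "fname": "B Joe", "lname": "Smith Davis"},
]

for row in keep_differing_last_names(rows):
    first, middle = split_first_middle(row["fname"])
    print(first, "|", middle, "|", row["lname"])
```

    In PDI itself the same idea would be expressed with steps rather than code; the sketch only shows the intended result.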
    So there only remains the name problem.
    Once you have some identity in place, you can either consult an external reference for the correct name or apply a rule such as "always pick the longest first name".
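    A minimal sketch of the "always pick the longest first name" rule, written in Python rather than as PDI steps. The keys `person_id` and `first` are hypothetical names chosen for illustration; `person_id` stands for whatever identity you have in place, and `first` is assumed to hold the first name after the middle name has already been split off.

```python
def longest_first_name_by_identity(records):
    """Return {person_id: longest first name seen for that identity}.
    On a tie, the first name encountered wins."""
    best = {}
    for rec in records:
        key = rec["person_id"]
        name = rec["first"].strip()
        if key not in best or len(name) > len(best[key]):
            best[key] = name
    return best


def standardize_first_names(records):
    """Return copies of the records with 'first' replaced by the longest
    first name found for the same identity."""
    best = longest_first_name_by_identity(records)
    return [{**rec, "first": best[rec["person_id"]]} for rec in records]


# First names from the question; the person_id values are assumed.
people = [
    {"person_id": 1, "first": "Billie", "last": "Smith"},
    {"person_id": 1, "first": "Billie", "last": "Smith"},
    {"person_id": 1, "first": "B", "last": "Smith Davis"},
]

for rec in standardize_first_names(people):
    print(rec["first"], rec["last"])
```

    With this rule the initial 'B' is replaced by 'Billie' because both rows share the same identity. The rule is only a heuristic: it picks the longest spelling, not necessarily the correct one.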
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Sep 2013


    Thanks for your response, marabu. Other than the records that contain only a first initial (not a large percentage of the name anomalies), I think I have the workflow figured out.
