
Thread: Bypassing "Bad" Input Rows

  1. #1
    Join Date
    Sep 2007
    Posts
    10

    Bypassing "Bad" Input Rows

    I just got a badly formatted CSV-type file to process, and it causes my transformation to stop with a parse-type error. I cannot find a way to "trap" those bad rows, write them to an exception file, and then let all the good data through.

    None of the CSV, Fixed, or Text File input steps has an external error-handling dialog, and I cannot work with the old Text File Input step since it does not place the error data fields into the stream where I can test them. I am on v3.2 GA.

    Is there any workaround?

  2. #2
    Join Date
    Apr 2008
    Posts
    1,784

    I have heard of some people using a string-based staging table to do it.

    Take the CSV in all as strings, and add a status of "I" to them.
    Write all of this to a table.

    For each record in the staging table, attempt data verification on them.
    If good, change status to "G"
    If failed validation, set status to "B"

    For every row marked "G", proceed with transform

    Run a report of rows "B" and send to operator to repair.
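
The staging-table steps above can be sketched outside PDI as follows. This is only an illustration under stated assumptions, not the poster's actual setup: the three-column layout (`id`, `amount`, `order_date`), the sample data, the in-memory SQLite table, and the helper names `is_valid` and `stage_and_validate` are all hypothetical.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical sample input: rows 2 and 3 are deliberately malformed.
SAMPLE_CSV = """id,amount,order_date
1,19.99,2009-03-01
2,abc,2009-03-02
3,5.00,2009-13-45
4,7.25,2009-03-04
"""


def is_valid(row_id, amount, order_date):
    """Return True if all three string fields parse to the expected types."""
    try:
        int(row_id)
        float(amount)
        datetime.strptime(order_date, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def stage_and_validate(csv_text):
    """Load every row as strings with status 'I', then mark each 'G' or 'B'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE staging ("
        "rownum INTEGER PRIMARY KEY, id TEXT, amount TEXT, "
        "order_date TEXT, status TEXT)"
    )

    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header line
    for fields in reader:
        if not fields:
            continue  # ignore completely blank lines
        if len(fields) < 3:
            # Pad short rows; the empty strings will fail validation later.
            fields = fields + [""] * (3 - len(fields))
        elif len(fields) > 3:
            # Fold extra columns into the last field so the row fails validation.
            fields = fields[:2] + [",".join(fields[2:])]
        conn.execute(
            "INSERT INTO staging (id, amount, order_date, status) "
            "VALUES (?, ?, ?, 'I')",
            fields,
        )

    # Verify each staged row and update its status.
    staged = conn.execute(
        "SELECT rownum, id, amount, order_date FROM staging WHERE status = 'I'"
    ).fetchall()
    for rownum, row_id, amount, order_date in staged:
        status = "G" if is_valid(row_id, amount, order_date) else "B"
        conn.execute(
            "UPDATE staging SET status = ? WHERE rownum = ?", (status, rownum)
        )
    conn.commit()
    return conn
```

After calling `stage_and_validate(SAMPLE_CSV)`, selecting `WHERE status = 'G'` gives the rows to pass on to the transform, and `WHERE status = 'B'` gives the rows for the operator's repair report. Because everything is loaded as text first, a malformed value cannot stop the load itself.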

  3. #3
    Join Date
    Jun 2007
    Posts
    213

    Here is an earlier post on the topic with a solution

    http://forums.pentaho.org/showthread...052#post232052

    This is what I do to solve this exact issue. The attached transformations and jobs are the 'template' of the methodology. It works well, runs clean, and allows full error control. I take the erroneous rows back to the source and ask for an 'explanation please'. This method runs pretty quickly too, and has proven pretty robust over the last two years. I hope it helps you.

    There was one thing that needed to change in the attached transformations, and that was the name of the column 'row'. It turns out that this is a reserved word in PDI, and you would be well advised to rename it to something like 'rownum' instead. Avoiding that reserved word as a field name is most helpful if you want to work with JavaScript steps and the like.

    Maybe I should post a WIKI on this?

    Cheers

    The Frog
    Everything should be made as simple as possible, but not simpler - Albert Einstein
