Hitachi Vantara Pentaho Community Forums
Thread: Remove control-M (in some columns in some rows)

  1. #1
    Join Date
    Feb 2007
    Posts
    27

    Remove control-M (in some columns in some rows)

    I use PDI to extract millions of records from several Informix databases into text files, as an intermediate step before loading them into an Oracle database.
    The big problem: one string column contains some "control-M" characters (only about 10 in 100,000 rows), and when you try to load those rows, the load fails.
    I'm looking for the best way to remove this special character.
    The server runs Windows, so the first problem is how to type "control-M" in any step.
    I did it several months ago with Vim for Windows on a few files, but now I need to do it for more than 50 files. I've now tried "sed" from gnuWindows, but I couldn't type the special character on the Windows command line.

    Has anyone had this problem? Is it possible to solve it with the Data Validator or another transformation step? How can I type "control-M" on a Windows system?

    Thanks,
    Antonio

  2. #2
    DEinspanjer Guest

    Isn't the escape code for Ctrl-M either \r or \n?

    Use vim to find out what its Unicode escape is and pass that to sed. Alternatively, try the new functions added to the Calculator step in Kettle trunk, which include functions to remove CR or LF.

    You could also use the replace function in the JS step.
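To settle which escape applies: Ctrl-M is the carriage return (CR), written "\r", while "\n" is the line feed (Ctrl-J). The following is a minimal sketch in plain JavaScript (not Kettle-specific; the variable names are made up for illustration) that prints the character codes, which also explains the octal 015 notation.

```javascript
// Ctrl-M is the carriage return (CR) control character.
const CTRL_M = "\r";

// Character code of CR in decimal, hexadecimal and octal.
const code = CTRL_M.charCodeAt(0);
const asDecimal = code;            // decimal value of CR
const asHex = code.toString(16);   // hexadecimal digits, without the 0x prefix
const asOctal = code.toString(8);  // octal digits, without the leading zero

// Line feed (LF, "\n") is a different character: Ctrl-J.
const lineFeedCode = "\n".charCodeAt(0);

console.log(asDecimal, asHex, asOctal, lineFeedCode);
```

So a pattern targeting Ctrl-M should use "\r" (or the octal form 015), not "\n".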

  3. #3
    Join Date
    Nov 1999
    Posts
    9,729

    Hi Antonio,

    To replace values, perhaps you can use a bit of JavaScript.
    Have a look at the replace function (right click for samples).
    It should be something like:

    var clean = replace(dirtyString, "\\r", "");

  4. #4
    Join Date
    Feb 2007
    Posts
    27

    Errors: argument type ambiguous?

    I tried that in the "Modified Java Script Value" step:
    var str1 = texto.getString();
    var str2 = replace(str1,' \\015 ','');
    var str2 = replace (str1,',','.');
    texto.setValue(str2);
    but there are many errors:

    2008/11/26 19:31:03 - Valor Java Script Modificado.0 - ERROR (version 3.1.0, build 826 from 2008/09/30 11:32:36) : Error Javascript:
    2008/11/26 19:31:03 - Valor Java Script Modificado.0 - ERROR (version 3.1.0, build 826 from 2008/09/30 11:32:36) : The choice of Java constructor setValue matching JavaScript argument types (null) is ambiguous; candidate constructors are:
    void setValue(java.util.Date)
    void setValue(org.pentaho.di.compatibility.Value)
    void setValue(java.math.BigDecimal)
    void setValue(java.lang.Boolean)
    void setValue(java.lang.StringBuffer)
    void setValue(byte[])
    void setValue(java.lang.String) (script#4)
    2008/11/26 19:31:03 - Valor Java Script Modificado.0 - ERROR (version 3.1.0, build 826 from 2008/09/30 11:32:36) :
    2008/11/26 19:31:03 - Valor Java Script Modificado.0 - ERROR (version 3.1.0, build 826 from 2008/09/30 11:32:36) : at org.pentaho.di.trans.steps.scriptvalues_mod.ScriptValuesMod.addValues(ScriptValuesMod.java:409)
    2008/11/26 19:31:03 - Valor Java Script Modificado.0 - ERROR (version 3.1.0, build 826 from 2008/09/30 11:32:36) : at org.pentaho.di.trans.steps.scriptvalues_mod.ScriptValuesMod.processRow(ScriptValuesMod.java:640)
    2008/11/26 19:31:03 - Valor Java Script Modificado.0 - ERROR (version 3.1.0, build 826 from 2008/09/30 11:32:36) : at org.pentaho.di.trans.step.BaseStep.runStepThread(BaseStep.java:2664)
    2008/11/26 19:31:03 - Valor Java Script Modificado.0 - ERROR (version 3.1.0, build 826 from 2008/09/30 11:32:36) : at org.pentaho.di.trans.steps.scriptvalues_mod.ScriptValuesMod.run(ScriptValuesMod.java:703)
    2008/11/26 19:31:03 - Valor Java Script Modificado.0 - ERROR (version 3.1.0, build 826 from 2008/09/30 11:32:36) : Caused by: org.mozilla.javascript.EvaluatorException: The choice of Java constructor setValue matching JavaScript argument types (null) is ambiguous; candidate constructors are:
    void setValue(java.util.Date)
    void setValue(org.pentaho.di.compatibility.Value)
    void setValue(java.math.BigDecimal)
    void setValue(java.lang.Boolean)
    void setValue(java.lang.StringBuffer)
    void setValue(byte[])
    void setValue(java.lang.String) (script#4)

    Please, any help?
    Thanks
    Antonio

  5. #5
    Join Date
    Nov 1999
    Posts
    9,729

    Create a new field, don't use setValue please, it's too dangerous.

    It exists for compatibility reasons with the 2.x version only. See also: http://wiki.pentaho.com/display/EAI/...2.5.x+to+3.0.0

  6. #6
    Join Date
    Feb 2007
    Posts
    27

    ??

    Matt,

    Excuse me, but do you mean calling setValue on a new (clean/empty) field? If not, I don't understand how to do it.
    Thanks for the "hyper-quick" reply!
    Antonio

  7. #7
    DEinspanjer Guest

    No, turn off the "compatibility" flag in the JS step and then have the step output a new field with your cleaned value:

    Code:
    // \015 is the octal code for carriage return (Ctrl-M)
    var tmp = replace(texto,'\\015','');
    var new_field = replace (tmp,',','.');
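For comparison, the same two-step cleanup can be sketched with the standard JavaScript `String.prototype.replace` and regular expressions, rather than the step's own `replace` helper. This is only an illustration under assumptions: `cleanField` is a hypothetical helper name, and passing null values through unchanged is a choice made here, not something stated in the thread.

```javascript
// Hypothetical helper: returns a cleaned copy instead of modifying the input field.
function cleanField(value) {
  // Pass missing values through unchanged rather than turning them into "null".
  if (value === null || value === undefined) {
    return null;
  }
  return String(value)
    .replace(/\r/g, "")  // drop every carriage return (Ctrl-M)
    .replace(/,/g, "."); // replace every comma with a dot
}

// Example: the result would be assigned to a new output field.
const newField = cleanField("12,5\r");
console.log(JSON.stringify(newField));
```

Writing the result to a new variable mirrors the advice above: the original field is left untouched and the step outputs an additional field.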

  8. #8
    Join Date
    Feb 2007
    Posts
    27

    How to execute JS?

    One problem: there is an error in the "Text File Input" step:
    "DOS format was specified but only a single line feed character was found, not 2" (there is a control-M in a field).
    So the transformation never reaches the JS step.
    How can I get the JS step to run?
    Antonio

  9. #9
    Join Date
    Nov 1999
    Posts
    9,729

    Use the "CSV Input" step. That one has support for newlines inside of fields.
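To illustrate why a reader that tracks enclosures can cope with line-break characters inside a field, here is a rough sketch in plain JavaScript. It is not how the "CSV Input" step is implemented; it assumes that fields containing line breaks are wrapped in an enclosure character (a double quote here), and `splitRecords` is a made-up name.

```javascript
// Hypothetical sketch: split text into records on "\n", but only when the
// line feed is outside an enclosed (quoted) field.
function splitRecords(text, enclosure = '"') {
  const records = [];
  let current = "";
  let insideEnclosure = false;

  for (const ch of text) {
    if (ch === enclosure) {
      // Every enclosure character flips the state; a doubled enclosure
      // used as an escape flips it twice and so leaves it unchanged.
      insideEnclosure = !insideEnclosure;
      current += ch;
    } else if (ch === "\n" && !insideEnclosure) {
      // End of record: drop the CR of a DOS-style CR+LF line ending.
      records.push(current.endsWith("\r") ? current.slice(0, -1) : current);
      current = "";
    } else {
      // Ordinary character, including CR or LF inside an enclosed field.
      current += ch;
    }
  }
  if (current !== "") {
    records.push(current); // last record without a trailing line ending
  }
  return records;
}

// Two DOS-terminated records; the first has a Ctrl-M inside a quoted field.
const sample = '1;"first line\rsecond line";ok\r\n2;"plain";ok\r\n';
const rows = splitRecords(sample);
console.log(rows.length);
```

Once the rows are read intact, the embedded Ctrl-M can be removed from the field in a later step.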

  10. #10
    Join Date
    Feb 2007
    Posts
    27

    OK!!

    Hi Matt (and DEinspanjer),
    Problem solved!
    Thanks a lot for your help.
    Antonio


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.