Hitachi Vantara Pentaho Community Forums

Thread: javascript removing whitespace(s) within string

  1. #1

    javascript removing whitespace(s) within string

    Hi,

    This one is baffling me. I have a couple of fields in my load file that contain multiple spaces between words. Examples of these are:

    product:
    Query1ALDINETTE HAIRSPRAY 5LT

    product_code:
    5.4 VOI

    and customer
    HAIR AT ANNARYL C.O.D.

    I need to remove the extra whitespace, because the inconsistent spacing makes these names look unique when they are not.

    I have tried the Script Values / Mod step using

    product.replace(/\s*/g, " "); which DOES NOT replace the spaces and
    product.replace("HAIRSPRAY", " ") which DOES replace hairspray with a blank space.

    I have read another thread that alluded to compatibility issues in this regard. The default function you make available is replace(var,var,var). Is there an issue with my use of the function, or am I doing something else wrong?

    What else might explain why my regular expression format does not find and replace these spaces?

    Please help.

    Thanks
    Tom

  2. #2

    Hi trljackson,
    Originally posted by trljackson:
    Code:
    product.replace(/\s*/g, " "); //which DOES NOT replace the spaces and 
    product.replace("HAIRSPRAY", " "); // which DOES replace hairspray with a blank space.
    Are some quotation marks missing? Maybe it should look like:
    Code:
    product.replace("/\\s*/g", " ");
    Christoph
    21 is only half the truth

  3. #3

    Hi Christoph

    Thanks, I will try this. There is obviously something wrong in my regular expression, as a simple text find-and-replace works without any problems. Should the regexp always be enclosed in quotes " "? Having two backslashes \\ before the special character seems odd; is that technically correct?

    Tom

  4. #4

    Hi Tom,
    have you used regexps before? Do you know about quoting? (This was discussed here before.)
    How do you write <<He said "Yes.">> in a string that is itself delimited by "? Special characters are escaped with a backslash, so the backslash itself has a special meaning. To get a literal backslash in a string you have to write a double backslash.
    "\"" outputs "
    "\\" outputs \
    More about at e.g. http://www.javascriptkit.com/jsref/regexp.shtml
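
The escaping rule above can be sketched in plain JavaScript. This is only an illustration: the `RegExp` constructor is standard JavaScript but is not used elsewhere in this thread, the variable names are made up, and `console.log` assumes an engine such as Node.js (in the PDI script step, Alert would take its place).

```javascript
// Inside a double-quoted string, a backslash escapes the next character.
const quote = "He said \"Yes.\"";   // contains two literal double quotes
const backslash = "\\";             // contains exactly one backslash

// A pattern built from a string therefore needs the backslash doubled...
const fromString = new RegExp("\\s+", "g");
// ...whereas a regex literal is written with a single backslash.
const fromLiteral = /\s+/g;

console.log(quote);                                     // He said "Yes."
console.log(backslash.length);                          // 1
console.log(fromString.source === fromLiteral.source);  // true
```

Both forms describe the same pattern; the doubled backslash only exists in the source code of the string.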

    Christoph
    21 is only half the truth

  5. #5

    Hey Christoph

    Thanks again for the feedback. I am familiar with regexps, and after last night's reading they are much clearer again, having not used them for some time. I am looking to replace the whitespace between words with a single space. I trim both sides of the fields at load time, so I am not concerned with leading and trailing whitespace. Should the regexp not then be /\s*/g for all whitespace, global, and " " to replace with a single space? Should this regexp be enclosed in quotes, as it would be if you were searching for a word or specific letters, e.g. "the"?

    In other words, are you saying my JavaScript should look like:
    Product.replace("/\s*/g", " ");

    or as you put it
    product.replace("/\\s*/g", " ");

    Thanks again
    Tom

  6. #6

    Hi Tom,

    as you are familiar with regexps, simply try it. The JavaScript step can be tested via the "Test script" button and a simple Alert.

    Doing this, I found out that the regexp does not have to be quoted as I thought, so no extra backslashes are needed (unlike in Java, where you have to pass a regexp as a string).
    I also noticed that you surely want to replace your * with a +. Otherwise a space is inserted between every pair of characters, because \s* also matches the empty string.

    Hit "test script" with:
    Code:
    Alert("some\t    words".replace(/\s*/g, " "));
    Alert("some\t    words".replace(/\s+/g, " "));
    BTW, to make the extra-backslash thing clear: to write the regexp you want inside a Java string quoted with ", you have to add the extra backslashes. If you print the string to the console, it looks like the regexp you expect. It is just a coding thing.
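
The difference between the variants discussed so far can be checked side by side. This is a minimal sketch in standard JavaScript; the variable names are made up, and `console.log` assumes an engine such as Node.js rather than the PDI "Test script" dialog.

```javascript
const input = "some\t    words";

// \s+ needs at least one whitespace character, so only the real gap is replaced.
const plus = input.replace(/\s+/g, " ");

// \s* also matches the empty string, so a space is inserted at every
// empty position between characters as well.
const star = "ab".replace(/\s*/g, " ");

// A quoted pattern is searched for as literal text, not compiled as a regex,
// so nothing matches here and the input comes back unchanged.
const quoted = input.replace("/\\s+/g", " ");

console.log(JSON.stringify(plus));  // "some words"
console.log(JSON.stringify(star));  // " a b "
console.log(quoted === input);      // true
```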

    Christoph
    21 is only half the truth

  7. #7

    Great help, thanks. Given that product is the database field name in my data stream, would you write the script as:

    var str = product.getString();
    str.replace(/\s+/g, " ");

    OR

    product.replace(/\s+/g, " ");


    Given that my data is sitting in MySQL 5.1, do I need to give any consideration to Unicode or other issues of this nature? I ask with reference to the Regex Evaluation step in Pentaho, and more specifically the checkboxes on its Content tab. Is it worth considering this step in my transformation, or does it add unnecessary complexity?

    Cheers for now
    Tom

  8. #8

    You're forgetting that strings are immutable, so you need to reassign the variable to the result, e.g.
    Code:
    product = product.replace(/\s+/g, " ");
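
A short sketch of why the reassignment matters. The sample value below uses made-up spacing, `unchanged` is a hypothetical name, and `console.log` assumes an engine such as Node.js.

```javascript
let product = "ALDINETTE    HAIRSPRAY   5LT"; // sample value, spacing invented

// replace() returns a new string; without an assignment the result is lost.
product.replace(/\s+/g, " ");
const unchanged = product;

// Reassigning keeps the cleaned-up value.
product = product.replace(/\s+/g, " ");

console.log(unchanged);  // ALDINETTE    HAIRSPRAY   5LT
console.log(product);    // ALDINETTE HAIRSPRAY 5LT
```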

  9. #9

    Maybe try looking for a whitespace character immediately followed by more whitespace:

    product.replace( /\s\s+/g, " " );
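
For comparison, a sketch of how /\s\s+/g differs from /\s+/g. The sample value uses made-up spacing with one tab in it, the variable names are hypothetical, and `console.log` assumes an engine such as Node.js.

```javascript
const sample = "HAIR  AT\tANNARYL   C.O.D."; // invented spacing, one tab

// \s\s+ only matches runs of two or more whitespace characters,
// so the single tab is left in place.
const twoOrMore = sample.replace(/\s\s+/g, " ");

// \s+ matches runs of one or more, so the tab becomes a space too.
const oneOrMore = sample.replace(/\s+/g, " ");

console.log(JSON.stringify(twoOrMore)); // "HAIR AT\tANNARYL C.O.D."
console.log(JSON.stringify(oneOrMore)); // "HAIR AT ANNARYL C.O.D."
```

If the data may contain single tabs or line breaks between words, /\s+/g is the safer choice; if single spaces are the only lone separators, both give the same result.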


    enjoying PDI
    Thijs Verhagen

  10. #10

    Worked like a charm, thank you!

  11. #11

    Hi,
    but why didn't you use the Replace in String step?

    Mick
