Hitachi Vantara Pentaho Community Forums

Thread: Help with Regex debugging.

  1. #1
    Join Date: Jun 2009 | Posts: 2

    Help with Regex debugging.

    I'm having trouble getting what I need out of the Regex component.
    My regex is currently in use in a Java program and works fine in RegexBuddy and in other online regex verifiers. Is there some sort of quoting that I'm missing?

    Here's the regex, designed to extract a search query from a stored cookie that contains a request string:
    (?:&|\\?)q=(.{1,}?)(?:&|$)

    Here's an example of the cookie string (extracted from a database):

    COOKIE=10.5.16.253.1242557138404155&referrer=http://www.google.fr/search?client=firefox-a&rls=org.mozilla%3Afr%3Aofficial&channel=s&hl=fr&q=quickmusic.com&meta=&btnG=Recherche+Google&behaviorCookie=CONTROL&templateCookie=2_Column_Gradient_Lander&visitorxquickmusic.com=1&navName=Netscape&platform=Win32&brVer=Mozilla/5.0%20%28Windows%3B%20U%3B%20Windows%20NT%205.1%3B%20fr%3B%20rv%3A1.9.0.10%29%20Gecko/2009042316%20Firefox/3.0.10&brVerId=7&brNum=5.0%20%28Windows%3B%20U%3B%20Windows%20NT%205.1%3B%20fr%3B%20rv%3A1.9.0.10%29%20Gecko/2009042316%20Firefox/3.0.10&java=Yes&pv=6&screen=1024%20768&date=1242557314069&hasPops=true&__utma=138349104.246927382800595400.1242557123.1242557123.1242557123.1&__utmb=138349104.10.10.1242557123&__utmc=138349104&__utmz=138349104.1242557123.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=quickmusic.com&

    The desired extraction is the string after q=; in the example it should be quickmusic.com.

    I've also extracted the data to a text file to ensure there are no funky character set issues. I'm using capture groups, and I only have one group.
    I've attached a sample transformation that takes a CSV of sample data, transforms it, and outputs to Excel.
    Last edited by osbock; 06-19-2009 at 03:57 PM. Reason: Forgot to attach the datafile

  2. #2
    Join Date: Jun 2009 | Posts: 2

    OK, I figured it out, and this really should be documented.
    The capture groups are not processed unless the entire string matches the regular expression. In the original program, it was sufficient to search to the first position of interest, match up to the part I wanted, and extract the backreference. In the Regex component, the pattern also has to account for the text before and after the part of interest.
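
The difference described above corresponds to find() versus matches() on java.util.regex.Matcher. Below is a minimal sketch, assuming the component does a whole-string match as this post reports. The class name, the shortened cookie string, and the padded pattern are illustrative and not taken from the attached transformation. The `\\?` in the posted regex is read here as Java string escaping for `\?`, which is also an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QueryExtractDemo {
    // Shortened stand-in for the cookie string in the first post (hypothetical sample).
    static final String COOKIE =
        "COOKIE=10.5.16.253&referrer=http://www.google.fr/search?client=firefox-a"
        + "&hl=fr&q=quickmusic.com&meta=&btnG=Recherche+Google&";

    // The posted pattern: it only describes the part of the input around q=.
    static final Pattern PARTIAL = Pattern.compile("(?:&|\\?)q=(.{1,}?)(?:&|$)");

    // Same pattern padded on both sides so it can consume the whole input.
    static final Pattern FULL = Pattern.compile(".*?(?:&|\\?)q=(.{1,}?)(?:&.*|$)");

    // Search semantics: the pattern may match anywhere inside the input.
    static String viaFind(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    // Whole-string semantics: the pattern must cover the input from start to end.
    static String viaMatches(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(viaFind(PARTIAL, COOKIE));
        System.out.println(viaMatches(PARTIAL, COOKIE));
        System.out.println(viaMatches(FULL, COOKIE));
    }
}
```

With the sample string, find() on the posted pattern returns quickmusic.com, matches() on the same pattern returns null because the input starts with COOKIE=, and matches() on the padded pattern returns quickmusic.com again.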

  3. #3
    Join Date: Sep 2009 | Posts: 1

    Backreference syntax

    I am trying to use a backreference in a "Replace in String" step. I have a zip code formatted like this:
    123456789.0

    I am using this match:
    ([0-9][0-9][0-9][0-9][0-9])([0-9][0-9][0-9][0-9])\.0

    and am trying to use this replace with backreference:
    \1-\2

    But the result that I get is the literal string "1-2".

    I also tried \\1-\\2, which gave me the same result.

    Is this possible in a "Replace in String" step? Am I doing it wrong?

    Thanks,
    Gary
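
For comparison, plain Java regex replacement writes group references as $1 and $2, and a backslash in the replacement text only escapes the character after it. Below is a minimal sketch of the zip code case in plain Java. Whether the "Replace in String" step hands its "Replace with" value to the Java regex engine unchanged is an assumption, and the class and method names are hypothetical.

```java
final class ZipBackrefDemo {
    // The match expression from the post, written as a Java string literal.
    static final String MATCH =
        "([0-9][0-9][0-9][0-9][0-9])([0-9][0-9][0-9][0-9])\\.0";

    // Applies a replacement using Java's own replacement-string rules.
    static String replace(String input, String replacement) {
        return input.replaceAll(MATCH, replacement);
    }

    public static void main(String[] args) {
        // Replacement text \1-\2: each backslash just escapes the digit after it.
        System.out.println(replace("123456789.0", "\\1-\\2"));
        // Replacement text $1-$2: Java's group reference syntax.
        System.out.println(replace("123456789.0", "$1-$2"));
    }
}
```

The first call prints 1-2, the same literal result reported above, and the second prints 12345-6789.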

  4. #4
    Join Date: Jul 2009 | Posts: 2

    I still have the same problem: I see only 1 or \1 if I put \1 or \\1 in the Replace With column of Replace in String.
    My regexp is trying to get only the section between the < > pair from the text,
    e.g.: blahblah <myText>
    I used the following regexp:
    ^[^<]*[<]([^>]*)[>]$

    and tried to get \1 from this, but so far in vain.

    Any help is appreciated
    Thx
    Hung
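
The outputs 1 and \1 are what Java's Matcher.replaceAll produces for those two replacement texts, which suggests (without confirming) that the step uses Java replacement rules. Below is a minimal sketch in plain Java comparing the three spellings against the regexp from this post. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AngleBracketDemo {
    // The regexp from the post: the whole line, with the text inside < > captured.
    static final Pattern P = Pattern.compile("^[^<]*[<]([^>]*)[>]$");

    // Replaces the match using Java's replacement-string rules.
    static String replaceWith(String input, String replacement) {
        return P.matcher(input).replaceAll(replacement);
    }

    // Reads the capture group directly instead of going through a replacement.
    static String group1(String input) {
        Matcher m = P.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "blahblah <myText>";
        System.out.println(replaceWith(text, "\\1"));   // replacement text: \1
        System.out.println(replaceWith(text, "\\\\1")); // replacement text: \\1
        System.out.println(replaceWith(text, "$1"));    // replacement text: $1
        System.out.println(group1(text));
    }
}
```

This prints 1, then \1, then myText twice. Only the $1 spelling is treated as a reference to the first capture group.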
