Hitachi Vantara Pentaho Community Forums

Thread: New Regex Capture step almost ready

  1. #1
    DEinspanjer Guest

    New Regex Capture step almost ready

    Finishing up some touches on it tonight and writing out the unit tests
    for it.

    Would you prefer I submit this as a patch to make it a base step in
    trunk or as a plugin?

    It does everything that the current Regex Eval step does, but it also
    can parse out capture groups into new fields. Unfortunately, because
    it adds new fields to the stream, there is no way to make it a
    backwards compatible update to the existing regex step, hence the new
    name. :/
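
    To make the "capture groups into new fields" idea concrete, here is a
    minimal sketch of the behaviour, written in Python rather than Kettle's
    Java. The function name regex_capture, the "matched" result field, and
    the sample row are all hypothetical, and whole-string matching is an
    assumption; none of this is the actual step code.

```python
import re


def regex_capture(row, field, pattern, new_fields, result_field="matched"):
    """Return a copy of `row` with a match flag plus one new field per capture group.

    Hypothetical helper: the name, arguments and whole-string matching are
    assumptions made for illustration only.
    """
    out = dict(row)
    match = re.fullmatch(pattern, row[field])
    # What the existing evaluation already provides: a yes/no result field.
    out[result_field] = match is not None
    # The addition: each capture group becomes a new field on the stream.
    for index, name in enumerate(new_fields, start=1):
        out[name] = match.group(index) if match else None
    return out


row = {"line": "2008-03-28 ERROR disk full"}
result = regex_capture(
    row,
    "line",
    r"(\d{4}-\d{2}-\d{2}) (\w+) (.*)",
    ["date", "level", "message"],
)
no_match = regex_capture({"line": "garbage"}, "line", r"(\d+)", ["number"])
```

    Because the output row has more fields than the input row, anything
    downstream sees a different row layout, which is the compatibility
    concern described above.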
    --~--~---------~--~----~------------~-------~--~----~
    You received this message because you are subscribed to the Google Groups "kettle-developers" group.
    To post to this group, send email to kettle-developers (AT) googlegroups (DOT) com
    To unsubscribe from this group, send email to kettle-developers-unsubscribe (AT) g...oups (DOT) com
    For more options, visit this group at http://groups.google.com/group/kettle-developers?hl=en
    -~----------~----~----~----~------~----~------~--~---

  2. #2
    Sven Boden Guest

    Re: New Regex Capture step almost ready

    Preferably a backwards compatible update (with some switch or so). If
    it's not backwards compatible I would propose to put it in as a
    plugin. Having similar default steps available confuses people a lot.

    my 2c,
    Sven




  3. #3
    DEinspanjer Guest

    Re: New Regex Capture step almost ready

    So as I said, the new step has a fields list at the bottom where you
    can capture values out of the regex and create new fields with them.

    As it stands right now, the step can understand everything that
    RegexEval does. If I learned a little hackery to make it read
    RegexEval step information, then it could take the place of that step.
    The problem is that if you define an instance of RegexCapture that
    creates new fields and then you try to open that transformation in the
    older version, those fields will be lost because RegexEval doesn't
    understand them and hence the whole transformation will be invalid. I
    don't see a good way around that.

    If I had needed this step before 3.0 went out the door, I would
    happily have written it then, and RegexEval would never
    have existed in its current form. :/
    I personally feel that RegexEval is only half a step without being
    able to take advantage of capture groups.

    Daniel


  4. #4
    Sven Boden Guest

    Re: New Regex Capture step almost ready

    Well, learn a little "hackery" then or put it in as a plugin ;-). For
    compatibility it would be something like:

    - Have a switch in the new RegexEval to turn the new
    functionality on or off (default: off)
    - Take the id of the "old" RegexEval
    - Be able to read the old format and add the stuff you need for the new
    format. If you don't switch on the new functionality it's backwards
    compatible; with the new functionality switched on it's not, of course.
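
    The three points above could be sketched roughly as follows. This is
    Python rather than Kettle's Java, and every name in it (RegexEvalMeta,
    use_capture_groups, capture_fields, output_fields) is hypothetical; it
    only shows how a default-off switch keeps the row layout unchanged.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegexEvalMeta:
    """Hypothetical step settings; the class and field names are assumptions."""

    pattern: str
    source_field: str
    result_field: str = "result"
    # The switch: off by default, so settings saved by the old step
    # (which never mention it) behave exactly as before.
    use_capture_groups: bool = False
    capture_fields: List[str] = field(default_factory=list)

    def output_fields(self, input_fields):
        """Field names leaving the step, given the names entering it."""
        out = list(input_fields) + [self.result_field]
        if self.use_capture_groups:
            # Only with the switch on does the row layout change.
            out += self.capture_fields
        return out


old_style = RegexEvalMeta(pattern=r"\d+", source_field="line")
new_style = RegexEvalMeta(
    pattern=r"(\d+)-(\d+)",
    source_field="line",
    use_capture_groups=True,
    capture_fields=["first", "second"],
)
```

    With the switch off, output_fields returns the same layout the old step
    would produce; the extra fields appear only when someone opts in.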

    I would only put new steps/jobs in the trunk/default installation
    if a lot of people need the functionality, and I would avoid having
    steps with duplicate/derived functionality.
    For the first: RSS input/output, for example, is nice to have if you
    need it, but 99% of users will never use it in their ETL. For the
    second: in the past "JavaScript" and "JavaScript Mod" confused a lot of
    people, while the Mod version could have just been a change to the
    original JavaScript step (which was corrected later on ;-) ).

    As for plugins, the more the merrier I think.

    Best Regards,
    Sven



  5. #5
    DEinspanjer Guest

    Re: New Regex Capture step almost ready

    If the developer unchecks the compatibility checkbox, would you expect
    the plugin to write out its XML or Repository information using a new
    name so that it cannot be corrupted if opened by the replaced version?


  6. #6
    Sven Boden Guest

    Re: New Regex Capture step almost ready

    It depends. How it's usually done is that the new functionality is put
    into extra fields/sections in the XML/repository (and by default the new
    functionality is off). The trick is that most of the steps only read
    the configuration items they find and don't complain if they see more.
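
    As a rough illustration of that trick, here is a sketch in Python (not
    Kettle's Java). The tag names (script, resultfieldname, use_capture,
    fields/field/name) and both loader functions are hypothetical and are
    not the real step XML; the point is only that each reader looks up the
    tags it knows, falls back to a default when one is missing, and skips
    anything else.

```python
import xml.etree.ElementTree as ET


def load_old(xml_text):
    """Old reader (hypothetical): knows only the original settings."""
    step = ET.fromstring(xml_text)
    return {
        "script": step.findtext("script", default=""),
        "resultfieldname": step.findtext("resultfieldname", default=""),
    }


def load_new(xml_text):
    """New reader (hypothetical): same settings plus the optional extras."""
    step = ET.fromstring(xml_text)
    settings = load_old(xml_text)
    # A missing tag means "off", so files saved by the old version load fine.
    settings["use_capture"] = step.findtext("use_capture", default="N") == "Y"
    settings["capture_fields"] = [
        f.findtext("name", default="") for f in step.findall("fields/field")
    ]
    return settings


OLD_XML = "<step><script>a+</script><resultfieldname>ok</resultfieldname></step>"
NEW_XML = (
    "<step><script>(a+)</script><resultfieldname>ok</resultfieldname>"
    "<use_capture>Y</use_capture>"
    "<fields><field><name>letters</name></field></fields></step>"
)

old_file_in_new_reader = load_new(OLD_XML)
new_file_in_old_reader = load_old(NEW_XML)
new_file_in_new_reader = load_new(NEW_XML)
```

    In this sketch the old reader loads the new file without error and
    silently drops the capture settings, so the result is only equivalent
    when the new functionality was never switched on.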

    So how it usually works with new functionality:
    - If you take a transformation from a previous version and run it in
    the new one (without saving), it should do the same as in the previous
    version.

    - If you take a transformation from a previous version and save it in
    the new one, the extra sections will be added. If you then go back to
    the previous version, the newly saved transformation will still work OK.
    But if you really use the new functionality and save it, all bets are
    off on running it correctly in a previous version (which is OK, since
    someone manually changed the transformation).

    So keeping the same name. Fully forwards compatible, backwards
    compatible if you don't use new functionality.

    The way I see it and like it... If I upgrade I expect my jobs/
    transformations to do the same as in the previous version. Right after
    upgrading I still expect to be able to roll back to a previous
    version. But if I start using new functionality I understand I can't
    use it in an old version ;-)

    Regards,
    Sven



  7. #7
    DEinspanjer Guest

    Re: New Regex Capture step almost ready

    This sounds very doable. I just have to work the compatible flag into
    it.



Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.