Hitachi Vantara Pentaho Community Forums

Thread: What's in a (field) name

  1. #1
    Matt Casters Guest

    What's in a (field) name

    Dear devs,

    We have an opportunity to do a few things differently in 3.0.
    Since we're completely splitting off row metadata from the data, we can also
    do field renaming and the like without much of a performance loss, if any.

    In the past, adding a field with the same name as an existing one caused
    problems because only the first could be addressed properly.
    We can now do one of three things in Kettle 3.x:

    1) Add the field with the same name a second (3rd, 4th, ...) time, i.e.
    leave it as it is.
    2) Replace the field at the same location
    3) Rename field "Name" to "Name2", "Name3", "Name4", etc.

    My favorite is the last option as this one would work in all circumstances
    and it would be the easiest to do and indeed cause no performance loss.
    It would work just like before, but you would have an error-free solution.
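
    The three options above can be compared side by side with a minimal sketch
    that works on a plain list of field names. The class, the enum and the
    method names are hypothetical and are not part of Kettle; a real row would
    also carry value metadata, which is left out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DuplicateFieldOptions {
    // Hypothetical names for the three behaviours discussed in the post.
    enum Mode { KEEP_DUPLICATE, REPLACE_IN_PLACE, RENAME }

    // Returns a new list of field names after adding `name` under the given mode.
    static List<String> addField(List<String> fields, String name, Mode mode) {
        List<String> result = new ArrayList<>(fields);
        int existing = result.indexOf(name);
        if (existing < 0 || mode == Mode.KEEP_DUPLICATE) {
            // No clash, or option 1: append the name even if it already exists.
            result.add(name);
        } else if (mode == Mode.REPLACE_IN_PLACE) {
            // Option 2: the new field takes over the existing position.
            result.set(existing, name);
        } else {
            // Option 3: Name -> Name2, Name3, ... (first suffix not yet in use).
            int n = 2;
            while (result.contains(name + n)) {
                n++;
            }
            result.add(name + n);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("Id", "Name");
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " -> " + addField(fields, "Name", mode));
        }
    }
}
```

    With the fields {Id, Name} and a second "Name", option 1 yields two fields
    called Name, option 2 leaves the list of names unchanged, and option 3
    appends Name2.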

    Let me know what your thoughts are on this subject.

    All the best,

    Matt
    ____________________________________________
    Matt Casters, Chief Data Integration
    Pentaho, Open Source Business Intelligence
    http://www.pentaho.org -- mcasters (AT) pentaho (DOT) org
    Tel. +32 (0) 486 97 29 37



    --~--~---------~--~----~------------~-------~--~----~
    You received this message because you are subscribed to the Google Groups "kettle-developers" group.
    To post to this group, send email to kettle-developers (AT) googlegroups (DOT) com
    To unsubscribe from this group, send email to kettle-developers-unsubscribe (AT) g...oups (DOT) com
    For more options, visit this group at http://groups.google.com/group/kettle-developers?hl=en
    -~----------~----~----~----~------~----~------~--~---

  2. #2
    Biswapesh Chattopadhyay Guest

    Re: What's in a (field) name

    I think the 3rd option is nice. One interesting scenario (based on prior
    experience) is how to handle a self-join elegantly (e.g. when you want to
    find all possible combinations of a set of data). So, if you have a file
    with fields {A, B, C} and you join it with itself, the result set should
    look like {A, B, C, A1, B1, C1}. Generalizing this, we would get
    N X {A} => {A, A1, A2 ... A(N-1)}
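
    The self-join naming described above can be sketched as follows. The class
    and method names are hypothetical, and the sketch assumes the input field
    names are unique and do not already end in a digit, so it performs no
    clash check of its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SelfJoinNames {
    // Field names for `copies` copies of the same row layout joined together.
    static List<String> joinedNames(List<String> fields, int copies) {
        List<String> result = new ArrayList<>();
        for (int copy = 0; copy < copies; copy++) {
            for (String field : fields) {
                // The first copy keeps its names; copy k (k >= 1) gets suffix k.
                result.add(copy == 0 ? field : field + copy);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two copies of {A, B, C}: prints [A, B, C, A1, B1, C1]
        System.out.println(joinedNames(Arrays.asList("A", "B", "C"), 2));
        // Four copies of {A}: prints [A, A1, A2, A3]
        System.out.println(joinedNames(Arrays.asList("A"), 4));
    }
}
```

    Note that this numbering starts at 1 for the second copy, whereas the
    original proposal starts at "Name2"; either convention works as long as it
    is applied consistently.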

    HTH

    Rgds,
    Biswa.


  3. #3
    Matt Casters Guest

    RE: What's in a (field) name

    Exactly. In fact I'm porting the JoinRows step right now and it just hit me
    we could do away with all the "same fieldname" problems.

    Cheers,
    Matt



  4. #4
    Roland Bouman Guest

    Re: What's in a (field) name

    Hi Matt,

    > My favorite is the last option as this one would work in all circumstances
    > and it would be the easiest to do and indeed cause no performance loss.
    > It would work just like before, but you would have an error-free solution.


    Excellent idea! Maybe choose a notation like [num]:

    Name[1], Name[2], etc., to avoid clashes with 'real' field names such as
    Name1, i.e. to prevent Name1 from becoming Name11.



    --
    Roland Bouman


  5. #5
    Matt Casters Guest

    RE: What's in a (field) name

    OK, another good idea.
    The change will be done in RowMeta.mergeRow(RowMetaInterface rowMeta);

    All the best,

    Matt



  6. #6
    Nick Goodman Guest

    Re: What's in a (field) name

    >
    > Excellent idea! Maybe choose a notation like [num]
    >
    > Name[1], Name[2] etc to avoid clashes with 'real' fieldnames Name1,
    > i.e. prevent Name1 from becoming Name11.


    I like that idea, but there is a downside. Adding the brackets will
    almost certainly require renaming later on if you're using the "table
    output" operator. Brackets are not valid in unquoted column names in most
    databases. Instead of brackets (which are certainly the most readable in
    Kettle), perhaps underscores?

    mysql> create table table1 ( name1[1] int);
    ERROR 1064 (42000): You have an error in your SQL syntax; check the
    manual that corresponds to your MySQL server version for the right
    syntax to use near '[1] int)' at line 1
    mysql> create table table1 ( name1_1 int);
    Query OK, 0 rows affected (0.27 sec)

    I'd just hate to put field names into the stream when we think
    there's a good chance we'll have to rename them later on.
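
    A quick way to check whether a generated field name would survive as a
    plain column name is a conservative pattern test, sketched below. The
    class name and the pattern are assumptions for illustration: real
    databases differ in the characters they allow, and this check ignores
    reserved words and length limits.

```java
import java.util.regex.Pattern;

final class ColumnNameCheck {
    // Assumption: a letter or underscore followed by letters, digits or
    // underscores is accepted as an unquoted identifier by most databases.
    private static final Pattern PLAIN_IDENTIFIER =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // True if the whole name matches the conservative pattern above.
    static boolean isPlainIdentifier(String name) {
        return PLAIN_IDENTIFIER.matcher(name).matches();
    }

    public static void main(String[] args) {
        // The two names from the MySQL session above.
        for (String name : new String[] {"name1[1]", "name1_1"}) {
            System.out.println(name + " -> " + isPlainIdentifier(name));
        }
    }
}
```

    Under this pattern "name1[1]" is rejected and "name1_1" is accepted, which
    matches the behaviour of the two CREATE TABLE statements.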


  7. #7
    Roland Bouman Guest

    Re: What's in a (field) name

    Hi Nick,

    > > Name[1], Name[2] etc to avoid clashes with 'real' fieldnames Name1,
    > > i.e. prevent Name1 from becoming Name11.

    >
    > I like that idea, but there is a downside. Adding the brackets will
    > almost certainly require renaming later on if you're using the "table
    > output" operator. The brackets are invalid column names in most
    > databases. Instead of brackets (which are certainly most readable in
    > Kettle) perhaps underscores?


    You're right - good point. So, postfix: _<num>

    --
    Roland Bouman


  8. #8
    Matt Casters Guest

    RE: What's in a (field) name

    Nick is right as well; an underscore it will be, then.

    Please note that Kettle itself has no problems with the creation or usage
    of fieldnames with "[1]" in them.
    The problems on the outflow side will be many though ;-)
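
    The outcome of the thread, an underscore followed by a number, can be
    sketched as a merge of two lists of field names. This is not the actual
    RowMeta.mergeRow code: the class and method names are hypothetical, and
    starting the numbering at 1 is an assumption based on the Name[1] example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MergeFieldNames {
    // Appends `incoming` to `target`, renaming clashing names to name_1, name_2, ...
    static List<String> merge(List<String> target, List<String> incoming) {
        List<String> result = new ArrayList<>(target);
        for (String name : incoming) {
            String candidate = name;
            int n = 1;
            // Keep trying suffixes until the candidate is not already in use,
            // so a 'real' field called Name_1 is never duplicated either.
            while (result.contains(candidate)) {
                candidate = name + "_" + n++;
            }
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("A", "B", "C");
        // Self-join: prints [A, B, C, A_1, B_1, C_1]
        System.out.println(merge(row, row));
        // Existing Name_1 is skipped: prints [Name, Name_1, Name_2]
        System.out.println(merge(Arrays.asList("Name", "Name_1"), Arrays.asList("Name")));
    }
}
```

    Checking each candidate against the names already present addresses the
    clash concern raised earlier in the thread without needing brackets.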


  9. #9
    Nick Goodman Guest

    Re: What's in a (field) name

    On May 25, 2007, at 8:41 AM, Matt Casters wrote:

    > Please note that Kettle has no problems with the creation or usage of
    > fieldnames with "[1]" in it.
    > The problems on the outflow side will be many though ;-)


    Noted.

