Hitachi Vantara Pentaho Community Forums
Thread: Questions about web service step

  1. #1
    Sebastien Cesbron Guest

    Questions about web service step

    Hi everybody

    The web service step is now committed and works on simple cases. It still
    needs work to handle more complex web services, but it is usable.

    While discussing it with Samatar Hassan, some questions came up about the
    step's basic behaviour. I am posting them to the devel mailing list so that
    Kettle developers can tell me what they think.

    1 - The step accepts input rows that are used as parameters to call the
    web service. It also produces output rows, which are the web service output.
    In my view, input rows and output rows are not correlated: I can give 10
    rows to call the web service 10 times and get 100 rows in output. The
    question is: do we have to copy the input columns into the output rows? My
    view is no, because there is no relation between the number of input rows
    and the number of output rows. Samatar views this step as a lookup step and
    thinks the input columns must be propagated to the output rows. What do you
    think? Should I add a checkbox to choose between the two modes? Is one of
    these modes more Kettle-friendly?

    2 - To call the web service, I only save in the transformation the
    information I need (operation name, parameter names, output parameters). In
    the step dialog, I need more information: the list of operations available
    in the step and the full list of output parameters (you can select which
    ones you want to export). To populate the dialog, I call the web service
    host to get the WSDL when I open the dialog, so the dialog can take some
    time to open. Samatar thinks it would be better to call the web service host
    only when needed (when the user clicks the load button, or the get fields
    button for output fields). Is there a best practice for this? What should I
    do if I want to behave like the other steps?

    3 - There is a changed boolean on the base meta class. Is this attribute
    important? I do not manage it in my step, and it seems that the
    transformation editor is not aware of modifications made in my step. Am I
    right?

    I hope you can give me your point of view on these points so that I can
    improve my step accordingly.

    Regards


    Seb

    --~--~---------~--~----~------------~-------~--~----~
    You received this message because you are subscribed to the Google Groups "kettle-developers" group.
    To post to this group, send email to kettle-developers (AT) googlegroups (DOT) com
    To unsubscribe from this group, send email to kettle-developers-unsubscribe (AT) g...oups (DOT) com
    For more options, visit this group at http://groups.google.com/group/kettle-developers?hl=en
    -~----------~----~----~----~------~----~------~--~---

  2. #2
    Sven Boden Guest

    Re: Questions about web service step

    For 1) & 2) I'm "with" Samatar...

    For 1) if there is no way to keep the original data and you really need
    the combination of the original data and the "looked up" data, you're
    pretty much stuck. I would add to every output row the input data that
    caused it. So if your 10 input rows result in 100 rows, you would get 100
    rows with the original data duplicated a number of times. You then get
    small problems such as possible duplicate field names.
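
    A minimal sketch of that idea, in Python only for brevity (Kettle itself
    is Java). The function name and the "_in" suffix are hypothetical, not
    Kettle API; the renaming rule is just one possible way to resolve a
    duplicate field name.

```python
def propagate_input_fields(input_row, output_rows, suffix="_in"):
    """Copy every field of one input row into each output row it caused.

    Hypothetical helper, not part of Kettle. If an input field name already
    exists in the output row, the input field is renamed by appending
    `suffix` until the name is unique.
    """
    merged = []
    for out in output_rows:
        row = dict(out)  # start from the web service fields
        for name, value in input_row.items():
            new_name = name
            while new_name in row:
                new_name += suffix
            row[new_name] = value
        merged.append(row)
    return merged


# One input row that produced two output rows; both sides have an "id" field.
input_row = {"id": 7, "country": "BE"}
output_rows = [{"id": "A1", "city": "Gent"}, {"id": "A2", "city": "Brugge"}]
result = propagate_input_fields(input_row, output_rows)
```

    Here the web service's "id" is kept and the input "id" ends up in "id_in",
    so neither value is silently overwritten.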

    For 2) Most steps start retrieving data when you press a button like
    "get fields" or so.

    For 3) the attribute is important, especially if someone would
    completely implement something as "BUG #5321 Unneccesary Model
    Saves" ;-) ... it causes the "(changed)" to appear in the top header.
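
    Roughly, the flag works like the sketch below. This is Python for brevity
    and an illustration only: the class, the property and window_title are
    hypothetical names, not the real Kettle meta class.

```python
class StepMetaSketch:
    """Hypothetical stand-in for a step's meta class; not the Kettle API."""

    def __init__(self, operation=""):
        self._operation = operation
        self.changed = False  # nothing edited yet

    @property
    def operation(self):
        return self._operation

    @operation.setter
    def operation(self, value):
        # Only flag a change when the value really differs, so that
        # re-entering the same value does not mark the model as changed.
        if value != self._operation:
            self._operation = value
            self.changed = True


def window_title(name, meta):
    """Title shown by the editor: the name plus a marker for unsaved edits."""
    return name + (" (changed)" if meta.changed else "")


meta = StepMetaSketch(operation="getShops")
title_before = window_title("my transformation", meta)
meta.operation = "getShops"   # same value: still unchanged
title_same = window_title("my transformation", meta)
meta.operation = "importRows"  # real edit
title_after = window_title("my transformation", meta)
```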

    Good step by the way.

    Regards,
    Sven

    On Apr 18, 5:21 pm, "Sebastien Cesbron" <scesb... (AT) gmail (DOT) com> wrote:
    > [...]




  3. #3
    Sebastien Cesbron Guest

    Re: Questions about web service step

    For 2 I'm ok, I will change the step to only call the webservice when
    necessary.
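
    A rough sketch of what "only when necessary" could look like, written in
    Python for brevity rather than in the step's Java. The class, the method
    names and the fetch_wsdl callback are all hypothetical; the callback
    stands in for the slow call to the web service host.

```python
class WebServiceDialogSketch:
    """Hypothetical dialog model; not the real step dialog."""

    def __init__(self, fetch_wsdl):
        self._fetch_wsdl = fetch_wsdl
        self._wsdl = None  # nothing is fetched when the dialog opens

    def _wsdl_once(self):
        # Fetch on first use only, then reuse the cached description.
        if self._wsdl is None:
            self._wsdl = self._fetch_wsdl()
        return self._wsdl

    def on_load_operations_clicked(self):
        return sorted(self._wsdl_once()["operations"])

    def on_get_fields_clicked(self, operation):
        return self._wsdl_once()["operations"][operation]


calls = []


def fake_fetch():
    """Pretend network call that records how often it was made."""
    calls.append(1)
    return {"operations": {"importRows": ["rowId", "errorMessage"]}}


dialog = WebServiceDialogSketch(fake_fetch)
fetches_on_open = len(calls)
operations = dialog.on_load_operations_clicked()
fields = dialog.on_get_fields_clicked("importRows")
fetches_total = len(calls)
```

    Opening the dialog costs nothing, and the two button handlers share a
    single fetch.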

    For 1, there are several possible scenarios, so maybe we have to use a
    checkbox, or maybe we have to split the step into different ones.
    A web service can be used as an input step, an output step or a lookup step.
    The current step does all of this in one, but does not preserve input data
    in the case of a lookup.
    In our company we use this step to import data into the system. I have
    attached an image that shows what we do with the step. We have n rows in
    input. We make 1 call to the web service with these rows. The web service
    tries to import the data. In our case m rows are in error, so the web
    service returns these rows to Kettle as the output of the step. In our case
    we don't want to mix what we give as input to the step with what we get as
    output. We have n rows in, m rows out, with potentially different
    structures and no way to link these rows.

    There are different ways to deal with this:
    a - Divide the step into several other steps with different behaviour
    b - Add a checkbox to choose to keep input data in the output data (but
    keeping input data is difficult if the number of rows sent in each call is
    higher than 1)
    c - Only copy input rows to output if the call size is 1

    I think c might be the best solution, but it can be difficult to understand.
    Maybe b is the better solution, with some automatic check/uncheck and
    enable/disable behaviour based on the operation we work on.
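
    To make b and c concrete, here is a small sketch (Python for brevity, not
    the step's Java code). All names are hypothetical. The keep_input flag
    plays the role of the checkbox, and it is only honoured when the call
    size is 1, because only then do we know which input row caused which
    output rows.

```python
def call_in_batches(input_rows, call_size, call_service, keep_input=True):
    """Call `call_service` once per batch of `call_size` input rows.

    Hypothetical sketch. Input fields are copied to the output only when
    keep_input is set and call_size is 1. On a name clash the web service
    field wins in this sketch.
    """
    output = []
    for start in range(0, len(input_rows), call_size):
        batch = input_rows[start:start + call_size]
        for out in call_service(batch):
            if keep_input and call_size == 1:
                out = {**batch[0], **out}
            output.append(out)
    return output


def fake_import(batch):
    """Pretend import service: returns one error row per rejected input row."""
    return [{"error": "negative qty"} for row in batch if row["qty"] < 0]


rows = [{"ref": "a", "qty": 1}, {"ref": "b", "qty": -1}, {"ref": "c", "qty": -5}]

# Our import case: n rows in, one call, m error rows out, no link between them.
one_call = call_in_batches(rows, 3, fake_import)

# Lookup-like case: one row per call, so input fields can be kept.
per_row = call_in_batches(rows, 1, fake_import)
```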

    Regards

    Seb

    On 4/18/07, Sven Boden <list123 (AT) pandora (DOT) be> wrote:
    > [...]



  4. #4
    Sven Boden Guest

    Re: Questions about web service step

    I'd pick b), but I see your point on multiple input rows.

    What I had in mind, as an example: you have 10 input rows I1... I10, each
    resulting in 12 output rows O1... O12.
    For each output row O, also include the fields of the input row that
    caused the output... so for each input row you would get 12 output
    rows, and each of the 12 would include the fields of that 1 input row.
    But this of course only works if each call takes a single input row.

    So maybe c) with a switch on/off or so.

    Best regards,
    Sven


    > b - Add a checkbox to choose to keep input data in the output data (but
    > keeping input data is difficult if the number of rows sent in each call is
    > higher than 1)
    > c - Only copy input rows to output if the call size is 1
    >
    > I think c might be the best solution, but it can be difficult to understand.
    > Maybe b is the better solution, with some automatic check/uncheck and
    > enable/disable behaviour based on the operation we work on.
    >




  5. #5
    Tim Pigden Guest

    RE: Questions about web service step

    Unless I'm missing some fundamental philosophical point here, is this
    view of a step actually sensible? Why should you expect the inputs for
    any particular operation to be matched directly by the outputs? For
    example, suppose you want to know about the shops which a group of
    people go to. Here you have a many-to-many relationship. Your input is
    people - maybe first name and last name, your output is shop records.
    The number of rows is different, the number of columns is different.
    Copying input rows to output rows is merely a characteristic of a class
    of steps - we just happen to have implemented almost exclusively those
    steps - but even this isn't really true. What about variables?

    Essentially what we are talking about is parameters to operations. These
    parameters could themselves be multi-row (as in the above example). They
    can certainly be multi-column.

    Wouldn't it be more logical to say that any data can or should
    participate in the data graph and that parameters/variables should be
    treated in the same way as other data streams? Then your web service
    might be seen as a form of input - just like a file input, which takes
    parameters. This might be a better way of treating things like lists of
    file names - you have an optional parameter input stream to the file
    reader. Then you could use a directory listing (multi-column to include
    dates and access rights) as a data stream just like any other that you
    could use filters or javascript to manipulate before passing to the csv
    reader. Or you could do the same for a list of names that become
    sequential parameters to a sql query.
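
    As a rough illustration of the file-reader case, here is a sketch in
    Python (chosen only for brevity; none of this is Kettle code and all
    function names are hypothetical). The directory listing is an ordinary
    multi-column stream, it is filtered like any other stream, and the
    reader takes it as its parameter input.

```python
import csv
import os
import tempfile


def list_directory(path):
    """Emit one row per file: a multi-column 'parameter stream'."""
    for entry in sorted(os.scandir(path), key=lambda e: e.name):
        if entry.is_file():
            info = entry.stat()
            yield {"filename": entry.path, "size": info.st_size, "mtime": info.st_mtime}


def keep_csv(rows):
    """An ordinary filter applied to the parameter stream."""
    return (row for row in rows if row["filename"].endswith(".csv"))


def read_csv_files(parameter_rows):
    """A reader whose file names arrive as a data stream."""
    for params in parameter_rows:
        with open(params["filename"], newline="") as handle:
            yield from csv.DictReader(handle)


# Small demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "a.csv"), "w", newline="") as f:
        f.write("name,shop\nAnn,Bakery\n")
    with open(os.path.join(folder, "notes.txt"), "w") as f:
        f.write("ignored\n")
    records = list(read_csv_files(keep_csv(list_directory(folder))))
```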

    Tim


    -----Original Message-----
    From: kettle-developers (AT) googlegroups (DOT) com
    [mailto:kettle-developers (AT) googlegroups (DOT) com] On Behalf Of Sven Boden
    Sent: 22 April 2007 20:48
    To: kettle-developers
    Subject: Re: Questions about web service step



    [...]


  6. #6
    Sebastien Cesbron Guest

    Re: Questions about web service step

    Hi

    I've attached a patch to the web service step. This patch:
    - uses the changed boolean on the meta class
    - doesn't load the WSDL when Kettle opens the step dialog, but only when
    the user loads the operations or the web service fields.

    Matt, can you apply this patch please?

    I have not made the modification for input rows. I don't have time to do
    it now, but I will try to do it in the future.

    Regards

    Seb

    On 4/22/07, Tim Pigden <tim.pigden (AT) optrak (DOT) co.uk> wrote:
    > [...]



Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.