Hitachi Vantara Pentaho Community Forums

Thread: Multiple questions

  1. #1

    Hi,

    I am trying to use PDI to import products into the DB from a spreadsheet. I have been looking through the samples, wiki, forums, etc., but couldn't find clear-cut answers to the following...

    1. Is there a rule of thumb about the input and output of transformation steps? For example, among the Input, Output, Transformation, etc. step types, some seem to work on the input row and still pass the (updated) row on to the next step, while I think some don't.

    2. How do I loop through the column values in a given row? E.g. I have multiple category columns that each product belongs to. I need to upsert the categories into MySQL as part of processing the row and then upsert the product itself. I can't figure out how best to do this with the steps that PDI provides.

    3. Also, I do not see a way to get a sequence ID from non-Oracle databases. That is a showstopper for me, since I don't have auto-increment on the table (I don't have control over the tables and their schema) and the keys are not necessarily numeric.

    4. What's the best way to upsert multiple tables? E.g., as mentioned before, the input row for a product contains multiple categories, manufacturer info, prices, etc., which are all in separate tables.

    5. Is there a guide to writing plugins? I saw a wiki article, but it was a bit confusing as to whether the 2.5 material still holds for 3.x.

    I apologize if these are really simple questions; I would appreciate pointers if they have been answered before.

  2. #2

    Less talking, more coding

    Quote: Originally posted by ritesht
    1. Is there a rule of thumb about the input and output of transformation steps? For example, among the Input, Output, Transformation, etc. step types, some seem to work on the input row and still pass the (updated) row on to the next step, while I think some don't.
    What's in a name? A rose by any other name... Input steps take third-party things and create PDI rows out of them; output steps take rows and create third-party things (updates, inserts, files, ...); transformation steps are the rest, plus some things like scripting, lookups, ...
    The names of the categories don't matter that much, it's to get kind of a nice grouping of steps.

    Quote: Originally posted by ritesht
    2. How do I loop through the column values in a given row? E.g. I have multiple category columns that each product belongs to. I need to upsert the categories into MySQL as part of processing the row and then upsert the product itself. I can't figure out how best to do this with the steps that PDI provides.
    You don't usually loop through column values... If you want to make more rows out of one row, you usually need JavaScript (there is an example in the samples directory).
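    To illustrate what "making more rows out of one row" means for the product/category case, here is a minimal sketch in plain Python rather than in PDI's JavaScript step. The field names (`sku`, `category1`, ...) and the function `normalise_categories` are made up for the example and are not part of PDI.

```python
from typing import Dict, Iterator, List


def normalise_categories(
    row: Dict[str, str], category_fields: List[str]
) -> Iterator[Dict[str, str]]:
    """Yield one output row per non-empty category column of the input row."""
    # Fields that are not category columns are copied onto every output row.
    base = {k: v for k, v in row.items() if k not in category_fields}
    for field in category_fields:
        value = (row.get(field) or "").strip()
        if value:  # skip empty category cells
            yield {**base, "category": value}


# Hypothetical spreadsheet row: one product with up to three category columns.
product = {
    "sku": "A-100",
    "name": "Widget",
    "category1": "Tools",
    "category2": "Garden",
    "category3": "",
}
rows = list(normalise_categories(product, ["category1", "category2", "category3"]))
for r in rows:
    print(r)
```

    Each output row then carries exactly one category, so a single lookup or insert/update step downstream can handle it without any looping.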

    Quote: Originally posted by ritesht
    3. Also, I do not see a way to get a sequence ID from non-Oracle databases. That is a showstopper for me, since I don't have auto-increment on the table (I don't have control over the tables and their schema) and the keys are not necessarily numeric.
    A lot of databases don't support sequences. Support depends on database functionality.

    Quote: Originally posted by ritesht
    4. What's the best way to upsert multiple tables? E.g., as mentioned before, the input row for a product contains multiple categories, manufacturer info, prices, etc., which are all in separate tables.
    It all depends... If the tables are not related through keys, just split the stream (distribute) and handle them all at the same time. If they are related, usually some kind of pre-processing is required (or, e.g., putting unique transactions on).
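    For the related-tables case, one possible ordering is: upsert the parent rows (categories) first, then the product, then the link rows, all inside one transaction. The sketch below shows that ordering in Python with an in-memory SQLite database. The schema, table names, and helper functions are assumptions invented for the example; this is not how PDI's Insert/Update step is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: products, categories, and a link table between them.
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_category (
    sku TEXT NOT NULL REFERENCES product(sku),
    category_id INTEGER NOT NULL REFERENCES category(id),
    PRIMARY KEY (sku, category_id)
);
""")


def upsert_category(conn, name):
    """Insert the category if it is missing and return its id."""
    conn.execute("INSERT OR IGNORE INTO category (name) VALUES (?)", (name,))
    return conn.execute(
        "SELECT id FROM category WHERE name = ?", (name,)
    ).fetchone()[0]


def upsert_product(conn, sku, name):
    """Update the product if it exists, otherwise insert it."""
    cur = conn.execute("UPDATE product SET name = ? WHERE sku = ?", (name, sku))
    if cur.rowcount == 0:
        conn.execute("INSERT INTO product (sku, name) VALUES (?, ?)", (sku, name))


def load_row(conn, row, category_fields):
    """Load one spreadsheet row: parents first, then product, then links."""
    with conn:  # one transaction per input row; rolled back on error
        category_ids = [
            upsert_category(conn, row[f]) for f in category_fields if row.get(f)
        ]
        upsert_product(conn, row["sku"], row["name"])
        for category_id in category_ids:
            conn.execute(
                "INSERT OR IGNORE INTO product_category (sku, category_id) "
                "VALUES (?, ?)",
                (row["sku"], category_id),
            )


fields = ["category1", "category2", "category3"]
row = {"sku": "A-100", "name": "Widget",
       "category1": "Tools", "category2": "Garden", "category3": ""}
load_row(conn, row, fields)
load_row(conn, row, fields)  # loading the same row again adds nothing new

counts = tuple(
    conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("category", "product", "product_category")
)
print(counts)
```

    `INSERT OR IGNORE` is SQLite syntax; MySQL spells the equivalent differently, so treat the SQL as illustrative only.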

    Quote: Originally posted by ritesht
    5. Is there a guide to writing plugins? I saw a wiki article, but it was a bit confusing as to whether the 2.5 material still holds for 3.x.
    Less talking, more coding... You found the basics (not much more is available). Download one of the existing plugins and go ahead. It's not rocket science. But be prepared to spend at least a couple of weeks writing a plugin if you want to make it completely reusable and cover all possible cases.

    Regards,
    Sven

  3. #3

    How about less irrelevant talking and more relevant answering to the posts?
    Quote: Originally posted by sboden
    Less talking, more coding
    Let me just rephrase the question: is it documented somewhere which of the steps actually pass on the input row (modified or otherwise)? It seems to me that some steps do and some steps don't. Yet another way to ask would be: which steps are leaf steps that cannot have any follow-up steps in the graph?

    Quote: Originally posted by sboden
    What's in a name? A rose by any other name... Input steps take third-party things and create PDI rows out of them; output steps take rows and create third-party things (updates, inserts, files, ...); transformation steps are the rest, plus some things like scripting, lookups, ...
    The names of the categories don't matter that much, it's to get kind of a nice grouping of steps.
    Ok - Thanks
    Quote: Originally posted by sboden
    You don't usually loop through column values... If you want to make more rows out of one row, you usually need JavaScript (there is an example in the samples directory).
    That exactly was my question. Add Sequence doesn't seem to support any DB other than Oracle. My hope was that Add Sequence, or a similar step, would also accept a table name with at least two columns (e.g. sequence name and sequence value) to support other DBs; or maybe allow a pluggable JavaScript that returns the value; or maybe even let Insert/Update take a script or a DB table/field name for the primary key, as a convenience.
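    The two-column "sequence name, sequence value" table described here can be sketched outside PDI as follows. This is a minimal Python/SQLite illustration of the idea only; the table `seq_table`, its columns, and the function `next_value` are hypothetical names, and a multi-user database would additionally need row locking around the update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical counter table: one row per named sequence.
conn.execute(
    "CREATE TABLE seq_table ("
    "seq_name TEXT PRIMARY KEY, seq_value INTEGER NOT NULL)"
)


def next_value(conn, name):
    """Return the next value of the named counter, starting at 1."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE seq_table SET seq_value = seq_value + 1 WHERE seq_name = ?",
            (name,),
        )
        if cur.rowcount == 0:  # first use of this sequence name
            conn.execute(
                "INSERT INTO seq_table (seq_name, seq_value) VALUES (?, 1)",
                (name,),
            )
        return conn.execute(
            "SELECT seq_value FROM seq_table WHERE seq_name = ?", (name,)
        ).fetchone()[0]


first = next_value(conn, "category")
second = next_value(conn, "category")
other = next_value(conn, "product")

# Keys need not be numeric: the counter can be formatted into a string key.
key = f"CAT-{next_value(conn, 'category'):05d}"
print(first, second, other, key)
```

    Because the counter is just an integer, any non-numeric key format can be layered on top of it, as the last lines show.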

    Quote: Originally posted by sboden
    A lot of databases don't support sequences. Support depends on database functionality.
    Can you please elaborate a little on "some pre-processing is required"? Let's say, for a row containing multiple category names, what kind of pre-processing would you do?
    Quote: Originally posted by sboden
    It all depends... If the tables are not related through keys, just split the stream (distribute) and handle them all at the same time. If they are related, usually some kind of pre-processing is required (or, e.g., putting unique transactions on).
    Thanks for the tip
    Quote: Originally posted by sboden
    Less talking, more coding... You found the basics (not much more is available). Download one of the existing plugins and go ahead. It's not rocket science. But be prepared to spend at least a couple of weeks writing a plugin if you want to make it completely reusable and cover all possible cases.

    Regards,
    Sven


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.