Hitachi Vantara Pentaho Community Forums

Thread: Question on Kettle (Trying to find if Kettle is the right product for this job)

  1. #1
    Join Date: Jan 2014 | Posts: 10

    Hello,

    We have two databases: a MySQL source research database and a PostgreSQL bioinformatics tracking database. Right now, we have two difficult-to-maintain pipelines, in Perl and Python, that respectively extract and transform & load data from the source DB to the bioinformatics DB.

    (The data only moves one way: from research DB to bioinformatics DB)

    While I have a novice-level data warehousing background and used SSIS for something similar to this (albeit larger in scale) waaay back, I'm not sure that the "big" tools like SSIS / Informatica would be available to us.


    Here are some more details:

    1) Ideally, we'd want the software to be capable of running on a server of some kind, so that the Kettle server can connect to the two database servers and do the ETL.
    2) The ETL processes would need to run cron-style, i.e. continuously carrying out the pipelines we set up to synchronize the data between the two servers. We'd like it to run constantly, but every half hour, every hour, etc. is also an option.
    3) We would need some way of mapping columns on one side (tables in the source DB) to columns on the other side.
    4) It would need to be capable of reading from MySQL / MariaDB and writing to PostgreSQL.
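
    A minimal sketch of the column mapping described in point 3, written in plain Python rather than Kettle. The column names are hypothetical and only illustrate the source-to-target renaming we would want a tool to handle:

```python
# Hypothetical column names, purely for illustration.
COLUMN_MAP = {
    "sample_id": "tracking_sample_id",
    "collected_on": "collection_date",
    "investigator": "pi_name",
}


def map_row(source_row, column_map=COLUMN_MAP):
    """Rename mapped source columns to their target names; drop unmapped ones."""
    return {
        target: source_row[source]
        for source, target in column_map.items()
        if source in source_row
    }


if __name__ == "__main__":
    row = {"sample_id": 42, "collected_on": "2014-01-30", "notes": "unmapped"}
    print(map_row(row))
```

    Columns missing from the map (like "notes" here) are simply not carried over, which is one possible policy; a real pipeline might prefer to fail loudly instead.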

    Would Kettle be able to do this? I know that SSIS (if I remember correctly) was able to do everything we need, but we'd like to pursue other options first.

  2. #2
    Join Date: Oct 2007 | Posts: 107

    Hello Speedy,

    Yes, Kettle can handle everything on your list. To be totally honest with you, I've been an SSIS consultant since 2005 and have worked with all versions, and you know what, to me Pentaho Data Integration is far superior...
    In some cases it might be a question of preference, but I have so much frustration with SSIS right now, with bugs and lack of possibilities (I know I can do everything, but in a lot of cases it's much easier with Kettle). I can't wait for the product to be better known in Canada so I can switch my career.

    Regards,
    C.
    Last edited by CHamel; 01-31-2014 at 06:31 AM. Reason: Damn autocorrect.....

  3. #3
    Join Date: Dec 2010 | Posts: 193

    Well said, CHamel. Speedy, Kettle is capable of everything you listed, and I bet you will see a nice improvement in performance too. The new Kettle version 5.0 is very good and classy, with much more scope. Cheers
    Last edited by satran; 01-31-2014 at 12:09 AM.
    Sathish
    Back to Pentaho


    'Be the best Pearl in the ocean of wisdom'

  4. #4
    Join Date: Jan 2014 | Posts: 10

    Thank you CHamel and satran.

    EDIT:

    From what I remember from using SSIS, I was able to perform SQL operations as well as some other transformations (concatenating strings, what have you). Kettle should be able to do this, correct?

    Also, from what I am reading in various places, Kettle seems to be a very heavy GUI-based client? We use CentOS machines here for almost everything. How would we go about setting Kettle up on a server and running things there? Asking since it doesn't look like we're going to be able to simply ssh into the server and set up Kettle in the shell...
    Last edited by speedy; 01-31-2014 at 01:47 PM.

  5. #5

    Originally posted by speedy:
      From what I remember from using SSIS, I was able to perform SQL operations as well as some other transformations (concatenating strings, what have you). Kettle should be able to do this, correct?

      Also, from what I am reading in various places, Kettle seems to be a very heavy GUI-based client? We use CentOS machines here for almost everything. How would we go about setting Kettle up on a server and running things there? Asking since it doesn't look like we're going to be able to simply ssh into the server and set up Kettle in the shell...

    Yes, Kettle has an 'SQL' step that lets you execute any SQL script at the beginning of a transformation.
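
    As a rough analogy in plain Python (not Kettle itself), the pattern is "run a setup script once, then push rows through a transformation". The table and column names below are hypothetical, and sqlite3 stands in for the real databases:

```python
import sqlite3

# Hypothetical table and column names; sqlite3 stands in for the real databases.
SETUP_SQL = """
CREATE TABLE IF NOT EXISTS person (full_name TEXT);
DELETE FROM person;
"""


def run_transformation(conn, rows):
    """Run the setup SQL once, then load rows with a concatenated field."""
    # 1. Setup SQL runs once, before any rows flow through.
    conn.executescript(SETUP_SQL)
    # 2. Row-level transformation: concatenate two source fields into one.
    transformed = [(f"{first} {last}",) for first, last in rows]
    conn.executemany("INSERT INTO person (full_name) VALUES (?)", transformed)
    conn.commit()
    cursor = conn.execute("SELECT full_name FROM person ORDER BY rowid")
    return [record[0] for record in cursor]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_transformation(conn, [("Ada", "Lovelace"), ("Alan", "Turing")]))
```

    Because the setup script clears the table first, running the function twice does not duplicate rows.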

    We run Kettle on an AWS server running CentOS. We access it via a GUI and via ssh, and we use ssh to trigger jobs. However, you need to use the GUI to build and test your jobs. In theory you could write the jobs by hand as XML, but really you'd need to have your head looked at if you opted for that.
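
    As a rough sketch of what an ssh- or cron-triggered run might look like: this assumes Kitchen (Kettle's command-line job runner) with its -file and -level options, and the install and job paths are hypothetical, not our actual setup:

```python
import subprocess

# Hypothetical install and job paths; adjust to the actual server layout.
KITCHEN = "/opt/data-integration/kitchen.sh"
JOB_FILE = "/opt/etl/jobs/sync_research_to_bioinfo.kjb"


def build_kitchen_command(job_file=JOB_FILE, log_level="Basic", kitchen=KITCHEN):
    """Build the argument list for running one job file with Kitchen."""
    return [kitchen, f"-file={job_file}", f"-level={log_level}"]


def run_job(job_file=JOB_FILE):
    """Run the job and return Kitchen's exit code (0 means success)."""
    return subprocess.run(build_kitchen_command(job_file)).returncode


if __name__ == "__main__":
    # Only print the command here; call run_job() on a server with Kettle installed.
    # Example crontab entry (every 30 minutes) for a wrapper that calls run_job(),
    # saved at the hypothetical path /opt/etl/run_sync.py:
    #   */30 * * * * /usr/bin/python3 /opt/etl/run_sync.py
    print(" ".join(build_kitchen_command()))
```

    Keeping the command construction separate from the actual run makes it easy to check the arguments on a machine that has no Kettle installed.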

    We often do development on a smaller, local version of our data and then upload the transformations to the server and run them there.

    Good luck

  6. #6
    Join Date: Oct 2007 | Posts: 107

    Originally posted by speedy:
      Thank you CHamel and satran.

      From what I remember from using SSIS, I was able to perform SQL operations as well as some other transformations (concatenating strings, what have you). Kettle should be able to do this, correct?

    Concatenating strings is nothing. You can move mountains with Kettle. You can extract data from almost any database type (SQL Server, Oracle, MySQL, Postgres, MongoDB, etc.; I can't enumerate them all, there are so many), flat files, XML files, and Excel (funny, Kettle is much better at dealing with Excel than SSIS, even though Excel and SSIS are both Microsoft products!). As for the GUI, I don't find it much heavier than Visual Studio, even less so in the latest version, 5.0.1, which loads much faster than previous versions. I've tried Talend and gave up on the tool because it was way too heavy. But in the end, my friend, there's nothing like trying it out yourself: take some time to read the documentation/wikis and you'll be convinced. There's nothing I can say that will make you accept the product if you don't try it.

  7. #7
    Join Date: Jan 2014 | Posts: 10

    EDIT: Looks like I fixed those errors for the most part.

    Woodbine, you might know the answer -- how did you guys install Kettle?

    What I had in mind is this: each of us (there will be 3 of us doing this here) needs access to Kettle. So we would install Kettle on a central server that has the jobs set up, and run it via ssh, cron-style, so that it constantly syncs the source DB with the bioinformatics DB.

    But the thing is, we would each need to be able to maintain the jobs on the central server. Would we each just have to remote desktop into the server and use the Kettle GUI there? Or is there a way for each of us to install a copy of Kettle on our desktop Linux machines and then use that copy as a client to access the jobs in the Kettle installation on the server?
    Last edited by speedy; 01-31-2014 at 07:11 PM.

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.