Hitachi Vantara Pentaho Community Forums
Page 1 of 2 12 LastLast
Results 1 to 10 of 12

Thread: Automatically create (or drop and recreate) a SQL table before exporting to SQL?

  1. #1
    Join Date
    Feb 2018
    Posts
    4

    Default Automatically create (or drop and recreate) a SQL table before exporting to SQL?

    I am running Pentaho Data Integration (Kettle) 7.1.
    I need to export some data to tables in a Microsoft SQL Server database.

    Is there an easy and not-too-convoluted way to automatically create these tables, either as part of the “Table output” step or as a step prior to that, and to drop them and recreate them if they already exist?

    I understand it’s a common question that has been asked in the past, but I am not sure there is a straightforward answer (many discussions I found are quite old – maybe the software has changed in the meanwhile?).

    I appreciate many people feel strongly against the idea; while I do agree that it would typically be a bad idea to do so in a production database, I am using a “scratch” database on my local machine, not on a production server, for some data analyses; in this context it would be extremely useful. The process is very much trial-and-error: get some data, cleanse it, run some analyses, realise I need some changes in the data (e.g. to calculate a new field, or to convert one from one datatype to another), make these changes, re-export to SQL, etc. If I were to manually delete and re-create the table every single time, it would be a pain, and it would defy the point of using an ETL tool in the first place (to save time!). After all, it’s not a coincidence that most other tools let you do this, e.g. Alteryx, or Python and SQL Alchemy (which is not exactly an ETL tool but I have used it for this purpose).

    Thanks!

  2. #2
    Join Date
    Sep 2011
    Posts
    153

    Default

    Yes, you can do it using metadata injection, but there is no one click option for this.
    as you might be deleting some column, adding column , modifying data type , so you have to provide these information somewhere.
    that's all can be done using metadata injection.

  3. #3

    Default

    Don't you just want to truncate the table?

    If you want to make changes to the table, how do you want to do that in ETL? If it doesn't know beforehand which changes you want, you have to define those changes somehow. I have scripts of all the Data Warehouse tables. This script has a DROP TABLE IF EXISTS statement in the beginning. If it's a table in development, I just alter the create part and run the script. It drops the table and all related indexes and you have your fresh new table.

  4. #4
    Join Date
    Feb 2018
    Posts
    4

    Default

    Is there an example of how this would work?

    Quite banally, this is a task I find myself repeating lots and lots of times; as I said, many other tools let you do this seamlessly, so if with Kettle it's too much hassle I'll simply use another tool.

  5. #5
    Join Date
    Apr 2008
    Posts
    4,569

    Default

    Quote Originally Posted by SQLista View Post
    Is there an easy and not-too-convoluted way to automatically create these tables, either as part of the “Table output” step or as a step prior to that, and to drop them and recreate them if they already exist?
    Did you try the SQL button in the Table Output step?

  6. #6
    Join Date
    Feb 2018
    Posts
    4

    Default

    Yes, but AFAIK that's manual, not automated. What I need is a way to automatically export data to a SQL table, so that if the table exists, it gets dropped and then recreated; if it doesn't exist, it gets created from scratch. I need all of this to happen automatically, without me manually running SQL code, checking if a table exists etc. , because the whole point of using an ETL tool is (for me) to save time. If it takes me longer to check for a table/drop it/recreate it than to actually run the transformation that cleanses the data and exports it to SQL, well, it's not a very efficient workflow, it kind of defies the point.

    I appreciate not everyone feels this way and not everyone has the same need. I'm just explaining what my needs are.

  7. #7
    Join Date
    Apr 2008
    Posts
    4,569

    Default

    Except that you are expressing two different phases: System Design and System Execution.

    In the system design phase, things are changing rapidly. This is where you are adding columns, changing the calculations, etc. This is best served with a combination of the "Truncate Table" option, and the SQL button. These are very fast options since you're already in the tool.

    In the System Execution phase, things should remain stable, and thus there is no need to be adding/removing columns frequently. This is best served with the "Truncate Table" option (which empties it out). This is the phase where you want to have the repeatability.

    You also have the option of copying the commands from the SQL button (CREATE TABLE ... ) and putting that into a SQL command step that becomes part of your job (CREATE TABLE ... IF NOT EXIST)

  8. #8

    Default

    In my style of use of Kettle, the SQL button has little use. I usually map the fields from the stream that I need to the columns in the database. The SQL button works when you end up with a stream result exactly in the format of the target table.

    Which tool(s) offer this functionality that you feel you miss? And is/are these tools also free/open source? And how do these tools compare in other areas which make life easier or harder for you as developer? And for administrators?

  9. #9
    Join Date
    Feb 2018
    Posts
    4

    Default

    Quote Originally Posted by gutlez View Post
    Except that you are expressing two different phases: System Design and System Execution.
    Except I never get to what you call the System Execution phase. You have made some incorrect assumptions about my workflow. I am not working in a production database. I have a local Microsoft SQL Server Express server on my machine, which I use as a "scratchpad" database. In summary, every project is pretty much independent and stand-alone; I receive some data from a third-party, cleansing and analysing it. It's a very iterative, trial-and-error process: I receive this data, cleanse the field I need, run some analyses, then maybe realise I need to cleanse 5 or 6 more fields, etc. I do not need most of the fields so cleansing all of them is not the best use of my time. Once a project is completed, the new one will be different.

    Quote Originally Posted by Invisi View Post
    Which tool(s) offer this functionality that you feel you miss? And is/are these tools also free/open source?
    I answered this question in my original post: the ones I have used in the past are Alteryx and Python with the Pandas and SQL Alchemy tools.

    Alteryx is not open source - it's unfortunately quite expensive. I used it a lot in the past and found it very, very efficient for my kind of workflow. I appreciate some may consider Alteryx to be a very different tool to Kettle, but, to be clear, I have zero interest in comparing the two as a whole - I am simply interested in an efficient workflow for what I have to do, which I described above.

    Python with Pandas and SQL Alchemy are open source and free. They are not very efficient when dealing with large amounts of data (Alteryx and its bulk insert are way faster).

    To be clear, with both Alteryx and Python I have the very easy option to export a table to a SQL database, dropping it if it exists, and creating it from scratch otherwise, without worrying about wasting time to manually check if the table exists, if I need to truncate it, if I need to create it declaring every single column manually, etc.

    Quote Originally Posted by Invisi View Post
    And how do these tools compare in other areas which make life easier or harder for you as developer? And for administrators?
    Like I have said, my workflow is what I have described above, so there are no other areas which are relevant to my question. It's not a production database, so your question about administrators does not seem particularly relevant, either.

  10. #10
    Join Date
    Sep 2011
    Posts
    153

    Default

    Quote Originally Posted by SQLista View Post
    Is there an example of how this would work?

    Quite banally, this is a task I find myself repeating lots and lots of times; as I said, many other tools let you do this seamlessly, so if with Kettle it's too much hassle I'll simply use another tool.
    Forum member can guide you to achieve a particular solution, there is a solution for your problem and that can be solve easily with PDI, you can get the table name do a simple validation and use execute sql step to create tables, there is option to generate sql dynamically.
    could you please let us know what are the option you tried?

    and one more thing If you feel that " if with Kettle it's too much hassle I'll simply use another tool" then you are free to use any tool you are good in.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2017 Pentaho Corporation. All Rights Reserved.