Hitachi Vantara Pentaho Community Forums

Thread: Development: java vs spoon

  1. #1
    Join Date
    Aug 2016
    Posts
    289

    I wondered if anyone with a software development background had thoughts on using Java (pure code) vs Spoon, and also on using the Kettle library in Java code instead of Spoon.

    In my view, Spoon is sometimes better, especially for easier tasks, and it gives a quick 'big picture' understanding of what is happening. It is also fairly fast to get things done; in particular, connections to other resources (databases, web and more) usually work without problems.

    Spoon really is an abstraction layer above the programming language. All class structures are already finished and hidden from the user (inheritance, interfaces etc). What I miss in Spoon is handling data in more detail, especially between jobs and transformations. While Spoon can handle single-value arguments and a single list (result rows), Java can handle any number of arguments (multiple result rows, lists as arguments etc). Java also has strict types, so you will hardly ever be in a situation where a string is confused with an int, while Spoon is more of a scripting language without this safety (except between steps in a transformation). I also miss the opportunity to use basic object-oriented principles; instead I have to re-use code with the help of arguments.
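
    As a rough illustration of the point about typed data: in plain Java, one method can hand back several typed lists at once, and the compiler rejects a string where an int is expected. This is only a sketch; `TypedResults`, `ExtractResult` and `extract` are hypothetical names and not part of the Kettle API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example (not Kettle API): one call returns two typed lists.
class TypedResults {

    // Holder for several result sets returned together from a single call.
    static final class ExtractResult {
        final List<Integer> validIds;
        final List<String> rejectedLines;

        ExtractResult(List<Integer> validIds, List<String> rejectedLines) {
            this.validIds = validIds;
            this.rejectedLines = rejectedLines;
        }
    }

    // Splits raw input into parsed integer ids and the lines that failed to parse.
    static ExtractResult extract(List<String> rawLines) {
        List<Integer> valid = new ArrayList<>();
        List<String> rejected = new ArrayList<>();
        for (String line : rawLines) {
            try {
                valid.add(Integer.parseInt(line.trim()));
            } catch (NumberFormatException e) {
                rejected.add(line);
            }
        }
        return new ExtractResult(valid, rejected);
    }

    public static void main(String[] args) {
        ExtractResult result = extract(Arrays.asList("1", " 2 ", "abc"));
        System.out.println(result.validIds);
        System.out.println(result.rejectedLines);
        // int first = result.rejectedLines.get(0); // would not compile: a String is not an int
    }
}
```

    The commented-out line is the kind of mix-up that the compiler catches before the program ever runs.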

    A huge bonus in Spoon is the ability to use custom Java code in steps. This works well for smaller pieces of code, but Spoon's text editor for Java code is nothing compared to a full Java development environment (Eclipse etc). If you want a larger Java code base in Spoon, you could of course import it as a library, but then you would need to edit it separately (outside Spoon).

    Do you have any thoughts, or tips for bridging the gap between Spoon and Java programming? Have you tried using the Kettle library directly in Java code?

  2. #2

    In most cases where I've seen a company embed Kettle libraries into a Java application, they tend to use the Java application as a scheduler / orchestrator for an existing legacy enterprise application, of which PDI is a small component. Typically they just call the methods to start a job or transformation on a Carte server.

    If instead they're trying to manipulate/enrich the data itself as part of a transformation, there's a much lower barrier to entry in just writing a plugin with the necessary logic than in doing a full-on embedding of Kettle inside a Java application.

    I'm not sure I understand where you're thinking PDI is a scripting language. It's a compiled Java application. PDI exposes various configuration settings through steps and job entries, allowing end-users to customize how it processes the data as it is passed through various steps. There are some steps that include a scripting engine (e.g. Modified Java Script Value), but the entire underlying application is a compiled application, which means Java data types are enforced.

    Hope that helps

  3. #3

    If you are a coder, rebuild it all yourself. When you want an understandable and optimised solution, including its limitations, take a tool. Personally, I wouldn't want to code the more complex steps. I also find tools to be greatly superior in administration. Otherwise feel free to do the same as Matt did.

  4. #4
    Join Date
    Aug 2016
    Posts
    289

    Thanks for sharing your thoughts!

    My original thought was "is Spoon really the best tool here, or would I be better off making my own Java program from scratch?". This applies to solving a standard or complex ETL problem, like reading from some source (file, database, web etc), transforming it in some way and writing the results to a database (statistics in my case). For the easier tasks, I think Spoon is actually superior to normal Java code. But for the more complicated problems I'm not so sure, so I sometimes ask myself if this is the right tool or not. So far it has been, but new challenges occur periodically.

    Having worked a couple of years with Spoon and having some background in pure Java (mostly educational/training), I found these differences:

    Spoon pros:
    -fast development of basic functionality
    -simple handling of events (success/failure)
    -easy connections to external sources (database, web, file ++)
    -scripted, no compile time
    -can take custom Java code and additional Java libraries (jars)
    -rich functionality already included (steps)
    -class structure is already defined, optimized and hidden (interface, inheritance, design patterns)

    Java pros:
    -better debug and testing environment
    -more freedom to pass complex data structures (arrays, objects) as variables
    -object-oriented
    -type checking, more safety and alerts if you have a mismatch

  5. #5

    I can't comment on the debug and testing part as I am not a Java developer. As for object orientation, I see no added value in it when working with data to fill data warehouses. I think in terms of the problem I have to solve. PDI is an optimised tool that solves the issues you face when filling a data warehouse. When it falls short on one or more of those issues, I first ask myself whether I am thinking from the perspective of the ETL sub-components, or as a software developer or a SQL developer / user.

    You are always welcome to add any functionality you feel is missing from PDI.

  6. #6
    Join Date
    Aug 2016
    Posts
    289

    When I'm making multiple transformations for multiple fact tables that share certain similarities, I sometimes think object-oriented principles could be nice. I'd like to use interfaces, inheritance etc. Instead, I'm often duplicating code to do the same thing multiple times. On the other hand, there is already good use of design patterns below the Kettle layer, so it's nice to have that optimized and out of the way.

  7. #7

    Which similarities do they share?

  8. #8
    Join Date
    Aug 2016
    Posts
    289

    They will typically share a number of dimensions (time, date, source, destination). Then they will each have a number of unique dimensions. This would be a classic case for inheritance in Java, but so far I've just made a bunch of duplicated steps in transformations.
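
    A minimal sketch of what that inheritance could look like in plain Java. All names here (`FactLoader`, `SalesFacts`, `ShipmentFacts`, and the unique dimensions `product`, `customer`, `carrier`) are hypothetical and only for illustration; none of this is Kettle API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not Kettle API): fact loaders that inherit shared dimensions.
class FactLoaders {

    static abstract class FactLoader {
        // The shared dimensions are declared once; every subclass inherits them.
        private static final List<String> SHARED =
                Arrays.asList("time", "date", "source", "destination");

        // Each concrete fact table only lists what is unique to it.
        abstract List<String> uniqueDimensions();

        // Full dimension list: shared dimensions first, then the unique ones.
        final List<String> dimensions() {
            List<String> all = new ArrayList<>(SHARED);
            all.addAll(uniqueDimensions());
            return all;
        }
    }

    static final class SalesFacts extends FactLoader {
        @Override
        List<String> uniqueDimensions() {
            return Arrays.asList("product", "customer");
        }
    }

    static final class ShipmentFacts extends FactLoader {
        @Override
        List<String> uniqueDimensions() {
            return Arrays.asList("carrier");
        }
    }

    public static void main(String[] args) {
        System.out.println(new SalesFacts().dimensions());
        System.out.println(new ShipmentFacts().dimensions());
    }
}
```

    Adding a fifth shared dimension would mean editing `SHARED` in one place, and both loaders would pick it up.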

  9. #9
    Join Date
    Apr 2008
    Posts
    4,690

    Very few users here have written Kettle transforms in Java.
    One of the big bonus points of using PDI is that you *don't* need to be a coder to build and maintain the scripts.

    Building internal company templates (like you would for MS Word!) is how you can get around a lot of your perceived shortcomings.
    I haven't written any Java code in over 10 years, and yet I can troubleshoot a PDI transformation quite well.

  10. #10
    Join Date
    Aug 2016
    Posts
    289

    With templates, do you mean using search and replace to insert custom values? I've used that before. It is useful for making the transformations, but if you are going to change the transformations later, you still need to go into each and every one and apply the same change. Wouldn't you still get duplicate code after using templates? That's the magic of object-oriented programming and inheritance: if you need to change something, you only change it once and it applies to all.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.