What’s new in 4.2.0

06-17-2011, 02:30 PM
Dear Kettle fans,

Instead of pointing you to the impressive list of changes in JIRA (http://jira.pentaho.com/secure/ReleaseNote.jspa?projectId=10062&version=11114), I took the time to put together a high-level overview of all the new big-ticket items that are going to be in the upcoming version 4.2 of Kettle (Pentaho Data Integration). Allow me to share it with you:

The Excel Writer step offers advanced Excel output functionality to control the look and feel of your spreadsheets.
Graphical performance and progress feedback for transformations
The Google Analytics step allows you to download statistics from your Google Analytics account.
The Pentaho Reporting Output step makes it possible for you to run your (parameterized) Pentaho reports in a transformation. It allows for easy report bursting of personalized reports.
The Automatic Documentation step generates (simple) documentation of your transformations and jobs using the Pentaho Reporting API.
The Get repository names step retrieves job and transformation information from your repositories.
The LDAP Writer step
The Ingres VectorWise (streaming) bulk loader step
The Greenplum (streaming) bulk loader step (for gpload)
The Talend Job Execution job entry
Health Level 7 (HL7): HL7 Input step, HL7 MLLP Input and HL7 MLLP Acknowledge job entries
The PGP File Encryption, Decryption & validation job entries facilitate encryption and decryption of files using PGP.
The Single Threader step for parallel performance tuning of large transformations
Allow a job to be started at a job entry of your choice (continue after fixing an error)
The MongoDB Input step (including authentication)
The ElasticSearch bulk loader
The XML Input Stream (StAX) step to read huge XML files at optimal performance and flat memory usage by flattening the structure of the data.
The Get ID from Slave Server step allows multi-host or clustered transformations to get globally unique integer IDs from a slave server: http://wiki.pentaho.com/display/EAI/Get+ID+from+Slave+Server
Carte improvements:
reserve next value range from a slave sequence service
allow parallel (simultaneous) runs of clustered transformations
list (reserved and free) socket reservations service
new options in XML for configuring slave sequences
allow time-out of stale objects using environment variable KETTLE_CARTE_OBJECT_TIMEOUT_MINUTES
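
To give an idea of what reserving a value range buys you (see the Get ID from Slave Server step and the Carte sequence service above), here is a minimal Java sketch. It is not Kettle or Carte code: the class names SequenceService and RangeIdClient and the method reserveRange are made up for illustration, and the real step talks to a slave server over the network instead of calling a local object. The point is only that each client reserves a whole block of IDs at once and then hands them out locally, so IDs stay globally unique without a round trip per row.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a central sequence service (NOT Carte's actual API). */
class SequenceService {
    private final AtomicLong next;

    SequenceService(long startValue) {
        this.next = new AtomicLong(startValue);
    }

    /** Atomically reserves `size` consecutive IDs and returns the first one. */
    long reserveRange(int size) {
        return next.getAndAdd(size);
    }
}

/** Hands out IDs locally and only contacts the service when its range is used up. */
class RangeIdClient {
    private final SequenceService service;
    private final int rangeSize;
    private long current; // next ID to hand out
    private long limit;   // first ID that is no longer part of the reserved range

    RangeIdClient(SequenceService service, int rangeSize) {
        if (rangeSize <= 0) {
            throw new IllegalArgumentException("rangeSize must be positive");
        }
        this.service = service;
        this.rangeSize = rangeSize;
        // current == limit == 0 here, so the first nextId() call reserves a range.
    }

    synchronized long nextId() {
        if (current >= limit) {
            current = service.reserveRange(rangeSize);
            limit = current + rangeSize;
        }
        return current++;
    }
}
```

With a range size of 3 and a sequence starting at 1, a first client would hand out 1, 2, 3 while a second client that asked in between gets 4, 5, 6. A larger range means fewer calls to the service, at the cost of gaps in the numbering when a client stops before using up its range.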

Memory tuning of the logging back-end with KETTLE_MAX_LOGGING_REGISTRY_SIZE, KETTLE_MAX_JOB_ENTRIES_LOGGED and KETTLE_MAX_JOB_TRACKER_SIZE, allowing for flat memory usage in never-ending ETL in general and in jobs specifically.
Repository Import/Export:
Export at the repository folder level
Export and Import with optional rule-based validations
The Import command line utility allows for (optional) rule-based import of lists of transformations, jobs and repository export files: http://wiki.pentaho.com/display/EAI/Import+User+Documentation

ETL Metadata Injection:
Retrieval of rows of data from a step to the “metadata injection” step
Support for injection into the “Excel Input” step
Support for injection into the “Row normaliser” step
Support for injection into the “Row Denormaliser” step

The Multiway Merge Join step (experimental) allows any number of data sources to be joined on one or more keys, using either an inner or a full outer join algorithm.
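
For those wondering what an inner versus a full outer join over more than two sources looks like, here is a small Java sketch of the join semantics only. It is not the code of the Multiway Merge Join step: the class MultiwayJoinSketch and its join method are hypothetical, it uses a single key, it assumes unique keys and non-null values per input, and it collects everything in a sorted map instead of merging sorted streams the way a real merge join would.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustration only: join semantics for N inputs, not the actual Kettle step. */
class MultiwayJoinSketch {

    /**
     * Joins any number of inputs on a single key.
     * Each input maps key -> value (keys unique per input, values non-null).
     * Returns rows ordered by key: [key, value from input 0, value from input 1, ...].
     * fullOuter = true keeps every key and fills missing values with null;
     * fullOuter = false (inner join) keeps only keys present in all inputs.
     */
    static List<List<String>> join(List<Map<String, String>> inputs, boolean fullOuter) {
        final int n = inputs.size();

        // Collect all keys in sorted order, with one value slot per input.
        TreeMap<String, String[]> byKey = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            for (Map.Entry<String, String> e : inputs.get(i).entrySet()) {
                byKey.computeIfAbsent(e.getKey(), k -> new String[n])[i] = e.getValue();
            }
        }

        List<List<String>> rows = new ArrayList<>();
        for (Map.Entry<String, String[]> e : byKey.entrySet()) {
            String[] values = e.getValue();

            // A key is "complete" when every input supplied a value for it.
            boolean complete = true;
            for (String v : values) {
                if (v == null) {
                    complete = false;
                }
            }

            if (fullOuter || complete) {
                List<String> row = new ArrayList<>();
                row.add(e.getKey());
                row.addAll(Arrays.asList(values));
                rows.add(row);
            }
        }
        return rows;
    }
}
```

With three inputs where only key "2" appears in all of them, the inner join returns a single row for key "2", while the full outer join returns one row per distinct key with null in the slots of the inputs that lack it.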
Beyond this list there is, as mentioned, a long list of bug fixes and small improvements to the various steps and job entries. It’s impossible to thank the entire community for all the contributions they’ve made to make this release a smashing success. If you think it feels more like a 5.0 version, please remember that we’re pretty conservative about version numbering. As long as we don’t break our own Java API, we won’t go to another major version.

Also remember that you can try out all these new features right now by using a CI build (http://ci.pentaho.com/job/Kettle/), or later on once the RC1 build is posted on SourceForge (http://sourceforge.net/projects/pentaho/files/). Please help our QA team by posting any issues you might find in JIRA (http://jira.pentaho.com/browse/PDI).

Last but certainly not least, let’s not forget to mention the exciting upcoming features of the new Pentaho BI Server version 4. I won’t spoil the surprise for you, but I can tell you that certain things in that new release are looking really (really!) nice. Next Thursday (Europe – 13:00 GMT/UTC, 9:00am EST, Americas – 1:00pm EST, 10:00am PST) you can join us for a web conference with a live demo. Please register here (http://www.pentaho.com/events/pentaho-bi-4/) if you are interested.

Have fun with the new Pentaho software releases!


More... (http://www.ibridge.be/?p=203)