I've come to accept my inefficiency at keeping up with technical blog posts. This is the point where one accepts one's complete uselessness (and I don't even know if that's a real word!)

Anyway - on to the good things:

Pentaho 8.2 is available!

Get it here!


A really, really solid release! Not a lot of eye-catching new buzzwords, but a huge list of things that will make a serious impact on development efforts and production deployments out there.


Release overview

Here's the release at a glimpse:

  • Enhance Ecosystem Integration
    • HCP Connector I
    • MapR DB Support
    • Google Encryption Support

  • Improve Edge to Cloud Processing
    • Enhanced AEL
    • Streaming AMQP

  • Better Data Operations
    • Expanded Lineage
    • Status Monitoring UX
    • OpenJDK support

  • Enable Data Science & Visualization
    • Python Executor
    • PDI Data Science Notebook (Jupyter) Integration
    • Push Streaming

  • Improve Platform Stability and Usability
    • JSON Enhancements
    • BA Chinese Language Localization for PUC
    • Expanded MDI

  • Additional Improvements




And now a little bit of detail into each of them:


Ecosystem Integration

HCP Connectivity

HCP is a distributed storage system designed to support large, growing repositories of fixed-content data, from simple text files and images to video and multi-gigabyte database images. HCP stores objects that include both data and metadata that describes that data, and presents these objects as files in a standard directory structure.


An HCP repository is partitioned into namespaces owned and managed by tenants, providing access to objects through a variety of industry-standard protocols, as well as through various HCP-specific interfaces.
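As a mental model of what "objects that include both data and metadata" means, here's a tiny Python sketch; the field names and values are purely illustrative, not HCP's actual metadata schema:

```python
from dataclasses import dataclass, field

@dataclass
class HCPObject:
    """An HCP-style object: fixed content plus metadata describing it.

    Field names here are illustrative only, not HCP's real schema.
    """
    path: str                                      # presented as a file in a directory tree
    data: bytes                                    # the fixed content itself
    metadata: dict = field(default_factory=dict)   # descriptive metadata

obj = HCPObject(
    path="/finance/2018/q4/report.pdf",
    data=b"%PDF-1.4 ...",
    metadata={"retention": "compliance", "ingested-by": "pdi"},
)
print(obj.metadata["retention"])  # compliance
```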



There are many use cases for using HCP in the Enterprise context:



  • Globally Compliant Retention Platform (GCRP)
    • Meet Compliance & Legal Retention requirements (WORM, SEC 17A-4, CFTC and MSRB)

  • Secure Analytics Archive
    • Big data source/target (land) for secure analytic workflows
    • Better Data portability
    • Multi-tenant

  • Protect data with much higher durability (up to fifteen 9s) and availability (up to ten 9s) with HCP




The PDI+HCP combo brings many more resources to serving these use cases: by leveraging PDI's connectivity to a wide variety of data sources, we can use HCP as a "staging data lake" for semi-structured and unstructured data, and/or as an execution environment for running data science algorithms against that content, such as enriching HCP metadata or doing deep learning for image recognition.


In this release we implemented a VFS driver for HCP; next versions will include a deeper, metadata-level integration with HCP's functionality.
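In PDI, a VFS driver means HCP locations can be addressed by URI anywhere a file path is accepted. As a rough illustration of how such a URI would decompose into HCP's tenant/namespace model, here's a minimal Python sketch; the `hcp://` scheme name and the namespace.tenant.domain layout are hypothetical here, not the driver's documented syntax:

```python
from urllib.parse import urlparse

def parse_hcp_uri(uri: str) -> dict:
    """Split a hypothetical hcp:// VFS URI into its parts.

    Assumes a namespace.tenant.domain authority, mirroring HCP's
    tenant/namespace partitioning; purely illustrative.
    """
    parts = urlparse(uri)
    namespace, tenant, *domain = parts.netloc.split(".")
    return {
        "scheme": parts.scheme,
        "namespace": namespace,
        "tenant": tenant,
        "domain": ".".join(domain),
        "path": parts.path,
    }

info = parse_hcp_uri("hcp://archive.finance.hcp.example.com/2018/q4/report.pdf")
print(info["namespace"], info["tenant"], info["path"])
# archive finance /2018/q4/report.pdf
```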






MapR DB support





A simple but important improvement: MapR DB is now supported! It's an enterprise-grade, high-performance, global NoSQL database management system: a multi-model database that converges operations and analytics in real time. It includes the HBase API, so HBase applications can run against it, even though not all features are compatible.


PDI is now validated to read/write data from MapR-DB via the HBase API. In terms of the use cases this enables, I'd call out Operational Data Hub/Real-Time BI, Customer 360, and several IoT-related ones.
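To give a feel for why the HBase API suits something like Customer 360, here's a plain-Python sketch of the HBase data model (row key → column family:qualifier → value). This only mimics the shape of the data; it is not the MapR-DB or PDI API, and the row keys and families are made up:

```python
# A wide-column table sketched as nested dicts:
# row key -> "family:qualifier" -> value.
table = {}

def put(row_key, family, qualifier, value):
    """Store a cell under its row key and family:qualifier column."""
    table.setdefault(row_key, {})[f"{family}:{qualifier}"] = value

def get(row_key, family, qualifier):
    """Fetch one cell, or None if the row/column doesn't exist."""
    return table.get(row_key, {}).get(f"{family}:{qualifier}")

# Customer 360 shape: one row per customer, families grouping attributes.
put("cust#0042", "profile", "name", "Ada Lovelace")
put("cust#0042", "profile", "country", "UK")
put("cust#0042", "activity", "last_login", "2018-12-01T10:00:00Z")

print(get("cust#0042", "profile", "name"))  # Ada Lovelace
```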




Google Cloud Encryption





Google CMEK (Customer-Managed Encryption Keys) allows data owners to have a multilayered security model that secures data and controls access to the data encryption keys. With this new capability, Pentaho users can use these customer-managed encryption keys to access data in Google Cloud Storage and Google BigQuery, enhancing the security of the data. And we're very happy to say that we were able to test that it just works, with no product change required! Damn, feels good when that happens (which rarely does!)
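For context, configuring a customer-managed key as a bucket's default encryption happens on the Google Cloud side, not in Pentaho. A rough sketch with gsutil (all project, bucket, key ring, and key names below are placeholders):

```shell
# Set a customer-managed Cloud KMS key as the default encryption key
# for a Cloud Storage bucket; all resource names are placeholders.
gsutil kms encryption \
  -k projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key \
  gs://my-bucket
```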