Let's get geeky - this one is huuuuuge! :)

The first contribution

Nearly 9 (?!?! bloody hell, that much??) years ago I did my first decent post on the Pentaho Community forums (first of about 2 or 3 decent posts overall :p )

It was a tutorial on how to set up Pentaho to work in a multiple project environment. From that thread, I think it was Pentaho 1.2, Hieroglyph Edition... I didn't even know how to build a solution, but since I couldn't get my head around how to set up the infrastructure, I started with that. This was mainly an ant script that launched the Pentaho platform build targets, with some extra options on it.

This "Very big pentaho install and setup tutorial" (see how I've always had a way with naming stuff? ;) ) later evolved into what we know today as CBF - the Community Build Framework.

This is an insanely useful project, that we used and maintained actively throughout the last decade. It still works, but now we have different paradigms to take into account.

Similar requirements / new approach

These were my premises 9 years ago:

  • I needed to know how to build a demo solution that connects to an arbitrary database and shows simple pentaho abilities in the least possible time (eg: have a demo set in 2 hours in a client's database)
  • I need to switch configuration on the fly so that I can switch between different scenarios (eg: from client A to client B)
  • I don't want to change any original files that could get overwritten in an upgrade
  • The platform upgrade must be easy to do and not break the rest of the setup
  • Debug is a "must have"
  • Must support all kinds of different customization in different projects, from using different databases (I'm using postgresql, thanks elassad) to different security types.

For those, I chose the CBF structure that patched the main source, compiled it, started it up...

Now I added a few more requirements

  • It shouldn't compile anymore; Should work from binaries
  • Should work for CE and EE
  • For EE, should process patches
  • Should work for nightly builds
  • Should set up not only pentaho but all the rest of the environment (just... click and go!)
  • Solutions should be VCS-able in git / svn / whatever
  • Anyone should be able to set things up in the blink of an eye
  • Should greatly increase team collaboration by allowing import / exports of work done
  • and why not... even be deployable?

So I built CBF2. It's called CBF2 because it shares a lot of the same premises as CBF... even though it's not for community only and doesn't build anything :p - I told you I suck at naming stuff.

CBF was a fancy name for an ant build file; CBF2 is a fancy name for a really cool set of bash scripts.

Let me be clear about how huge this is: If you were kinda complaining about the (in)ability to properly manage pentaho project lifecycles, installs, backups and restores, configurations.... you can stop complaining. This is The Solution (tm) for it.

So, here it is: CBF2, where Pentaho, Docker and Git meet for the ultimate solution lifecycle management

CBF2 - Community Build Framework 2.0

It's not community only; You don't actually build anything; But still rocks!


The goal of this project is to quickly spin up a working Pentaho server on docker containers. This will also provide script utilities to get the client tools.


Requirements

  • A system with docker. I'm on a mac, so I have docker-machine
  • A decent shell; either Linux or Mac should work out of the box, Cygwin should as well
  • lftp

For docker, please follow the instructions for your specific operating system. I use a Mac with Homebrew, so I use docker-machine (4 GB mem, 40 GB disk, YMMV):
brew install docker
brew install docker-machine
docker-machine create -d virtualbox --virtualbox-memory 4096 --virtualbox-disk-size 40000 dev

How to use

There are a few utilities here:

  • getBinariesFromBox.sh - Connects to box and builds the main images for the servers (requires access to box. Later I'll do something that doesn't require that)
  • cbf2.sh - What you need to use to build the images
  • getClients.sh - A utility to get the client tools
  • startClients.sh - A utility to start the client tools

The software directory

This is the main starting point. If you're a Pentaho employee you can use the getBinariesFromBox.sh script, but the rest of the world can still use this by manually putting the files here.
You should put the official software files under the software/v.v.v directory. It's very important that you follow this three-number representation.
This works for both CE and EE. This actually works better for EE, since you can also put the patches there and they will be processed.
For EE, you should use the official -dist.zip artifacts. For CE, use the normal .zip file.
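
The naming rules above can be checked before running anything. This is only a minimal sketch: `is_release_version` and `artifact_kind` are made-up helper names, not part of CBF2, and the `SP*.zip` pattern for patches is an assumption based on the example tree further down.

```shell
# Hypothetical helpers, not part of CBF2: they only mirror the conventions above.

# Succeeds when the name looks like a released version directory (v.v.v).
is_release_version() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# Classifies an artifact by its file name:
#   SP*.zip     -> patch (assumed from the example tree)
#   *-dist.zip  -> EE
#   other .zip  -> CE
artifact_kind() {
  case "${1##*/}" in
    SP*.zip)    echo "patch" ;;
    *-dist.zip) echo "EE" ;;
    *.zip)      echo "CE" ;;
    *)          echo "unknown" ;;
  esac
}
```

Something like `artifact_kind software/6.0.1/SP201601-6.0.zip` would then report a patch, while nightly directories such as 6.1-QAT-153 are deliberately not matched by `is_release_version`.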

The licenses directory

For EE, just place the *.lic license files in the licenses subdirectory. They will be installed on the images for EE builds.

Released versions:

Use a directory named X.X.X, and inside it drop the server, plugins and patches.

Nightly Builds

Drop the build artifacts directly in a directory named after the build (6.1-QAT-153 and 7.0-QAT-76 in this example):
├── 5.2.1
│ ├── SP201502-5.2.zip
│ ├── biserver-ee-
│ ├── paz-plugin-ee-
│ ├── pdd-plugin-ee-
│ └── pir-plugin-ee-
├── 5.4.0
│ └── biserver-ce-
├── 5.4.1
│ ├── SP201603-5.4.zip
│ └── biserver-ee-
├── 6.0.1
│ ├── SP201601-6.0.zip
│ ├── SP201602-6.0.zip
│ ├── SP201603-6.0.zip
│ ├── biserver-ce-
│ ├── biserver-ee-
│ ├── paz-plugin-ee-
│ ├── pdd-plugin-ee-
│ └── pir-plugin-ee-
├── 6.1-QAT-153
│ ├── biserver-ee-6.1-qat-153-dist.zip
│ ├── biserver-merged-ce-6.1-qat-153.zip
│ ├── paz-plugin-ee-6.1-qat-153-dist.zip
│ ├── pdd-plugin-ee-6.1-qat-153-dist.zip
│ └── pir-plugin-ee-6.1-qat-153-dist.zip
├── 7.0-QAT-76
│ ├── biserver-merged-ee-7.0-QAT-76-dist.zip
│ ├── pdd-plugin-ee-7.0-QAT-76-dist.zip
│ └── pir-plugin-ee-7.0-QAT-76-dist.zip
└── README.txt

CBF2: The main thing

CBF1 was an ant script but CBF2 is a bash script. So yeah, you want cbf2.sh. If you are on Windows... well, not sure I actually care, but you should be able to just use Cygwin.
Here's what you'll see when you run ./cbf2.sh:
------ CBF2 - Community Build Framework 2 -------
------ Version: 0.9 -------
------ Author: Pedro Alves (pedro.alves@webdetails.pt) -------

Core Images available:

[0] baserver-ee-
[1] baserver-ee-
[2] baserver-merged-ce-6.1-qat-153
[3] baserver-merged-ee-

Core containers available:

[4] (Stopped): baserver-ee-

Project images available:

[5] pdu-project-nasa-samples-baserver-ee-
[6] pdu-project-nasa-samples-baserver-merged-ee-

Project containers available:

[7] (Running): pdu-project-nasa-samples-baserver-ee-
[8] (Stopped): pdu-project-nasa-samples-baserver-merged-ee-

> Select an entry number, [A] to add new image or [C] to create new project:
There are 4 main concepts here:

  • Core images
  • Core containers
  • Project images
  • Project containers

These should be straightforward to understand if you're familiar with docker, but in a nutshell there are two fundamental concepts: images and containers. An image is an inert, immutable file; a container is an instance of an image, and it's the container that will run and allow us to access the Pentaho platform.
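
In plain docker terms, the two lists come from `docker images` and `docker ps -a`. Below is a rough sketch of how the (Running) / (Stopped) labels could be derived from the status column docker prints. The function name is made up, and this is not necessarily how cbf2.sh does it.

```shell
# Hypothetical helper: maps a docker status string to the label used in the menu.
# docker reports running containers with a status that starts with "Up"
# (for example "Up 2 hours"); anything else is treated as stopped here.
container_state() {
  case "$1" in
    Up*) echo "Running" ;;
    *)   echo "Stopped" ;;
  esac
}

# Usage against a real docker daemon (commented out because it needs docker running):
# docker ps -a --format '{{.Names}}\t{{.Status}}' |
#   while IFS="$(printf '\t')" read -r name status; do
#     printf '(%s): %s\n' "$(container_state "$status")" "$name"
#   done
```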

Accessing the platform

When we run the container, it exposes a few ports, most importantly 8080. So in order to see Pentaho running all we need to do is to access the machine where docker is running. This part may vary depending on the operating system; On a Mac, and using docker-machine, there's a separate VM running the things, so I'm able to access the platform by using the following URL:
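
A small sketch for working out the address yourself. It assumes the docker-machine VM is the one named dev created earlier, and it assumes the server is published under the /pentaho context path, which is the usual default but is not stated in this post. `docker-machine ip` is a real subcommand; `platform_url` is a made-up helper.

```shell
# Hypothetical helper: builds the URL from a host and an optional port.
# The /pentaho context path is an assumption; adjust it if your image differs.
platform_url() {
  printf 'http://%s:%s/pentaho\n' "$1" "${2:-8080}"
}

# With docker-machine, the host is the VM's address (needs the "dev" machine):
# platform_url "$(docker-machine ip dev)"
# On a native docker install the host would simply be localhost:
# platform_url localhost
```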

Core images

These are the core images - a clean install out of one of the available artifacts that are provided on the software directory. So the first thing we should do is add a core image. The option [A] allows us to select which image to add from an official distribution archive.
When we select this option, we are prompted to choose the version we want to build:
> Select an entry number, [A] to add new image or [C] to create new project: A

Servers found on the software dir:
[0]: biserver-ee-
[1]: biserver-ce-
[2]: biserver-ee-
[3]: biserver-ce-
[4]: biserver-ee-
[5]: biserver-ee-6.1-qat-153-dist.zip
[6]: biserver-merged-ce-6.1-qat-153.zip
[7]: biserver-merged-ee-7.0-QAT-76-dist.zip
CBF2 knows how to handle EE dist files: you'll be presented with the EULA, patches will be automatically processed and licenses will be installed.
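
To get a feel for where that "Servers found" listing comes from, here is a minimal sketch that numbers every biserver-*.zip sitting one level below a software directory. `list_servers` is a hypothetical name, and the real script may well discover the archives differently.

```shell
# Hypothetical sketch: numbers the server archives found under <software dir>/<version>/.
# Patches and plugins are ignored because only biserver-*.zip is matched.
list_servers() (
  i=0
  for f in "$1"/*/biserver-*.zip; do
    [ -e "$f" ] || continue          # skip the unexpanded glob when nothing matches
    printf '[%d]: %s\n' "$i" "${f##*/}"
    i=$((i + 1))
  done
)

# Usage: list_servers software
```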
Once an image is built, if we select that core image number you'll have the option to launch a new container or delete the image:
> Select an entry number, [A] to add new image or [C] to create new project: 0
You selected the image baserver-ee-
> What do you want to do? (L)aunch a new container or (D)elete the image? [L]:

Core containers

You can launch a container from a core image. This will allow us to explore a completely clean version of the image you selected. This is useful for some tests, but I'd say the big value would come out of the project images. Here are the options available for containers:
> Select an entry number, [A] to add new image or [C] to create new project: 3

You selected the container baserver-merged-ce-6.1-qat-153-debug
The container is running; Possible operations:

S: Stop it
R: Restart it
A: Attach to it
L: See the Logs

What do you want to do? [A]:
Briefly, here's what the options mean - even though they should be relatively straightforward:

  • Stop it: Stops the container. When the container is stopped you'll be able to delete the container or start it again
  • Restart it: Guess what? It restarts it. Surprising, hein? :)
  • Attach to it: Attaches to the docker container. You'll then have a bash shell and you'll be able to play with the server
  • See the Logs: Gets the logs from the server
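
If you'd rather skip the menu, the four options map quite naturally onto plain docker commands. The mapping below is an assumption about what happens behind the scenes, not something taken from the cbf2.sh source, and `container_command` is a made-up name. It prints the command instead of running it, so it is safe to try anywhere.

```shell
# Hypothetical mapping from the menu letters to plain docker commands.
# $1 is the menu letter, $2 is the container name.
container_command() {
  case "$1" in
    S) echo "docker stop $2" ;;
    R) echo "docker restart $2" ;;
    A) echo "docker exec -it $2 bash" ;;   # opens a bash shell inside the container
    L) echo "docker logs -f $2" ;;         # follows the container output
    *) echo "unknown option: $1" >&2; return 1 ;;
  esac
}

# Usage: container_command A baserver-merged-ce-6.1-qat-153-debug
```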


Definition and structure

A project is built on top of a core image. Instead of being a clean install it's meant to replicate a real project's environment. As a best practice, it should also have a well defined structure that can be stored on a VCS repository.
Projects should be cloned / checked out into the projects directory. I recommend versioning every project in its own git or svn repository. Here's the structure that I have:
pedro@orion:~/tex/pentaho/cbf2 (master *) $ tree -l ./projects/
└── project-nasa-samples -> ../../project-nasa-samples/
    ├── _dockerfiles
    └── solution
        └── public
            ├── Mars_Photo_Project
            │   ├── Mars_Photo_Project.cda
            │   ├── Mars_Photo_Project.cdfde
            │   ├── Mars_Photo_Project.wcdf
            │   ├── css
            │   │   └── styles.css
            │   ├── img
            │   │   └── nasaicon.png
            │   └── js
            │       └── functions.js
            └── ktr
                ├── NASA\ API\ KEY.txt
                ├── curiosity.ktr
                ├── getPages.ktr
                └── mars.ktr
All the solution files are going to be automatically imported, including metadata for datasources creation.
The directory _dockerfiles is a special one; You can override the default Dockerfile that's used to build a project image (the file in dockerfiles/buildProject/Dockerfile) by dropping a project specific Dockerfile in that directory, using the default one as an example. Note that you should not change the FROM line, as it will be dynamically replaced. This is what you want for project level configurations, like installing / restoring a specific database, an apache server in front or any fine tuned configurations.
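
To picture what "dynamically replaced" means for the FROM line, here is a minimal sketch using sed. It is an illustration only: `with_base_image` is a hypothetical helper and the actual replacement in cbf2.sh may be done differently.

```shell
# Hypothetical sketch: prints a Dockerfile with its FROM line pointing at the
# chosen core image. $1 is the Dockerfile path, $2 is the image name.
# The original file is left untouched.
with_base_image() {
  sed "s|^FROM .*|FROM $2|" "$1"
}

# Usage (the paths and image name are just examples):
# with_base_image projects/project-nasa-samples/_dockerfiles/Dockerfile \
#   baserver-merged-ce-6.1-qat-153
```

This is also why editing the FROM line by hand is pointless: whatever you put there gets swapped for the core image you pick when building the project image.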

Project images

The first thing that we need to do is to create a project. Doing that is very simple: we select one of the projects in our projects directory and a core image to install it against. This separation aims at really simplifying upgrades / tests / etc.
> Select an entry number, [A] to add new image or [C] to create new project: C

Choose a project to build an image for:

[0] project-nasa-samples

> Choose project: 0

Select the image to use for the project

[0] baserver-ee-
[1] baserver-merged-ce-6.1-qat-153
[2] baserver-merged-ee-

> Choose image: 2
Once we have the project image created, we have access to the same options we had for the core images, which is basically launching a container or deleting the image.
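
Going by the menu listing earlier, project images appear to be named after both the project and the core image they were built against. The pattern below is inferred from that listing, not taken from the cbf2.sh source, and `project_image_name` is a made-up helper.

```shell
# Naming pattern inferred from the menu listing: pdu-<project>-<core image>.
# $1 is the project directory name, $2 is the core image name.
project_image_name() {
  printf 'pdu-%s-%s\n' "$1" "$2"
}

# Usage: project_image_name project-nasa-samples baserver-merged-ce-6.1-qat-153
```

Since the core image is part of the name, the same project can sit side by side on several platform versions, which is what makes upgrade testing cheap.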

Project containers

Like the images, project containers work very similarly to core containers. But we'll also have two extra options available:

  • Export the solution: Exports the solution to our project folder
  • Import the solution: Imports the solution from our project folder into the running container. This would be equivalent to rebuilding the image

Note that by design CBF2 only exports the folders in public that are already part of the project. You'll need to manually create the directory if you add a top level one.
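
In practice that means creating the folder in the project before exporting. A small sketch, assuming the projects/<name>/solution/public layout shown earlier; both helper names are hypothetical.

```shell
# Hypothetical helpers, assuming the <project>/solution/public layout shown earlier.

# Lists the top-level folders under public; going by the note above,
# these are the only ones an export would touch.
exportable_folders() (
  for d in "$1"/solution/public/*/; do
    [ -d "$d" ] || continue
    d=${d%/}
    printf '%s\n' "${d##*/}"
  done
)

# Creates a new top-level folder so that a later export picks it up.
add_public_folder() {
  mkdir -p "$1/solution/public/$2"
}

# Usage:
# add_public_folder projects/project-nasa-samples reports
# exportable_folders projects/project-nasa-samples
```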

The client tools

This also provides two utilities to handle the client tools. One of them, getClients.sh, is probably something you can't use since it's for internal Pentaho people only.
The other one, startClients.sh, may be more useful; It requires the client tools to be downloaded into a dir called clients/ with a certain structure:
pedro@orion:~/tex/pentaho/cbf2 (master *) $ tree -L 4 clients/
├── pad-ce
│   └──
├── pdi-ce
│   ├── 6.1-QAT
│   │   └── 156
│   │       └── data-integration
│   ├──
│   │   └── 192
│   │       └── data-integration
│   └── 7.0-QAT
│       └── 57
│           └── data-integration
├── pdi-ee-client
│   └──
│       └── 192
│           ├── data-integration
│           ├── jdbc-distribution
│           └── license-installer
├── pme-ce
│   └──
│       └── 182
│           └── metadata-editor
├── prd-ce
│   └──
│       └── 182
│           └── report-designer
└── psw-ce
If you use this, then the startClients.sh simplifies launching them; Note that, unlike the platform, this will run on the local machine, not on a docker VM:
pedro@orion:~/tex/pentaho/cbf2 (master *) $ ./startClients.sh
Clients found:

[0] pdi-ce: 6.1-QAT-156
[1] pdi-ce:
[2] pdi-ce: 7.0-QAT-57
[3] pdi-ee-client:
[4] pme-ce:
[5] prd-ce:

Select a client:
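
The listing lines up with the directory structure: each entry is a clients/<tool>/<branch>/<build> folder. Here is a minimal sketch of that discovery step, under that assumption; `list_clients` is a made-up name and the real startClients.sh may do this differently.

```shell
# Hypothetical sketch of the discovery step, assuming <clients dir>/<tool>/<branch>/<build>/.
# Prints one numbered entry per build folder, in the "tool: branch-build" form.
list_clients() (
  i=0
  for d in "$1"/*/*/*/; do
    [ -d "$d" ] || continue
    d=${d%/}
    build=${d##*/}       # last path component, e.g. 156
    rest=${d%/*}
    branch=${rest##*/}   # e.g. 6.1-QAT
    rest=${rest%/*}
    tool=${rest##*/}     # e.g. pdi-ce
    printf '[%d] %s: %s-%s\n' "$i" "$tool" "$branch" "$build"
    i=$((i + 1))
  done
)

# Usage: list_clients clients
```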

Taking it further

This is, first and foremost, a developer's tool and methodology. I'll make no recommendations about whether or not to use these containers in a production environment, because I simply have no idea how that works; we're mostly agnostic on those methods.
Pentaho's stance is clearly explained here:
As deployments increase in complexity and our clients rapidly add new software
components and expand software footprints, we have seen a definitive shift
away from traditional installation methods to more automated/scriptable
deployment approaches. At Pentaho, our goal is to ensure our clients continue
to enjoy flexibility to adapt our technology to their environments and
individual standards.

Throughout 2015, Pentaho worked with customers who use various deployment
technologies in development, test, and production environments. We have seen
that the range of technologies used for scripted software deployment can vary
as widely as the internal IT standards of our clients. In short, we have not
found critical mass in any single deployment pattern.

To support our clients in their adoption of these technologies, Pentaho takes
the perspective that our clients should continue to be autonomous in their
selection and implementation of automated deployment and configuration

Pentaho will provide documented best practices, based on our experience and
knowledge of our product, to assist our clients in understanding the
scriptable and configurable options within our product, along with our
deployment best practices. Due to the diversity of technology options, Pentaho
customer support will remain focused on the behavior of the Pentaho software
and will provide expertise on the Pentaho products to help customers
troubleshoot individual scripts or containers.

Have fun. Tips and suggestions to pedro.alves at webdetails.pt