Hitachi Vantara Pentaho Community Forums

Thread: FOSS4G 2007 presentation on open source BI

  1. #1
    Join Date: Apr 2006

    FOSS4G 2007 presentation on open source BI

    Dear PDI developers and users,

    For some time now, the GeoSOA research team at Laval University (Quebec, Canada) has been working on Kettle and geospatial data. We've made some modifications to the source code to add a Geometry data type, used with GIS data (e.g. points, lines and polygons representing geographic features).

    The enhancements include input support for Shapefiles and other GIS file formats (reading Geometry objects along with attribute data; this is not the same as the existing Shapefile input step), input/output for PostGIS (the spatial extension for PostgreSQL) databases, and topological predicates for the Filter rows step. Spatial analysis functions of the GeOxygene framework are accessible via JavaScript (a GUI step will be added later). Adding support for Oracle Spatial is also planned.
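
    To illustrate what a topological predicate in a Filter rows step does, here is a minimal Java sketch. It is not GeoKettle's code and does not use GeOxygene: it assumes each row carries a point geometry, tests a "within" predicate against a single polygon, and uses the standard `java.awt.geom.Path2D` class only as a stand-in for a real GIS library. The names `WithinFilter`, `Row`, `polygon` and `filterWithin` are hypothetical.

```java
import java.awt.geom.Path2D;
import java.util.ArrayList;
import java.util.List;

public class WithinFilter {

    /** Hypothetical row: one attribute plus a point geometry (x, y). */
    static final class Row {
        final String name;
        final double x;
        final double y;

        Row(String name, double x, double y) {
            this.name = name;
            this.x = x;
            this.y = y;
        }
    }

    /** Builds a closed polygon from an array of {x, y} vertices. */
    static Path2D.Double polygon(double[][] vertices) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(vertices[0][0], vertices[0][1]);
        for (int i = 1; i < vertices.length; i++) {
            path.lineTo(vertices[i][0], vertices[i][1]);
        }
        path.closePath();
        return path;
    }

    /** Keeps only the rows whose point lies inside the polygon (a simple "within" predicate). */
    static List<Row> filterWithin(List<Row> rows, Path2D.Double area) {
        List<Row> kept = new ArrayList<>();
        for (Row row : rows) {
            if (area.contains(row.x, row.y)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Path2D.Double region = polygon(new double[][] {{0, 0}, {10, 0}, {10, 10}, {0, 10}});
        List<Row> rows = List.of(new Row("inside", 5, 5), new Row("outside", 15, 5));
        for (Row row : filterWithin(rows, region)) {
            System.out.println(row.name);
        }
    }
}
```

    A real implementation would evaluate predicates such as within, intersects or touches on arbitrary geometries, but the shape of the step is the same: each incoming row is kept or dropped based on a spatial test rather than a value comparison.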

    What can it do for users working with geospatial databases? Among other things, one can easily import data from GIS files into a PostGIS database, perform spatial analysis functions on this data, and integrate various data sources into a spatial data warehouse, all as part of the ETL process.

    We've created a distribution of Pentaho Data Integration (Kettle) including these extensions, which we named GeoKettle. It is currently based on the 2.5.2 branch from the SVN repository. We plan to make this distribution (source and binary) available soon. Eventually it will be ported to the new 3.0 architecture, once that is sufficiently stable (and when we have enough time).

    This project will be presented at the Free and Open Source Software for Geospatial (FOSS4G) 2007 conference (September 24-27, in Victoria, BC, Canada). We'll also demonstrate the use of Pentaho Analysis Services (Mondrian) with geospatial data in the same presentation session.

    For more information you are invited to take a look at the abstract on the FOSS4G 2007 site.
    You can also see GeoKettle in action in this demo video (Flash required).


  2. #2
    Join Date: May 2006


    Nice. As far as I understand the licenses used in Kettle, this is completely legal as long as the changed source code is made available. But wouldn't it have been less work to write steps (and maybe some plugin connections) instead of changing the framework itself?


  3. #3
    Join Date: Nov 1999

    Very cool

    Sven, if you want to preserve the PGSQL geo-data types, it would probably be nice to create a new data type.
    Hey, let's think about making those data types possible through plugins.

    Bonjour Etienne,

    I'm really excited to see this work being done. A few years ago I did a Kettle project for the Flemish Traffic Center in Antwerpen (Wilrijk).
    It involved extracting a lot of road-topology information from ESRI shapefiles (hence the Shapefile plugin that was available :-)).
    The idea obviously was to avoid spending money on another expensive license since ESRI software is closed source.

    The road layout was then used to predict traffic intensity in advance using road models, statistics and a data warehouse.
    We would feed the data into PTV software (more closed source stuff) and loop the results back to the traffic operators.
    It felt to me at the time that the hard part was getting good road maps :-)

    Anyway, working with the polygons was interesting, but in hindsight my approach should have included building a geospatial data type.
    The problem is that once you have a set of individual points, you need markers to tie those points back together to form roads, regions, etc.
    It makes more sense to keep them together. That way you can create operators/plugins in Kettle that calculate intersections, unions, overlays, etc.
    It will be a lot more efficient than doing it one point at a time.
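
    The benefit of keeping a road or region together as one geometry can be shown with a short Java sketch: once a shape is a single object, intersection and union apply to the whole shape instead of being rebuilt from individual points. This uses the standard `java.awt.geom.Area` class purely as a stand-in for a real GIS library (it is not what Kettle or GeoKettle use), and the `overlap` and `merge` helpers are hypothetical names.

```java
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;

public class ShapeOverlay {

    /** Returns the intersection of two shapes as a new Area; the inputs are not modified. */
    static Area overlap(Area a, Area b) {
        Area result = new Area(a); // copy so that 'a' stays unchanged
        result.intersect(b);
        return result;
    }

    /** Returns the union of two shapes as a new Area; the inputs are not modified. */
    static Area merge(Area a, Area b) {
        Area result = new Area(a);
        result.add(b);
        return result;
    }

    public static void main(String[] args) {
        // Two square "regions", each stored as a single geometry object.
        Area regionA = new Area(new Rectangle2D.Double(0, 0, 10, 10));
        Area regionB = new Area(new Rectangle2D.Double(5, 5, 10, 10));

        System.out.println("Overlap bounds: " + overlap(regionA, regionB).getBounds2D());
        System.out.println("Union bounds: " + merge(regionA, regionB).getBounds2D());
    }
}
```

    An ETL step built on such a type could compute overlays of whole regions per row, which is the efficiency gain over processing one point at a time.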

    GeoKettle Folks, congratulations on the work and the results. Let us know how we can help out.
    Good luck with the presentations!


  4. #4


    Geo data types were requested by one of our team members,
    so this is good news :-)
    Good luck with the presentation.


  5. #5
    Join Date: Apr 2006


    Thank you, Matt. For my part, I had a great time working with Kettle. I'm also glad to see that you have experience working with geospatial data; this will make collaboration between our teams easier.

    I agree that pluggable data types would be a good idea, not only for GIS data types but also for any custom object type that could be handled during the ETL process (e.g. images, audio, ...). The ValueInterface and Value classes in 2.5.x are somewhat impractical to extend, since you need to add getters/setters for the new data type to ValueInterface (and all implementing classes) and add constants and many methods to Value. I know there are many changes to the Value type system in 3.0, but I've only looked at the code briefly. I also had to modify the Database and PostgreSQLDatabaseMeta classes to add conversion between the DBMS geometry type (e.g. PostGIS' PGGeometry) and GeoKettle's native Geometry type (based on GeOxygene objects). The same could be done for Oracle Spatial (using sdoapi).

    Considering all the modifications needed, would it be practical to introduce pluggable data types? Adding the libraries needed for geo data types increases the size of the Kettle distribution quite a bit. I think being able to distribute GeoKettle as an extension to the base Kettle distribution would be better than merging it into the main tree...
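
    The difference between the 2.5.x approach (adding getters/setters to ValueInterface and constants to Value for each new type) and a pluggable approach can be sketched in Java as follows. This is not the Kettle 2.5 or 3.0 API: `DataTypePlugin`, `DataTypeRegistry`, `Point` and `PointTypePlugin` are hypothetical names, and the sketch assumes each plugin type only needs conversion to and from String.

```java
import java.util.HashMap;
import java.util.Map;

public class PluggableTypes {

    /** Hypothetical contract that a data type plugin would implement. */
    interface DataTypePlugin<T> {
        String typeName();           // unique name, e.g. "Point"
        String toText(T value);      // conversion to String
        T fromText(String text);     // conversion from String
    }

    /** Hypothetical point type standing in for a real geometry class. */
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Example plugin that serializes points as "POINT(x y)". */
    static final class PointTypePlugin implements DataTypePlugin<Point> {
        public String typeName() {
            return "Point";
        }

        public String toText(Point p) {
            return "POINT(" + p.x + " " + p.y + ")";
        }

        public Point fromText(String text) {
            String inner = text.substring(text.indexOf('(') + 1, text.indexOf(')'));
            String[] parts = inner.trim().split("\\s+");
            return new Point(Double.parseDouble(parts[0]), Double.parseDouble(parts[1]));
        }
    }

    /** Registry: the core looks plugins up by name instead of hard-coding types in a Value class. */
    static final class DataTypeRegistry {
        private final Map<String, DataTypePlugin<?>> plugins = new HashMap<>();

        void register(DataTypePlugin<?> plugin) {
            plugins.put(plugin.typeName(), plugin);
        }

        DataTypePlugin<?> get(String typeName) {
            DataTypePlugin<?> plugin = plugins.get(typeName);
            if (plugin == null) {
                throw new IllegalArgumentException("Unknown data type: " + typeName);
            }
            return plugin;
        }
    }

    /** Parses and re-serializes a value; the generic parameter captures the plugin's wildcard type. */
    static <T> String roundTrip(DataTypePlugin<T> type, String text) {
        T value = type.fromText(text);
        return type.toText(value);
    }

    public static void main(String[] args) {
        DataTypeRegistry registry = new DataTypeRegistry();
        registry.register(new PointTypePlugin());
        // The core handles the value without compile-time knowledge of the Point class.
        System.out.println(roundTrip(registry.get("Point"), "POINT(1.5 2.0)"));
    }
}
```

    With a design along these lines, the geospatial libraries would ship with the plugin rather than with the base distribution, which addresses the size concern above.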

    I'll keep you informed about the presentation and will make the slides available if I can. Same with the source code (it still needs a bit of cleanup before release; I'll try to do that before the conference).

    Best regards,


  6. #6
    Join Date: Nov 1999


    Hi Etienne,

    In 3.0 we cleaned up a lot of things. Since geospatial, image, music, etc. are data types that require little conversion logic (maybe only to String), making data types pluggable would seriously simplify things.
    I'm going to look into it next week to see what we can do.
    If data types can be pluggable, that would also allow us to keep the required geospatial, sound and graphics libraries out of the main distribution. We're setting up web pages for our plugins, together with Subversion storage to version them.
    We could do the same for the partitioning plugins and the data type plugins (when we have those).

    Again, keep up the great work. It's exciting to see the power of open-source at work ;-)

    All the best,



Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.