Hitachi Vantara Pentaho Community Forums
Thread: Using PDI as environment for modeling image processing transformations?

  1. #1

    Hello,

    Probably a weird idea, but has anybody tried using or extending PDI to process images? For instance: reading an image file from disk, applying e.g. a black/white transformation to the image, and saving the file to disk again. A very simple example.
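
For reference, the read / black-white / save round trip described above can be sketched in plain Java with the standard `javax.imageio` API, outside of PDI. The class name `BlackWhiteExample`, the threshold of 128, and the PNG output format are assumptions made for this illustration; they are not from the thread.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

class BlackWhiteExample {

    /**
     * Returns a new 1-bit black/white copy of src.
     * Pixels whose luminance is >= threshold become white, all others black.
     */
    static BufferedImage toBlackWhite(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer form of the common 0.299/0.587/0.114 luminance weights.
                int luminance = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, luminance >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: BlackWhiteExample <input image> <output.png>");
            return;
        }
        BufferedImage src = ImageIO.read(new File(args[0]));
        if (src == null) {
            throw new IOException("Unsupported image format: " + args[0]);
        }
        // Threshold 128 is an arbitrary midpoint chosen for this sketch.
        ImageIO.write(toBlackWhite(src, 128), "png", new File(args[1]));
    }
}
```

In a PDI setting, `toBlackWhite` is the part that would presumably live inside a custom step, with reading and writing handled by separate input and output steps.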

    Any comments?

    Thomas

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    I guess it wouldn't be hard to do it. You could load the image into a "Binary" type field and then create steps/plugins to manipulate the images.
    For version 5.0 we want to have pluggable data types that would allow you to give the field an "Image" data type allowing you to add all sorts of metadata like image type, size, color-space information and what not.
    In a way I think it would be cool :-)
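
A minimal sketch of the "Binary field" idea, assuming the field value reaches the step as a plain `byte[]` holding an encoded image (PNG here). No PDI classes are used; `BinaryFieldImage`, `encode`, and `decode` are hypothetical helper names for this illustration.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

class BinaryFieldImage {

    /** Encodes an image into the bytes that would be stored in a binary field. */
    static byte[] encode(BufferedImage image, String format) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // ImageIO.write returns false when no writer exists for the format.
        if (!ImageIO.write(image, format, buffer)) {
            throw new IOException("No image writer for format: " + format);
        }
        return buffer.toByteArray();
    }

    /** Decodes the bytes of a binary field back into an image. */
    static BufferedImage decode(byte[] fieldValue) throws IOException {
        // ImageIO.read returns null when no registered reader understands the data.
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(fieldValue));
        if (image == null) {
            throw new IOException("Field does not contain a readable image");
        }
        return image;
    }
}
```

Each manipulating step would then decode, transform, and re-encode, which is exactly the overhead a dedicated "Image" data type could avoid.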

    Matt

  3. #3

    Hi Matt!

    In a research project, we are currently looking for new ways to model a work/execution flow for image processing visually. Having a strong DWH background with Kettle, I thought Kettle might be a good environment for both modeling and execution, given its execution environment, but I was unsure whether it is a totally dumb idea due to technical restrictions in Kettle, etc. Having implemented a custom job entry in the past (e.g.: http://blog.upscene.com/thomas/index...11&category=16), I thought wrapping various Java-based image processing algorithms into PDI steps might be an idea worth considering.

    Thomas

  4. #4
    Join Date
    Sep 2007
    Posts
    834

    It sounds like a very interesting project... and feasible.

    Some years ago I developed some algorithms that processed images, not in Kettle, but in C. The main focus was to distribute the processing among several nodes in a network. For example, there was a master that executed the "non-parallelizable" tasks. The pixels in the image were distributed among some "slaves". The slaves did the main task. Finally, the master collected all processed data and generated the output file.
    I can see the analogy with Kettle clusters.
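
The master/worker split described here can be sketched on a single machine with a thread pool standing in for the network nodes. This is only an illustration of the pattern, not the original C implementation and not Kettle clustering; the class name `ParallelInvert` and the choice of colour inversion as the per-pixel task are assumptions.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelInvert {

    /** Worker task: inverts the colour channels of pixels[from, to), keeping alpha. */
    static void invertRange(int[] pixels, int from, int to) {
        for (int i = from; i < to; i++) {
            pixels[i] = (pixels[i] & 0xFF000000) | (~pixels[i] & 0x00FFFFFF);
        }
    }

    /** Master: reads the pixels, hands disjoint ranges to workers, collects the result. */
    static BufferedImage invert(BufferedImage src, int workers)
            throws InterruptedException, ExecutionException {
        int width = src.getWidth();
        int height = src.getHeight();
        int[] pixels = src.getRGB(0, 0, width, height, null, 0, width);

        int chunk = (pixels.length + workers - 1) / workers; // ceiling division
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int start = 0; start < pixels.length; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, pixels.length);
                pending.add(pool.submit(() -> invertRange(pixels, from, to)));
            }
            for (Future<?> task : pending) {
                task.get(); // wait for every worker and surface its exceptions
            }
        } finally {
            pool.shutdown();
        }

        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        out.setRGB(0, 0, width, height, pixels, 0, width);
        return out;
    }
}
```

Because each worker touches a disjoint index range, no locking is needed; only the read and the final write-back are sequential, matching the "non-parallelizable" master tasks.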

    I would give this a try.

    Matt: that news about version 5 sounds great!

  5. #5
    Join Date
    Nov 1999
    Posts
    9,729

    5.0 is still far away but we started compiling our API wish list:

    http://wiki.pentaho.com/display/EAI/...nges+wish-list

  6. #6

    I have done image manipulation in the past for both fax image conversion, image splitting/combining, and basically anything related to Document/data capture or Forms Processing.

    As long as you don't mind performance in the 1-2 image files per second range (or, for larger documents, 1-3 sec per image file), you should be able to pull this off with JAI/image-io.

    I do not want to discourage this capability, but when it comes to image manipulation you may want to have PDI talk to an image server or a local native executable (ImageMagick) for the actual manipulation instead. Unfortunately, talking to an image server means either sending the full binary and dealing with the binaries that come back, or sending a reference (HTTP) to the image server for the actual processing and dealing with the return type(s).
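
Delegating to a local native executable could look roughly like the sketch below, which shells out via `ProcessBuilder`. It assumes an ImageMagick installation that provides a `convert` command on the PATH (newer installations may expose it as `magick` instead); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class ImageMagickCall {

    /** Builds the command line for a grayscale conversion of one file. */
    static List<String> grayscaleCommand(String inputPath, String outputPath) {
        // Assumption: "convert" is ImageMagick's executable and is on the PATH.
        return Arrays.asList("convert", inputPath, "-colorspace", "Gray", outputPath);
    }

    /** Runs the command, forwards its output to this process, and returns the exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

Only file paths cross the process boundary here, so the image bytes never have to pass through the JVM, at the cost of one process start per image.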

    With that warning about image manipulation given, the PDI API is very intriguing for other file conversions/manipulations as well, such as TXT/WORD to PDF, once you start working with files. Having a Java version of the conversion/manipulation for easy setup and as a basic example sounds great; then, as you need more horsepower, look at the more specific/native options.

  7. #7
    Join Date
    Nov 1999
    Posts
    9,729

    I'm not sure this all has to be true. One thing PDI can do very well is work in parallel, or in short, keep all processors busy on one or more machines.
    Therefore I'm sure the trick is to create different steps to do different low-level image manipulations and to also make sure that image data doesn't need to be copied unless really needed.
    If an algorithm written in Java to do a certain image manipulation is 10x slower than the native C version, there's probably something seriously wrong with either. If the difference is 0-50%, you need to make the solution scale out. I'm not by definition as sure as you are that Java is slower in any meaningful way though.
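
One way to read "image data doesn't need to be copied" in plain Java is to let consecutive low-level operations work directly on the image's backing pixel array. The sketch below assumes an `int`-backed image type (`TYPE_INT_RGB` or `TYPE_INT_ARGB`); the class `InPlaceSteps` and its two operations are illustrative stand-ins for PDI steps, not PDI code.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;

class InPlaceSteps {

    /** Returns the image's backing pixel array itself, not a copy. */
    static int[] pixelsOf(BufferedImage image) {
        int type = image.getType();
        if (type != BufferedImage.TYPE_INT_RGB && type != BufferedImage.TYPE_INT_ARGB) {
            throw new IllegalArgumentException("int-backed image required, got type " + type);
        }
        return ((DataBufferInt) image.getRaster().getDataBuffer()).getData();
    }

    /** Step 1: invert the colour channels in place, leaving the top byte untouched. */
    static void invert(int[] pixels) {
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] ^= 0x00FFFFFF;
        }
    }

    /** Step 2: halve each colour channel in place, leaving the top byte untouched. */
    static void halveBrightness(int[] pixels) {
        for (int i = 0; i < pixels.length; i++) {
            // The mask drops the bit that the shift carries in from the neighbouring channel.
            pixels[i] = (pixels[i] & 0xFF000000) | ((pixels[i] >> 1) & 0x007F7F7F);
        }
    }
}
```

Both operations mutate the same array, so chaining them allocates nothing per step; a copy is only needed where two branches of a flow must see different versions of the image.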

  8. #8

    Java itself isn't that slow; we are all here using Java :-)

    Image processing in Java has traditionally been problematic. However, breaking up the work into parallel steps (i.e. small, focused, potentially different tools/libs) would definitely change the game to where this could really work out.
