Workflow and ETL Scheduling with XPDL
Just a suggestion, but it occurs to me that it would be very useful to be able to schedule jobs stored in a repository according to a workflow design expressed in XPDL.
In this way it would be possible to build and control an ETL system from a purely workflow BPM viewpoint. I can see this as being a major advantage when integrating this into a corporate environment.
The way I envision it working would be to have some sort of servlet running that accepts the scheduling and activates the jobs as required. This server would be able to accept its scheduling / workflow from a workflow designer and provide feedback to such a system (like Enhydra) so that the business tasks can be co-ordinated. Something along those lines would be extremely useful I think.
I suppose the logical extension of this idea would be to embed a workflow architect's point of view into the Pentaho suite and allow for the integration of new / custom jobs that meet a business's workflow needs. Lots of potential in that one, particularly for people such as market researchers and analysts, category managers, and so on, who take in data from a data mining perspective and then feed information from each step of their workflow to the next.
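To make the idea a bit more concrete, here is a rough sketch of how such a server might turn an XPDL process into an ordered list of jobs to run. This is only an illustration under several assumptions: the XPDL fragment is trimmed down (it is not schema-valid), the use of the Activity Name attribute to hold a job name is my own convention, and run_job is a hypothetical callback standing in for whatever would actually launch a job from the repository.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter  # Python 3.9+

# XPDL 1.0 namespace; the fragment below is a trimmed-down illustration.
XPDL_NS = {"x": "http://www.wfmc.org/2002/XPDL1.0"}

SAMPLE_XPDL = """<Package xmlns="http://www.wfmc.org/2002/XPDL1.0" Id="etl_pkg">
  <WorkflowProcesses>
    <WorkflowProcess Id="load_products">
      <Activities>
        <Activity Id="extract" Name="extract_products"/>
        <Activity Id="approve" Name="manual_approval"/>
        <Activity Id="load" Name="load_master"/>
      </Activities>
      <Transitions>
        <Transition Id="t1" From="extract" To="approve"/>
        <Transition Id="t2" From="approve" To="load"/>
      </Transitions>
    </WorkflowProcess>
  </WorkflowProcesses>
</Package>"""


def execution_order(xpdl_text):
    """Return job names in an order that respects the XPDL transitions."""
    root = ET.fromstring(xpdl_text)
    # Assumption: each Activity's Name attribute holds the job name to run.
    names = {a.get("Id"): a.get("Name")
             for a in root.iterfind(".//x:Activity", XPDL_NS)}
    sorter = TopologicalSorter({activity_id: set() for activity_id in names})
    for t in root.iterfind(".//x:Transition", XPDL_NS):
        # The "To" activity may only run after the "From" activity.
        sorter.add(t.get("To"), t.get("From"))
    return [names[activity_id] for activity_id in sorter.static_order()]


def run_workflow(xpdl_text, run_job):
    """Call the hypothetical run_job(name) callback once per activity, in order."""
    for job_name in execution_order(xpdl_text):
        run_job(job_name)
```

A real implementation would also need to handle conditions on transitions, manual steps that wait for a person, and reporting status back to the workflow engine, none of which is attempted here.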
Just a few passing thoughts. Modular extension with integrated workflow and scheduling. Sounds nice I think. Could even make a marketplace for modular extensions aimed at specific workflow tasks in business!
A real-world example...
Just thought of a very simple real-world example to illustrate the point I made above:
If we take a simple data integration process for introducing new product data to a 'master' table of products (or whatever structure you want), and we look at this from a corporate / business perspective, then ETL alone is not going to cut it, although it comes close.
The issue arises when there must be approvals and / or user interaction with the process of introducing the data to the system. Say that, in order to initiate the first steps of the ETL process, the user needs to use an application / webpage / whatever to select the files that are going to be imported, say from a syndicated data supplier. Now say that the files are coming as Excel spreadsheets, sometimes zipped and sometimes not.
If I am not mistaken, the user cannot 'initiate' this process in its current form in KETTLE, and a decision structure for handling the zipped / unzipped incoming files cannot be done in a single job. I may be wrong here, but I don't see anywhere natively in KETTLE to design a process such as this. You could certainly develop this type of thing as bespoke code, but that removes the ability of a business to control its own environment and creates a dependency on developers. If 'users' could introduce these types of workflow into their BI system, maybe even including steps for approval of data for the next stage of implementation and rules-based logic for decision making, you would have a killer ETL system indeed.
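For what it is worth, the zipped / unzipped decision on its own is small when written as bespoke code. The sketch below is just that, a sketch in plain Python rather than anything KETTLE provides: collect_spreadsheets and SPREADSHEET_SUFFIXES are names I made up, and it assumes the supplier's files can be told apart by their extension. Note that the extension has to be checked first, because an .xlsx file is itself a zip container.

```python
import zipfile
from pathlib import Path

# Assumption: spreadsheets from the supplier carry one of these extensions.
SPREADSHEET_SUFFIXES = {".xls", ".xlsx"}


def collect_spreadsheets(incoming, work_dir):
    """Return the spreadsheet paths to import, unzipping the upload if needed."""
    incoming = Path(incoming)
    work_dir = Path(work_dir)

    # Check the extension first: .xlsx files are zip containers themselves,
    # so zipfile.is_zipfile() alone would misclassify them as archives.
    if incoming.suffix.lower() in SPREADSHEET_SUFFIXES:
        return [incoming]

    if zipfile.is_zipfile(incoming):
        work_dir.mkdir(parents=True, exist_ok=True)
        extracted = []
        with zipfile.ZipFile(incoming) as archive:
            for member in archive.namelist():
                # Only pull out spreadsheets; ignore readme files and the like.
                if Path(member).suffix.lower() in SPREADSHEET_SUFFIXES:
                    extracted.append(Path(archive.extract(member, work_dir)))
        return sorted(extracted)

    raise ValueError(f"Unsupported upload: {incoming.name}")
```

The point of the suggestion stands, though: a business user cannot be expected to write or maintain even this much, which is why a workflow step for it would be valuable.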
This is just a simple example, and I am sure there are more technical or bespoke ways to deal with this scenario. However, this is a fairly normal thing for a lot of businesses to have to deal with, and it is quite beyond their technical abilities to achieve, or their budgets to employ someone to achieve in a bespoke manner.
Anyway, I just thought I could probably make more sense with an example than with my first post on the topic, written late one Friday afternoon....
Any thoughts on the topic?