Data Profiling/Mapping?

12-31-2006, 01:13 AM
I've not had the pleasure of spending nearly as much time with the different Pentaho projects as I would like, but I'm constantly checking in to see what's new. I think the project has an excellent premise and is really turning into a great suite of projects. You've got the ETL, Business Intelligence, Reporting, a good set of people and a growing community. I really like what I see.

One question I do have: is there a component for data profiling and/or data mapping? By profiling I mean analyzing the data in an existing system to find possible data quality problems. By mapping I mean locating relationships between a source and a target system in order to speed up the actual ETL work.
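To make the mapping idea concrete, here is a minimal sketch that ranks candidate source-to-target column pairs by how much their values overlap. It is not a Pentaho feature; the function name `suggest_mappings`, the `threshold` parameter, and the choice of Jaccard overlap as the score are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple


def suggest_mappings(
    source: Dict[str, List[Any]],
    target: Dict[str, List[Any]],
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Rank (source column, target column) pairs by Jaccard overlap of their values."""
    candidates = []
    for s_name, s_values in source.items():
        # Ignore nulls so that empty values do not count as a match.
        s_set = {v for v in s_values if v is not None}
        for t_name, t_values in target.items():
            t_set = {v for v in t_values if v is not None}
            union = s_set | t_set
            if not union:
                continue
            # Jaccard overlap: shared values divided by all distinct values.
            score = len(s_set & t_set) / len(union)
            if score >= threshold:
                candidates.append((s_name, t_name, score))
    # Best-scoring candidates first.
    return sorted(candidates, key=lambda c: c[2], reverse=True)
```

Value overlap only suggests candidates. Columns that share a domain by coincidence (small integer codes, for example) will also score well, so a person still has to confirm each pairing.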

If Pentaho does not yet include such features, are there any open source projects out there that attempt to do these things? Are there any plans to add these features?

Great product, guys. Keep up the great work.

01-03-2007, 12:12 PM
Do any Pentaho representatives have comments on my post? Any users have similar hopes for Pentaho?

01-11-2007, 10:54 PM
I'm using Kettle to build a table-driven data profiler, but my needs are pretty basic: record counts and, by field, distinct values, min, max, and nulls.
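For readers who want to see what those basic measures look like, here is a minimal in-memory sketch. It is not the Kettle transformation described above; the function name `profile_rows` and the list-of-dicts input are assumptions made for illustration.

```python
from typing import Any, Dict, List


def profile_rows(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the record count plus per-field distinct, min, max, and null counts."""
    fields: Dict[str, Dict[str, Any]] = {}
    for row in rows:
        for name, value in row.items():
            stats = fields.setdefault(
                name, {"distinct": set(), "min": None, "max": None, "nulls": 0}
            )
            if value is None:
                # Nulls are counted but excluded from distinct, min, and max.
                stats["nulls"] += 1
                continue
            stats["distinct"].add(value)
            if stats["min"] is None or value < stats["min"]:
                stats["min"] = value
            if stats["max"] is None or value > stats["max"]:
                stats["max"] = value
    # Replace each set of distinct values with its size for reporting.
    for stats in fields.values():
        stats["distinct"] = len(stats["distinct"])
    return {"record_count": len(rows), "fields": fields}
```

Because it keeps every distinct value in memory, this only suits small samples. At tens of millions of rows the same measures are better computed in the database, for example with `COUNT(*)`, `COUNT(DISTINCT col)`, `MIN(col)`, and `MAX(col)`.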

The output is pretty basic but hopefully, analyzed over time, it'll provide some insight. I know of no open source data profiling tools. To discover relationships on pretty small data sets, try R - it will not work for me as I'm working with tens and hundreds of millions of rows, but it is a cool little tool.

Good luck - please post if you find an open source data profiler!

01-11-2007, 11:51 PM
I don't know that it's any good yet but I found this after my post. I haven't had a chance to play with it.


There seems to be some overlap with some of the things that Pentaho offers, but it does seem to have the profiling component at least.