Hi All

I'm new to Pentaho and am currently evaluating the Pentaho Data Integration tool to see whether I can adopt it for my process. Any advice on how to achieve the following with Pentaho would be highly appreciated.


1. We currently use an Oracle DB and will extend our solution to pull data from MSSQL, MySQL, PDFs, images, Excel files and other sources.
2. I'm wondering whether Pentaho can help with mapping these various sources into one house (a data lake or farm) on Oracle. For instance, given Table A, Table B, Table C and Table D from an Oracle DB, Table E from an MSSQL DB, Table F from an Excel file, and so on, can I map these different sources into the Oracle data lake?
3. In the data lake, the different sources will stay standalone, perhaps as separate objects. For example, inside the lake, Table A is separate from Table B, and so on, all in one house (the data lake).
4. Does Pentaho support automatic syncing of new data from the various sources into my single data source (the data lake) for real-time analysis? Or do I have to use a scheduler to run jobs? That is not so convenient, as the data needs to be up to date and on time for real-time analysis. If you think otherwise, please advise.

5. My aim is to point to one source, one house, for all analysis.
6. There is also a security issue: I need to specify who can access which data/tables in the data lake. We have an active user directory. Does Pentaho support integration with active user directories?
7. How does Pentaho support extraction of data from PDFs?

Any ideas or pointers on how to do this mapping, extraction and user-directory integration with Pentaho, or on whether it supports any of these scenarios, would be highly appreciated.

Thanks and looking forward to your response.