Starting a thread on keeping search engine data in sync with ETL-loaded data.

Upfront: Most search engines keep their own set of data optimized for
searching. This is additional data on top of the database, database
indexes, OLAP, etc. The problem is keeping the search engine data
up-to-date while also keeping the database and data warehouse data
up-to-date. The search engine is assumed to support full-text searching,
but may also support other kinds of field analysis (such as Levenshtein
distance, synonyms, etc.).
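
To make the Levenshtein part concrete, here is a minimal Python sketch of
edit-distance matching on names. It only illustrates the idea and is not how
Lucene or any particular engine implements fuzzy matching; `fuzzy_match` and
the `max_distance` threshold are hypothetical names picked for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query, names, max_distance=1):
    """Hypothetical helper: names within max_distance edits of query,
    compared case-insensitively, in their original order."""
    q = query.lower()
    return [n for n in names if levenshtein(q, n.lower()) <= max_distance]
```

With this, `fuzzy_match("Jon", ["John", "Jane"])` keeps "John" (one edit away)
and drops "Jane" (two edits away).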

Use case to demonstrate the topic (may be a poor example, but it shows
that the scenario exists):
*A 1,000,000-record CSV file of first name, last name, a list of
interests, and a description field each person wrote about themselves.
*Regular batch changes to the data on a daily/weekly/monthly basis
(could be 2,000,000 changes in one week, with 500,000 being new
entries).

The end solution should:
*search by first name and last name.
*categorize interests/search for similar interests (music: mp3, classic,
rock, ipod, beatles, etc).
*full-text search the description fields.
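
As a rough illustration of the interest categorization item, the sketch below
maps raw interest terms to a category and groups records by it. Only the
"music" row comes from the example above; the table layout and the function
names are assumptions for illustration. A search engine would typically do
this with synonym analysis at index time rather than with a hand-rolled
lookup.

```python
from collections import defaultdict

# Hypothetical category table; only the "music" terms come from the post.
CATEGORY_TERMS = {
    "music": {"mp3", "classic", "rock", "ipod", "beatles"},
}


def build_term_index(category_terms):
    """Invert {category: terms} into {lowercased term: category}."""
    index = {}
    for category, terms in category_terms.items():
        for term in terms:
            index[term.lower()] = category
    return index


def categorize(interests, term_index):
    """Return the set of categories matched by a record's raw interests.
    Terms are trimmed and lowercased; unknown terms are ignored."""
    found = set()
    for interest in interests:
        key = interest.strip().lower()
        if key in term_index:
            found.add(term_index[key])
    return found


def group_by_category(records, term_index):
    """records: iterable of (record_id, interests) pairs.
    Returns {category: [record_id, ...]} in input order."""
    groups = defaultdict(list)
    for record_id, interests in records:
        for category in categorize(interests, term_index):
            groups[category].append(record_id)
    return dict(groups)
```

Records that share a category can then be treated as having similar
interests, even when the raw terms differ (for example "rock" and "beatles").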

As the last items in that list show, SQL/OLAP may not be the best
approach for some of those scenarios, while search engine technology
(like Lucene) would be. However, when you work with good-sized datasets,
the ETL tools work great for databases, but not for keeping the search
engine index data up-to-date.

I started (slowly) working on a Kettle/PDI step to update a Solr server
(a Lucene-based search server) with the elements as they pass through
the ETL process.
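
To show roughly what such a step would have to produce, here is a Python
sketch (not the actual Kettle step, which would be written in Java) that
turns rows into Solr XML `<add>` update messages in batches. The field names
and the batch size are assumptions and would have to match the Solr schema;
sending each message to Solr's update handler and committing afterwards is
left out, since the URL depends on the installation.

```python
import xml.etree.ElementTree as ET
from itertools import islice


def batches(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def solr_add_xml(rows):
    """Build one Solr XML <add> message from a list of dict rows.
    A list or tuple value becomes one <field> element per item
    (multi-valued field); everything else is converted with str()."""
    add = ET.Element("add")
    for row in rows:
        doc = ET.SubElement(add, "doc")
        for name, value in row.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(v)  # ElementTree escapes &, < and >
    return ET.tostring(add, encoding="unicode")


def update_messages(rows, batch_size=1000):
    """Yield one <add> message per batch; batch_size is an arbitrary default."""
    for chunk in batches(rows, batch_size):
        yield solr_add_xml(chunk)
```

Batching matters at the volumes above: one message per row would mean
millions of round trips, while a few thousand rows per message keeps the
number of requests manageable.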

Sidenote: I was also looking at EJB3/Hibernate/Hibernate Search
scenarios, and that may prove more challenging, since updating the
database directly (outside of Hibernate) does not necessarily go through
the Hibernate annotations that trigger updates to the Hibernate-managed
Lucene index.

So - is there any interest for this topic, or is keeping search engine
data updated usually handled differently?

A byte for your thoughts,
-D


