Linked tables in Weka



Dina
11-01-2007, 11:46 PM
Hello,
I am a newbie to Weka and so far I can't find an easy solution to my problem: I have a database for movie rental stores that consists of two tables that I'd like to analyze:
Table Stores(
Store_id,
StoreType_id)

Table StoresMovies(
Store_id,
Movie_Id)

I have about 10-15 different StoreTypes (e.g. Family Friendly, Adults Oriented etc). The database contains about 1000 Stores and their Movie Database. For each store there are about 100 Movies, i.e. 100,000 rows in the StoresMovies table, with 10,000 unique Movies.
Using this training database I'd like to predict the StoreType of newly added Stores based on the Movies they have. Some movies can be rented out by different StoreTypes while some may point to a specific StoreType (this should be identified through the training data).

I wasn't able to find how to feed my data to Weka, as it seems to process only one table. If I had a table with one column per movie holding Yes/No values for each Store, I would know how to deal with it, but it's not easy to get that dataset out of my data.

I would really appreciate any help!

milkdud
11-03-2007, 11:29 AM
hi dina,

What you require is a form of data preprocessing that is built into any SQL-compliant relational database management system (e.g. MsAccess, mysql, SqlServer, Oracle, etc.), accessible through any university computing environment (try their library).

To get the job done, join these two tables on Store_id to get a single joined table made up of the fields (Store_id, StoreType_id, Movie_Id). In SQL, a query producing the rows of a StoreTypeMovies table would look something like:

SELECT s.Store_id, s.StoreType_id, m.Movie_Id
FROM Stores s JOIN StoresMovies m ON s.Store_id = m.Store_id;
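
The join above can be tried out without a database server. Here is a minimal sketch in Python using the standard-library sqlite3 module and an in-memory database. The table and column names follow the question; the helper name join_stores and the sample rows are made up for illustration.

```python
import sqlite3

def join_stores(stores, stores_movies):
    """Inner-join Stores and StoresMovies on Store_id.

    Returns a list of (Store_id, StoreType_id, Movie_Id) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Stores (Store_id INTEGER, StoreType_id INTEGER)")
    conn.execute("CREATE TABLE StoresMovies (Store_id INTEGER, Movie_Id INTEGER)")
    conn.executemany("INSERT INTO Stores VALUES (?, ?)", stores)
    conn.executemany("INSERT INTO StoresMovies VALUES (?, ?)", stores_movies)
    rows = conn.execute(
        "SELECT s.Store_id, s.StoreType_id, m.Movie_Id "
        "FROM Stores s JOIN StoresMovies m ON s.Store_id = m.Store_id "
        "ORDER BY s.Store_id, m.Movie_Id"
    ).fetchall()
    conn.close()
    return rows

# Made-up sample data: two stores, three (store, movie) pairs.
stores = [(1, 10), (2, 20)]
stores_movies = [(1, 100), (1, 101), (2, 101)]
joined = join_stores(stores, stores_movies)
```

Each store appears once per movie it carries, so the joined table has as many rows as StoresMovies (about 100,000 in the question), with the StoreType_id repeated on every row.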

I think there is also a way to do this with a spreadsheet like Excel or its open source counterpart.

Almost all these systems let you do this through their GUI, so you don't need to do so much typing and syntax, and you can export the resulting table in .csv format.
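
The joined table still has one row per (store, movie) pair, while the layout described in the question has one row per store and one Yes/No column per movie. As a rough sketch of that last step, assuming the joined rows are available as (Store_id, StoreType_id, Movie_Id) tuples, the pivot and the .csv export could be done in Python. The function name pivot_to_csv and the movie_<id> column names are invented for this example.

```python
import csv
import io

def pivot_to_csv(joined_rows, out):
    """Write one CSV row per store: Store_id, a Yes/No column per movie,
    and StoreType_id as the last column."""
    movies = sorted({movie for _, _, movie in joined_rows})
    store_type = {}    # Store_id -> StoreType_id
    store_movies = {}  # Store_id -> set of Movie_Id
    for store, stype, movie in joined_rows:
        store_type[store] = stype
        store_movies.setdefault(store, set()).add(movie)

    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Store_id"] + [f"movie_{m}" for m in movies] + ["StoreType_id"])
    for store in sorted(store_type):
        flags = ["Yes" if m in store_movies[store] else "No" for m in movies]
        writer.writerow([store] + flags + [store_type[store]])

# Made-up joined rows; in practice these would come from the database join.
joined_rows = [(1, 10, 100), (1, 10, 101), (2, 20, 101)]
buf = io.StringIO()
pivot_to_csv(joined_rows, buf)
csv_text = buf.getvalue()
```

To write a file instead of a string, pass an object opened with open("stores.csv", "w", newline="") as the out argument (the file name is only an example). Store_id is an identifier rather than a predictor, so it would probably need to be removed before training, and a numeric StoreType_id may need to be converted to a nominal attribute before Weka will treat it as a class.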

Please note that Weka is intended as a **data mining** system, and not yet another database-management-system-that-was-developed-over-20-years-ago.

-- john