Hitachi Vantara Pentaho Community Forums

Thread: Time series analysis with WEKA

  1. #1

    Time series analysis with WEKA

    I have just started using WEKA for time series analysis, so I am very much a beginner. I have a few quick questions and hope to get some answers here:
    1. Is it possible to have more than just a time stamp as input variables? For example,
    I have the following data:
    period customer# sales_amt sales_qty
    201301 cust 001 xxx,xxx.xx yyyy
    201301 cust 002 kkk,kkk.kk zzzz

    So far, from the GUI, it seems that the input data set CANNOT contain two or more rows with the same period, even though the customer # differentiates them, so I have to process one customer at a time. Doing that manually is not bad for a handful of customers, but what if I have more than 100 customers?
    Also, I will have more than one variable like customer#; say I add region or other variables. How can I handle that in WEKA?

    2. There are several classifier functions available for time series analysis and many parameters to adjust. In general, how do you determine which model is best? In my case, I set aside some test data and compare the predicted values to the known test values; whichever model comes closest to the test data is the one I pick as best. But there is no guarantee that the model picked this way will make better predictions for future, unknown data.

    3. How do you automate the time series analysis and forecast once it is ready for production?
    The model I build in the GUI is based on the training data available to me at the time. When new data comes in, how do I add it dynamically and then make a new prediction?
    Say my model was built on data up to 12/2013, and on 2/1/14 the 1/2014 data becomes available. I would think I should add this new monthly data to my training data, rebuild the model on the combined data (the new month together with all previous training data), and then make predictions for future periods. Is it supposed to work like this? If so, how should I go about automating the process? Please shed some light on this.

    4. One more question: how do you separate out the trend vs. seasonal vs. random components in WEKA? I tried R, and it can do this easily by calling the decompose() function. Is there any way I can do it in WEKA?

    Thanks a lot, your answer is highly appreciated!

    Last edited by mbyanfei; 02-05-2014 at 07:36 PM.

  2. #2

    Hi Yan,

    For question 1, you will have to separate each customer, region etc. into a separate dataset, I'm afraid. However, since these will all have the same data format, you could train one model to use as a base and then automate the creation of all the others using the WekaForecasting plugin step for PDI. This step has an option to rebuild the forecasting model using incoming data and write the updated model out to another file.
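To illustrate the "one dataset per customer, region etc." idea, here is a minimal Python sketch that splits a combined table into one time-ordered series per key. It is only a preprocessing illustration and not part of Weka or PDI; the function name `split_by_key`, the field names and the sample figures are assumptions made up for the example.

```python
from collections import defaultdict


def split_by_key(rows, key_fields, time_field="period"):
    """Group rows into one time-ordered series per key combination.

    rows       -- list of dicts, one per (period, customer, ...) record
    key_fields -- fields that identify a series, e.g. ["customer"] or
                  ["customer", "region"]
    """
    series = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        series[key].append(row)
    # "YYYYMM" strings sort chronologically, so a plain sort is enough here.
    for key in series:
        series[key].sort(key=lambda r: r[time_field])
    return dict(series)


# Hypothetical sample data in the shape described in the question.
rows = [
    {"period": "201302", "customer": "cust 001", "sales_amt": 120.0},
    {"period": "201301", "customer": "cust 001", "sales_amt": 100.0},
    {"period": "201301", "customer": "cust 002", "sales_amt": 50.0},
    {"period": "201302", "customer": "cust 002", "sales_amt": 55.0},
]

per_customer = split_by_key(rows, ["customer"])
for key, records in per_customer.items():
    print(key, [(r["period"], r["sales_amt"]) for r in records])
```

Each resulting series has exactly one row per period, so it could be written out as its own file and fed to a forecaster one at a time. Adding "region" to `key_fields` would give one series per customer and region pair.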

    I think this answers question 3 too. Unfortunately, this PDI step is an enterprise edition feature, so you will need a Pentaho subscription. However, if you are willing to work entirely with Weka then there is an equivalent component for Weka's Knowledge Flow environment (you can find it under the "Time Series" folder in the Knowledge Flow design palette).
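The "append the new month, rebuild, write the model out, forecast" cycle can be sketched outside of any particular tool as follows. This is a plain-Python illustration, not the PDI step or the Knowledge Flow component: the least-squares trend line is just a stand-in for whatever forecaster you actually use, and the function names, file name and figures are all assumptions.

```python
import json
import os
import tempfile


def fit_linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; needs >= 2 points."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return {"intercept": y_mean - slope * x_mean, "slope": slope, "n": n}


def forecast(model, horizon):
    """Extrapolate the fitted line for the next `horizon` periods."""
    return [model["intercept"] + model["slope"] * (model["n"] + h)
            for h in range(horizon)]


def retrain_and_save(history, new_values, model_path):
    """Append new observations, refit on ALL data, persist the model."""
    history = history + list(new_values)
    model = fit_linear_trend(history)
    with open(model_path, "w") as fh:
        json.dump(model, fh)
    return history, model


# Hypothetical monthly values up to 12/2013, then the 1/2014 value arrives.
history = [10.0, 12.0, 14.0]
model_path = os.path.join(tempfile.gettempdir(), "cust_001_model.json")
history, model = retrain_and_save(history, [16.0], model_path)

# A later job can reload the saved model and forecast without refitting.
with open(model_path) as fh:
    reloaded = json.load(fh)
next_two = forecast(reloaded, 2)
print(next_two)
```

Run on a schedule (for example once the previous month's data has landed), this reproduces the workflow described in question 3: the training set grows by one period each time and the model is always rebuilt on the full history.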

    As for question 2: as with all machine-learning-based modelling, you have to assume that future data follows the same underlying distribution as your training data, otherwise generalisation is not possible. So your approach of using a hold-out set for model selection is the correct one.
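As a rough illustration of hold-out model selection, the sketch below keeps back the last few periods, scores each candidate on them with mean absolute error and picks the lowest. The three candidate forecasters are deliberately trivial stand-ins (they are not Weka schemes), and all names and numbers are assumptions for the example.

```python
def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def last_value_forecast(train, horizon):
    """Repeat the most recent observation."""
    return [train[-1]] * horizon


def mean_forecast(train, horizon):
    """Repeat the training mean."""
    mean = sum(train) / len(train)
    return [mean] * horizon


def seasonal_naive_forecast(train, horizon, season=12):
    """Repeat the value observed one season earlier."""
    return [train[len(train) - season + (h % season)] for h in range(horizon)]


def pick_best(series, candidates, holdout):
    """Score every candidate on the last `holdout` points; lowest MAE wins."""
    train, test = series[:-holdout], series[-holdout:]
    scores = {name: mean_absolute_error(test, predict(train, holdout))
              for name, predict in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical series with a repeating pattern of length 4.
series = [10, 20, 30, 40] * 3
candidates = {
    "last_value": last_value_forecast,
    "mean": mean_forecast,
    "seasonal_naive": lambda train, h: seasonal_naive_forecast(train, h, season=4),
}

best, scores = pick_best(series, candidates, holdout=4)
print(best, scores)
# seasonal_naive {'last_value': 15.0, 'mean': 10.0, 'seasonal_naive': 0.0}
```

The hold-out points must be the most recent ones, never a random sample, so that the evaluation mimics forecasting the future from the past.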

    As for question 4: Weka's approach of using machine learning regression schemes to model time series does not allow for decomposition into trend, seasonal and random parts.
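If a decomposition is needed, it has to be done outside Weka. Below is a minimal Python sketch of the classical additive moving-average method (centred moving average for the trend, per-position averages for the seasonal part, whatever is left as the random part). It is written from the textbook description, is not guaranteed to match R's decompose() output in every detail, and the function name and sample data are assumptions.

```python
def decompose_additive(values, period):
    """Classical additive decomposition; needs at least two full periods.

    Returns (trend, seasonal, remainder). Trend and remainder are None at
    the ends of the series, where the centred window does not fit.
    """
    n = len(values)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = values[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: half weight on both end points of the window.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period

    # Average the detrended values for each position within the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(values[t] - trend[t])
    raw = [sum(b) / len(b) for b in buckets]
    centre = sum(raw) / period
    cycle = [r - centre for r in raw]  # seasonal effects sum to zero

    seasonal = [cycle[t % period] for t in range(n)]
    remainder = [None if trend[t] is None
                 else values[t] - trend[t] - seasonal[t]
                 for t in range(n)]
    return trend, seasonal, remainder


# Hypothetical series: linear trend t plus a repeating pattern of length 4.
pattern = [1, -1, 2, -2]
values = [t + pattern[t % 4] for t in range(12)]

trend, seasonal, remainder = decompose_additive(values, period=4)
print(trend)
print(seasonal[:4])
print(remainder)
```

On this synthetic input the method recovers the trend and the repeating pattern and leaves a remainder of zero; on real sales data the remainder is what is left after both are removed.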

