Hitachi Vantara Pentaho Community Forums

Thread: Time Series Forecasting - Data collection intervals

  1. #1
    Join Date
    Jan 2015
    Posts
    15

    Hi,

    I'm currently trying to produce a forecast using Weka for a project I'm working on. However, the data I have collected is not regularly spaced: the gaps vary from 10 minutes to a few hours. I have attempted to average my data so there is only one value per hour, but I get the error: "priming instances do not appear to be in ascending order of the time stamp field (datetime)! 0.776667, 2015-01-27T00:00:00 : 0.774, 2015-01-27T00:00:00".

    I'm pretty new to the software so excuse me if I'm making a really easy mistake. I'm currently using the GUI before implementing it in code, and I'm getting my data straight from my MySQL database. At the moment I'm just using the Time Series forecaster straight after adding my data in Explorer - so I'm guessing I may need to do some analysis of my data before performing the forecast?

    Here's a sample of some of the data (there are over 400 rows in total):

    0.7377777777 2015-01-27 17:09:29
    0.724 2015-01-27 12:51:04
    0.7366666667 2015-01-27 11:13:15
    0.74 2015-01-27 09:14:44
    0.7419 2015-01-27 07:14:44
    0.72 2015-01-27 06:14:44
    0.701 2015-01-27 05:10:12


    Any help would be greatly appreciated!!! Thanks

  2. #2
    Join Date
    Aug 2006
    Posts
    1,741

    Hi,

    The sample you show in your post appears to be in descending order of date. The first instance in the dataset should be the oldest historical case and the last instance the newest one.

    How are you averaging the data? The error output seems to have lost the timestamp portion of the date time.

    Cheers,
    Mark.
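For the hourly averaging discussed above, a minimal Java sketch follows. It assumes rows in the "value timestamp" layout of the sample in post #1. The class and method names (HourlyAverager, averageByHour) are hypothetical and not part of Weka. Each hour bucket keeps its full timestamp, and the buckets come back oldest-first, which is the ordering described above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper, not part of Weka.
final class HourlyAverager {
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Averages "value timestamp" rows per clock hour; keys are sorted oldest-first. */
    static TreeMap<LocalDateTime, Double> averageByHour(List<String> rows) {
        // Per hour bucket: [sum of values, number of values]
        TreeMap<LocalDateTime, double[]> acc = new TreeMap<>();
        for (String row : rows) {
            String[] parts = row.trim().split("\\s+", 2); // value, then full timestamp
            double value = Double.parseDouble(parts[0]);
            LocalDateTime hour = LocalDateTime.parse(parts[1], FMT)
                    .truncatedTo(ChronoUnit.HOURS);       // keep date and hour, drop minutes
            double[] sumCount = acc.computeIfAbsent(hour, h -> new double[2]);
            sumCount[0] += value;
            sumCount[1] += 1;
        }
        TreeMap<LocalDateTime, Double> out = new TreeMap<>();
        acc.forEach((hour, sumCount) -> out.put(hour, sumCount[0] / sumCount[1]));
        return out;
    }

    public static void main(String[] args) {
        // Sample rows from post #1, in the (descending) order they were posted.
        List<String> rows = List.of(
                "0.7377777777 2015-01-27 17:09:29",
                "0.724 2015-01-27 12:51:04",
                "0.7366666667 2015-01-27 11:13:15",
                "0.74 2015-01-27 09:14:44",
                "0.7419 2015-01-27 07:14:44",
                "0.72 2015-01-27 06:14:44",
                "0.701 2015-01-27 05:10:12");
        // Printed oldest-first regardless of input order.
        averageByHour(rows).forEach((hour, mean) ->
                System.out.println(FMT.format(hour) + "," + mean));
    }
}
```

Each printed line could then be written to a CSV file or database table before loading the data into Weka.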

  3. #3
    Join Date
    Jan 2015
    Posts
    15

    Hi,
    Thanks for your reply.
    Apologies, I somehow managed to post the example of my data in the wrong order. When I run my SQL query in the openDB window I order it by the date column.

    I have now managed to create a dataset with regular intervals of 15 minutes using MATLAB. I export it as a CSV and use Excel to format the date as yyyy-mm-dd HH:MM:ss, then import it into my MySQL database. The import raises no issues, and the time column is of type timestamp.

    e.g.

    2014-12-08 01:15:00 0.6 1.3833 5
    2014-12-08 01:30:00 0.59508 1.5 5
    2014-12-08 01:45:00 0.59016 1.6167 5
    2014-12-08 02:00:00 0.58525 1.7333 5
    2014-12-08 02:15:00 0.58033 1.85 5

    However I'm still getting the same issue, with slightly different wording:

    priming instances do not appear to be in ascending order of the time stamp field (datetime)! 2015-01-24T00:00:00, 0.67667, 0.0357, 5.7505 : 2015-01-24T00:00:00,0.65, 0.032657, 5.857

    It's also worth noting that the dataset ranges from the 8th of December 2014 at 01:00 to the 24th of January 2015 at 13:15.
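A quick sanity check that a series like this really is evenly spaced and in ascending order before it is loaded could look like the sketch below. The class and method names are hypothetical, and the timestamps in main are the five sample rows above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka.
final class IntervalCheck {
    /**
     * Returns the index of the first row that is not exactly one step after
     * its predecessor (this also catches duplicates and descending rows),
     * or -1 if the whole series is regular.
     */
    static int firstIrregularIndex(List<LocalDateTime> stamps, Duration step) {
        for (int i = 1; i < stamps.size(); i++) {
            Duration gap = Duration.between(stamps.get(i - 1), stamps.get(i));
            if (!gap.equals(step)) {
                return i;
            }
        }
        return -1;
    }

    /** Number of rows a gap-free series has between first and last, inclusive. */
    static long expectedRows(LocalDateTime first, LocalDateTime last, Duration step) {
        return Duration.between(first, last).toMinutes() / step.toMinutes() + 1;
    }

    public static void main(String[] args) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        List<LocalDateTime> stamps = new ArrayList<>();
        for (String s : List.of(
                "2014-12-08 01:15:00", "2014-12-08 01:30:00", "2014-12-08 01:45:00",
                "2014-12-08 02:00:00", "2014-12-08 02:15:00")) {
            stamps.add(LocalDateTime.parse(s, fmt));
        }
        Duration step = Duration.ofMinutes(15);
        System.out.println(firstIrregularIndex(stamps, step)); // -1
        // Row count for a gap-free series over the range stated in the post.
        System.out.println(expectedRows(
                LocalDateTime.of(2014, 12, 8, 1, 0),
                LocalDateTime.of(2015, 1, 24, 13, 15), step)); // 4562
    }
}
```

Comparing the expected row count with the actual number of rows in the table is a cheap way to spot missing or duplicated intervals.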

  4. #4
    Join Date
    Jan 2015
    Posts
    15

    Hi,

    Apologies, I managed to order my data wrong when copying it over here! It is in fact in ascending order of date, with the oldest record first.

    I have since managed to get my data into 15-minute intervals from the 8th of December 2014 at 01:00 to the 24th of January 2015 at 13:15. To do this I used MATLAB, then exported the datasets into Excel, where I combined them all into one table and formatted the date as yyyy-mm-dd HH:MM:SS. I then saved this as a CSV and imported it into a table in my database, with the date field being of type timestamp, with no errors.

    However, I'm still getting the same problem when I try to use time series forecasting on it. It seems to recognise the column as type Date, but that's as far as it gets.

  5. #5
    Join Date
    Aug 2006
    Posts
    1,741

    OK. Can you make your dataset available somewhere so I can take a look at it?

    Cheers,
    Mark.

  6. #6
    Join Date
    Jan 2015
    Posts
    15

    Hi,

    I've uploaded the dataset csv that I imported into my database here : https://www.dropbox.com/s/7ilpezcj8g...dData.csv?dl=0

    I basically want to forecast the level with rain and temp values as overlay data eventually (at the moment I'm just trying to get it to work with the data!). I intend on having values for future dates with rain and temp but no value for level.

    Thanks

  7. #7
    Join Date
    Jan 2015
    Posts
    15

    Hey, I wondered if you've managed to have a look at the dataset? I'm still having problems, so I'm keen to know what's going wrong haha!

  8. #8
    Join Date
    Aug 2006
    Posts
    1,741

    If I convert your csv file to ARFF using:

    java weka.core.converters.CSVLoader interpolatedData.csv -D first -format "yyyy-MM-dd HH:mm:ss" -B 5000

    Then I get no errors regarding the order of the date field when running WekaForecaster on it. Are you reading the data directly from a database? If so, perhaps it is not coming back in ascending order when read?

    Cheers,
    Mark.
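The -format argument in the command above is a Java date pattern. As a hedged illustration of why the pattern matters, the sketch below parses two of the sample timestamps with java.text.SimpleDateFormat, once with the full pattern and once with a date-only pattern. With the date-only pattern the time of day is ignored, both rows collapse to midnight, and they are no longer strictly ascending, which resembles the T00:00:00 values in the error messages earlier in the thread. Whether this is exactly what happened when the data was read from the database is an assumption, and the class name DatePatternDemo is hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical demo class, not part of Weka.
final class DatePatternDemo {
    /** Parses text with the given pattern; UTC is fixed so the demo is deterministic. */
    static Date parse(String text, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            // parse() reads from the start of the string and may stop early,
            // so a date-only pattern silently ignores a trailing time of day.
            return fmt.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    /** True if the first timestamp is strictly before the second under this pattern. */
    static boolean strictlyAscending(String earlier, String later, String pattern) {
        return parse(earlier, pattern).before(parse(later, pattern));
    }

    public static void main(String[] args) {
        String a = "2014-12-08 01:15:00";
        String b = "2014-12-08 01:30:00";
        System.out.println(strictlyAscending(a, b, "yyyy-MM-dd HH:mm:ss")); // true
        System.out.println(strictlyAscending(a, b, "yyyy-MM-dd"));          // false
    }
}
```

Note that in these patterns MM is the month and mm is the minute, so the letter case has to match the pattern shown in the command.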

  9. #9
    Join Date
    Jan 2015
    Posts
    15

    I've got it working! I needed to put the right format for the datetime column in the invoke extra options dialogue.

    A separate, new issue I'm having is that I want to use overlay data, so I have a dataset identical to the one linked but with the last 100 values of level missing. However, when I try to use rain and temp as overlay data I get the error 'Unable to generate future forecast because there is no future overlay data available.' Do I need to set a point at which to start forecasting level?

    Thanks for your help!
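The thread does not answer this question. Purely as a data-preparation sketch, and assuming the forecaster is given the historical rows and the trailing rows (level missing, rain and temp present) as two separate pieces, the split point can be found as shown below. The class and method names are hypothetical and not part of Weka, missing values are represented as NaN, and the level values in main are illustrative, with the last two blanked out by hand.

```java
// Hypothetical helper, not part of Weka.
final class OverlaySplit {
    /**
     * Returns the index of the first row in the trailing run of missing
     * target values (NaN). Rows before it are history; rows from it onward
     * have only overlay values. Returns level.length if nothing is missing
     * at the end.
     */
    static int firstFutureRow(double[] level) {
        int i = level.length;
        while (i > 0 && Double.isNaN(level[i - 1])) {
            i--;
        }
        return i;
    }

    public static void main(String[] args) {
        // Illustrative level column: three known values, two blanked out.
        double[] level = {0.6, 0.59508, 0.59016, Double.NaN, Double.NaN};
        int split = firstFutureRow(level);
        System.out.println("history rows: 0.." + (split - 1));
        System.out.println("future overlay rows: " + split + ".." + (level.length - 1));
    }
}
```

Only missing values at the very end count as future rows here; a missing level in the middle of the history is left where it is.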

  10. #10
    Join Date
    Feb 2015
    Posts
    8

    Dear Mark,

    I am having trouble with time series forecasting at 15-minute intervals. Is it possible to forecast every 15 minutes, given that the lowest periodicity in Weka 3.7 is hourly? I have stock market equity data for every 15 minutes from 2014-10-12 00:00:00 to 2015-01-31 23:45:00. Has anyone tried to forecast at 5-minute or 15-minute intervals?

  11. #11
    Join Date
    Sep 2016
    Posts
    1

    Hi,

    I am new to Pentaho and Weka.

    I am trying to do forecasting using the Weka Knowledge Flow. I am using Weka version 3.9.1.
    The Weka time series forecasting component works perfectly well when rebuild/re-estimate on incoming data is not selected.
    Once I select it, for auto-refresh purposes, it gives me this error: Caused by: java.lang.Exception: All values of the time stamp field (Date) were missing in the priming data! I even tried the standard sample datasets, airlines.arff and wine.arff.
    Both times the same error occurs.

    Dear Mark,

    Could you please advise what the issue is?
