Prediction in Big Time Series Data


Mohammad Shokoohi-Yekta, Data Scientist, Apple

Time series data is the new big data, and it is almost everywhere. A time series is an ordered sequence of real values that traces the values taken by a variable over a period of time. Examples include stock prices, sensor data collected from smart watches and phones, weather data, and heartbeats.

Prediction and forecasting have been topics of great interest in the data mining community for the last decade. Most of the work in the literature has dealt with discrete objects, such as keystrokes (i.e., predictive text), database queries [1], medical interventions [2], and web clicks. However, prediction may also have great utility in real-valued time series. For concreteness, we briefly consider two examples:

  • Researchers in robotic interaction have long noted the importance of short-term prediction of human-initiated forces to allow a robot to plan its interaction with a human. For example, a recent paper notes the critical “importance of the prediction of motion velocity and the anticipation of future perceived forces [to allow the] robot to anticipate the partner’s intentions and adapt its motion” [3].
  • Doppler radar technology introduced in the last two decades has increased the mean lead time for tornado warnings from 5.3 to 9.5 minutes, saving countless lives [4]. But progress seems to have stalled, with 26% of tornadoes within the US occurring with no warning. McGovern et al. argue that further improvements will come not from new sensors, but from yet-to-be-invented algorithms that “examine existing (time series) data for predictive rules” [5].

While forecasting is mature enough to have its own conferences and commercial software (SAS, IBM Cognos, etc.), the handful of research efforts that have considered rule-based prediction in time series have met with limited success. In [6], we present novel algorithms that quickly discover high-quality rules in very large time series datasets, rules that accurately predict the occurrence of future events.
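To make the idea of a time series rule concrete, here is a minimal sketch of how such a rule might be checked against data. It is not the algorithm from [6]. It assumes, purely for illustration, that a rule has an antecedent shape and that the rule "fires" whenever a window of the series lies within a distance threshold of that shape under z-normalized Euclidean distance. The names `znorm`, `euclidean`, `rule_firings`, and `threshold` are hypothetical.

```python
import math


def znorm(xs):
    """Z-normalize a list of floats; a constant window maps to all zeros."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in xs]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rule_firings(series, antecedent, threshold):
    """Return the index just past each window that matches the antecedent.

    Assumption (illustrative only): a window matches when its z-normalized
    Euclidean distance to the antecedent shape is at most `threshold`.
    """
    m = len(antecedent)
    target = znorm(antecedent)
    firings = []
    for start in range(len(series) - m + 1):
        window = znorm(series[start:start + m])
        if euclidean(window, target) <= threshold:
            firings.append(start + m)  # first point after the matched window
    return firings


# Toy data: a rising ramp appears twice in an otherwise flat series.
series = [0, 0, 0, 1, 2, 3, 0, 0, 0, 1, 2, 3, 0, 0]
print(rule_firings(series, antecedent=[0, 1, 2, 3], threshold=0.1))
```

Each returned index marks the moment at which a prediction about the near future would be issued. This brute-force scan costs time proportional to the series length times the pattern length, so very large datasets would call for faster search techniques.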

[1] Li, G., Ji, S., Li, C., and Feng, J. Efficient type-ahead search on relational data: a TASTIER approach. SIGMOD Conference 2009: 695-706.

[2] Weiss, S., Indurkhya, N., and Apte, C. Predictive Rule Discovery from Electronic Health Records. ACM IHI, 2010.

[3] Gribovskaya, E., Kheddar, A., and Billard, A. Motion Learning and Adaptive Impedance for Robot Control during Physical Interaction with Humans. ICRA 2011.

[4] Brotzge, J. and Erickson, S. Tornadoes without NWS warning. Weather and Forecasting, 25, 159-172. 2010.

[5] McGovern et al. Identifying Predictive Multi-Dimensional Time Series Motifs: An application to severe weather prediction. Data Mining and Knowledge Discovery. 2010.

[6] Shokoohi-Yekta, M., Chen, Y., Campana, B., Hu, B., Zakaria, J., & Keogh, E. (2015, August). Discovery of Meaningful Rules in Time Series. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 1085-1094.