Real-time Data Analytics

From Lsdf

Description

This topic plays an important role in real-world machine learning applications such as recommender systems, stock market analysis, anomaly detection and Internet of Things (IoT) sensor data analysis [0]. The goal of the project is to create basic models using Spark [1,2] and Streaming Random Forest (Mondrian Forest) [3,4], and to apply the resulting algorithms to the analysis of streaming data such as Meetup RSVP events [5] or financial data [6].
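Spark's streaming linear regression [1] updates a model every time a new batch of data arrives, using stochastic gradient descent. The plain-Python sketch below illustrates that incremental idea without Spark. The function names (`sgd_update`, `stream_batches`), the synthetic data generator, the step size and the number of gradient steps per batch are hypothetical choices for illustration, not the Spark API.

```python
import random

def sgd_update(weights, bias, batch, step_size=0.1):
    """One gradient-descent step on the mean squared error of a single batch."""
    n = len(batch)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, y in batch:
        pred = sum(w * xi for w, xi in zip(weights, x)) + bias
        err = pred - y
        for j, xi in enumerate(x):
            grad_w[j] += err * xi
        grad_b += err
    new_weights = [w - step_size * g / n for w, g in zip(weights, grad_w)]
    new_bias = bias - step_size * grad_b / n
    return new_weights, new_bias

def stream_batches(num_batches, batch_size, true_w, true_b, noise=0.1, seed=0):
    """Hypothetical stand-in for a data stream: yields batches of (x, y) pairs."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            x = [rng.uniform(-1, 1) for _ in true_w]
            y = sum(w * xi for w, xi in zip(true_w, x)) + true_b + rng.gauss(0, noise)
            batch.append((x, y))
        yield batch

weights, bias = [0.0, 0.0], 0.0
for batch in stream_batches(num_batches=500, batch_size=20, true_w=[2.0, -1.0], true_b=0.5):
    # Update the model as each batch "arrives"; the batch is then discarded.
    for _ in range(5):  # a few gradient steps per batch (an assumption)
        weights, bias = sgd_update(weights, bias, batch)

print("learned weights:", weights, "bias:", bias)
```

Because the model state is only `weights` and `bias`, each batch can be processed once and dropped, which is what makes this style of update suitable for unbounded streams.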

The analysis will include an investigation of how the width of the time window (the length of the analyzed data segment) affects the accuracy of the resulting predictions.
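One simple way to study the effect of the window width is to fix a predictor, vary the width, and compare the prediction error. The sketch below uses a moving-average forecast on a hypothetical synthetic series (a slow sine wave plus Gaussian noise) as a stand-in for the project's real models and data; `make_series`, `window_mae` and the chosen widths are illustrative assumptions.

```python
import math
import random

def make_series(n, period=200, noise=0.5, seed=42):
    """Hypothetical synthetic stream: a slow sine wave plus Gaussian noise."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * t / period) + rng.gauss(0, noise) for t in range(n)]

def window_mae(series, width, start):
    """Predict each value as the mean of the preceding `width` values; return the MAE."""
    errors = []
    for t in range(start, len(series)):
        window = series[t - width:t]
        prediction = sum(window) / width
        errors.append(abs(series[t] - prediction))
    return sum(errors) / len(errors)

series = make_series(2000)
widths = [1, 5, 20, 50, 200]
start = max(widths)  # evaluate every width on the same time steps
errors = {w: window_mae(series, w, start) for w in widths}
for w, mae in errors.items():
    print(f"window width {w:>3}: MAE = {mae:.3f}")
```

On data like this, very short windows mostly track the noise while very long windows lag behind (or average away) the signal, so an intermediate width tends to give the lowest error. The same loop could wrap any streaming model in place of the moving average.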

References

[0] Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R
[1] https://spark.apache.org/docs/latest/mllib-linear-methods.html#streaming-linear-regression
[2] https://spark.apache.org/docs/latest/streaming-programming-guide.html
[3] http://www.ment.at/blog-old/streaming-random-forest
[4] http://research.cs.queensu.ca/home/cords2/ideas07.pdf
[5] http://meetup.github.io/stream/rsvpTicker/
[6] http://finance.google.com/finance/info?client=ig&q=NASDAQ%3AGOOG

Contact

Bogdan.Lobodzinski@kit.edu