Real-time Data Analytics

From Lsdf
Revision as of 15:11, 13 September 2016 by Nico.schlitter

Description

Real-time analysis of data streams plays an important role in real-world machine learning applications such as recommender systems, stock market analysis, anomaly detection and Internet of Things sensor data [0]. The goal of the project is to create basic models using Spark [1,2] and a streaming random forest (Mondrian Forest) [3,4], and to apply these algorithms to streaming data such as Meetup RSVP events [5] or financial data [6].
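Streaming linear regression in Spark [1] keeps a linear model that is updated every time a new mini-batch of the stream arrives, instead of being retrained on the full history. The plain-Python sketch below illustrates only that idea and is not Spark's API: the class and method names are hypothetical, and it assumes one gradient step per batch on the half mean squared error with a fixed step size.

```python
from typing import Iterable, List, Sequence, Tuple

# A mini-batch is a list of (features, label) pairs.
Batch = List[Tuple[Sequence[float], float]]


class StreamingLinearRegression:
    """Hypothetical linear model updated incrementally, one mini-batch at a time."""

    def __init__(self, num_features: int, step_size: float = 0.1) -> None:
        self.weights = [0.0] * num_features
        self.step_size = step_size

    def predict(self, x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, batch: Batch) -> None:
        # One gradient step on the half mean squared error of this batch only;
        # earlier batches influence the model solely through self.weights.
        if not batch:
            return
        grad = [0.0] * len(self.weights)
        for x, y in batch:
            err = self.predict(x) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
        n = len(batch)
        self.weights = [w - self.step_size * g / n for w, g in zip(self.weights, grad)]

    def train_on(self, stream: Iterable[Batch]) -> None:
        # Consume the stream batch by batch, as a streaming job would.
        for batch in stream:
            self.update(batch)


# Usage: learn y = 2*x + 1 (the constant feature 1.0 acts as the intercept).
batch = [((x, 1.0), 2 * x + 1) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
model = StreamingLinearRegression(num_features=2, step_size=0.5)
model.train_on(batch for _ in range(500))
print(model.weights)  # close to [2.0, 1.0]
```

Because each update only touches the current batch, memory use stays constant regardless of how long the stream runs; the price is that the model adapts gradually rather than jumping to the least-squares solution.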

The analysis will include an investigation of how the time window width (the length of the analyzed data) affects the accuracy of the resulting predictions.
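A window-width study can be organized as a loop over candidate widths that scores every width on the same prediction targets. The sketch below assumes a deliberately simple stand-in model (predict the next value as the mean of the last `width` values) rather than the Spark or random forest models of the project; the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def window_forecast_errors(series: Sequence[float], width: int, start: int) -> List[float]:
    """Absolute errors when series[t] is predicted by the mean of the previous `width` values."""
    if not 1 <= width <= start:
        raise ValueError("need 1 <= width <= start")
    errors = []
    for t in range(start, len(series)):
        window = series[t - width:t]
        prediction = sum(window) / width
        errors.append(abs(series[t] - prediction))
    return errors


def mae_by_width(series: Sequence[float], widths: Sequence[int]) -> Dict[int, float]:
    """Mean absolute error per window width, all evaluated on the same targets."""
    start = max(widths)  # the widest window dictates the first usable target
    result = {}
    for w in widths:
        errs = window_forecast_errors(series, w, start)
        result[w] = sum(errs) / len(errs)
    return result


# Noisy, alternating signal: a wider window averages out the noise.
print(mae_by_width([float(t % 2) for t in range(20)], [1, 2]))  # {1: 1.0, 2: 0.5}
# Linear trend: a wider window lags behind the trend.
print(mae_by_width([float(t) for t in range(20)], [1, 4]))      # {1: 1.0, 4: 2.5}
```

The two toy series show the trade-off the study is meant to quantify: longer windows suppress noise but react more slowly to changes in the data.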

References

[0] Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R
[1] https://spark.apache.org/docs/latest/mllib-linear-methods.html#streaming-linear-regression
[2] https://spark.apache.org/docs/latest/streaming-programming-guide.html
[3] http://www.ment.at/blog-old/streaming-random-forest
[4] http://research.cs.queensu.ca/home/cords2/ideas07.pdf
[5] http://meetup.github.io/stream/rsvpTicker/
[6] http://finance.google.com/finance/info?client=ig&q=NASDAQ%3AGOOG

Contact

bogdan.Lobodzinski@kit.edu