Real-time Data Analytics
Description
Real-time data analytics plays an important role in real-world machine learning applications such as recommender systems, stock market analysis, anomaly detection, and the analysis of Internet of Things sensor data [0]. The goal of the project is to build basic models using Spark [1,2] and Streaming Random Forest (Mondrian Forest) [3,4], and to apply them to streaming data such as Meetup RSVP events [5] or financial data [6].
The analysis will include an investigation of how the time window width (the length of the analyzed data) affects the accuracy of the resulting predictions.
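The window-width investigation can be illustrated with a minimal sketch in plain Python. It is not the project's method: the synthetic stream (a level that jumps once, plus noise), the sliding-mean predictor, and the names `make_stream` and `prequential_mae` are all assumptions made for illustration, standing in for the real data sources and for the Spark or Mondrian Forest models. Each value is predicted from the previous `width` values before it is added to the window (test-then-train evaluation).

```python
import random
from collections import deque


def make_stream(n, seed=0):
    """Hypothetical synthetic stream: level 0.0, jumping to 5.0 halfway, plus unit Gaussian noise."""
    rng = random.Random(seed)
    return [(0.0 if t < n // 2 else 5.0) + rng.gauss(0.0, 1.0) for t in range(n)]


def prequential_mae(stream, width):
    """Predict each value as the mean of the previous `width` values,
    score the prediction, and only then add the true value to the window."""
    window = deque(maxlen=width)  # keeps only the most recent `width` values
    total_error, count = 0.0, 0
    for value in stream:
        if window:  # the first value has no history, so it is not scored
            prediction = sum(window) / len(window)
            total_error += abs(value - prediction)
            count += 1
        window.append(value)
    return total_error / count


stream = make_stream(2000)
results = {w: prequential_mae(stream, w) for w in (1, 5, 20, 100, 500)}
for width, mae in results.items():
    print(f"window width {width:4d}: MAE = {mae:.3f}")
```

On this synthetic stream, a very narrow window follows the noise, while a very wide window reacts slowly to the jump, so an intermediate width gives the lowest error. The same loop could wrap any incremental model in place of the sliding mean.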
References
- [0] Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R
- [1] https://spark.apache.org/docs/latest/mllib-linear-methods.html#streaming-linear-regression
- [2] https://spark.apache.org/docs/latest/streaming-programming-guide.html
- [3] http://www.ment.at/blog-old/streaming-random-forest
- [4] http://research.cs.queensu.ca/home/cords2/ideas07.pdf
- [5] http://meetup.github.io/stream/rsvpTicker/
- [6] http://finance.google.com/finance/info?client=ig&q=NASDAQ%3AGOOG