Approximate a matrix without having all of its rows

Matrix approximation is a heavily studied sub-field of data mining and machine learning. A large set of data analysis tasks depend on obtaining a low-rank approximation of a matrix. Examples include dimensionality reduction, anomaly detection, data de-noising, clustering, and recommendation systems. In this article, we look into the problem of matrix approximation and learn how to compute it when the entire data is not available at hand!
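To make the idea concrete, below is a minimal sketch of what a low-rank approximation looks like when the full matrix *is* available: a truncated SVD in NumPy. The matrix size and the target rank k are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# Illustrative n x d data matrix (n >> d) and target rank k -- both assumptions.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 50))
k = 5

# Full SVD, then keep only the top-k singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] * s[:k] @ Vt[:k, :]   # best rank-k approximation in Frobenius norm

# Relative approximation error.
err = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print(f"rank-{k} relative error: {err:.3f}")
```

The catch, as we will see, is that this computation assumes all rows of A are sitting in memory at once.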
The content of this article is partly taken from my lecture in Stanford's CS246 course. I hope you find it useful. Please find the full content here.
Most data generated on the web can be represented as a matrix, where each row of the matrix is a data point. For instance, in routers, every packet sent across the network is a data point that can be represented as a row in a matrix of all data points. In retail, every purchase made is a row in the matrix of all transactions.
At the same time, nearly all data generated over the web is of a streaming nature, meaning the data is generated by an external source at a rapid rate over which we have no control. Consider all the searches users make on the Google search engine in a single second. We call this streaming data because, just like a stream, it keeps pouring in.
Some examples of typical streaming web-scale data are as follows:
Consider streaming data as a matrix A containing n rows in d-dimensional space, where typically n >> d. Often n is on the order of billions and keeps growing.
In the streaming model, data arrives at high speed, one row at a time, and algorithms must process the items fast, or they are lost forever.
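The sketch below illustrates this constraint (it is not the algorithm from the lecture, just a simple one-pass baseline): rows arrive one at a time and are discarded, and we keep only a small d x d summary of them, here the Gram matrix A^T A. The dimension d and the simulated stream are illustrative assumptions.

```python
import numpy as np

d = 50                       # dimensionality of each row (assumption)
gram = np.zeros((d, d))      # running A^T A: O(d^2) memory, no rows of A are stored

def row_stream(n, seed=1):
    """Simulated stream; in practice rows arrive from an external source."""
    rng = np.random.default_rng(seed)
    for _ in range(n):
        yield rng.standard_normal(d)

for row in row_stream(100_000):
    gram += np.outer(row, row)   # process each row once, then it is gone

# The leading eigenvectors of A^T A span the dominant directions of the data,
# which is one route to a low-rank view of A without a second pass over the rows.
eigvals, eigvecs = np.linalg.eigh(gram)
print("top eigenvalue of A^T A:", eigvals[-1])
```

Even this simple summary never needs more than O(d^2) memory, which is exactly the kind of guarantee we want when n is in the billions.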