Home Artificial Intelligence Matrix Approximation in Data Streams Data as a matrix Data streaming model

Matrix Approximation in Data Streams Data as a matrix Data streaming model

0
Matrix Approximation in Data Streams
Data as a matrix
Data streaming model

Approximate a matrix without having all of its rows

Towards Data Science
Image credit: unsplash.com

Matrix approximation is a heavily studied sub-field in data mining and machine learning. A big set of information evaluation tasks depend on obtaining a low rank approximation of matrices. Examples are dimensionality reduction, anomaly detection, data de-noising, clustering, and suggestion systems. In this text, we glance into the issue of matrix approximation, and find out how to compute it when the entire data just isn’t available at hand!

The content of this text is partly taken from my lecture at Stanford -CS246 course. I hope you discover it useful. Please find the total content here.

Most data generated on web will be represented as a matrix, where each row of the matrix is an information point. For instance, in routers every packet sent across the network is an information point that will be represented as a row in a matrix of all data points. In retail, every purchase made is a row within the matrix of all transactions.

Figure 1: Data as a matrix — Image by the writer

At the identical time, just about all data generated over web is of a streaming nature; meaning the information is generated by an external source at a rapid rate which we’ve no control over. Consider all searches users make on Google search engine in a every second. We call this data the streaming data; because similar to a stream it pours in.

Some examples of typical streaming web-scale data are as following:

Figure 2: Size of typical streaming web-scale data — Image by the writer

Consider streaming data as a matrix A containing n rows in d-dimensional space, where typically n >> d. Often n is so as of billions and increasing.

In streaming model, data arrives at high speed, one row at a time, and algorithms must process the items fast, or they’re lost endlessly.

LEAVE A REPLY

Please enter your comment!
Please enter your name here