PCA with Missing Values

This demo gives a simple example of the JavaScript port of the toolbox for PCA with missing data by A. Ilin and T. Raiko. Most of the text on this page is a word-for-word copy of the text in their MATLAB demo.

I tried to follow the MATLAB code as closely as possible, so if you are familiar with the functions in their toolbox you should be able to easily understand the JavaScript code.

The linear algebra is handled by the Numeric Javascript library, while the plotting is handled by the Plotly library. This port wouldn't be possible (or would be of significantly lower quality) without these two great projects.

Generate Synthetic Data

First, we generate synthetic data according to the model $$\bold{x} = \bold{A} \bold{s} + \bold{\eta}$$ where \(\bold{x}\) is the observation vector, \(\bold{A}\) is the mixing matrix, \(\bold{s}\) is the vector of principal components, and \(\bold{\eta}\) is a noise vector. Matrix \(\bold{A}\) is an orthonormal matrix sampled from a Gaussian distribution, \(\bold{s}\) is obtained from a simple dynamical process, and Gaussian noise is added.

We generate a number of samples for \(\bold{s}\), obtaining the matrices \(\bold{S}\) and \(\bold{X}\). Finally, we remove a fraction of the observations from the data matrix \(\bold{X}\).
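As a rough illustration, the sketch below mimics this generative process in JavaScript with numeric.js. The dimensions, the noise level, the fraction of removed entries, and the specific dynamical process for \(\bold{s}\) are illustrative assumptions, not the exact values used in the demo.

var d = 4, c = 2, n = 100; // assumed sizes: 4 observed variables, 2 components, 100 samples

// Standard Gaussian sample via the Box-Muller transform
function randn() {
	return Math.sqrt(-2 * Math.log(1 - Math.random())) * Math.cos(2 * Math.PI * Math.random());
}

// Mixing matrix A with Gaussian entries (the demo additionally orthonormalizes it)
var A = numeric.rep([d, c], 0).map(function (row) { return row.map(randn); });

// Components S from a simple dynamical process (here: two sinusoids)
var S = [[], []];
for (var t = 0; t < n; t++) {
	S[0].push(Math.sin(2 * Math.PI * t / n));
	S[1].push(Math.cos(6 * Math.PI * t / n));
}

// Observations X = A*S plus Gaussian noise
var noise = numeric.rep([d, n], 0).map(function (row) { return row.map(randn); });
var X = numeric.add(numeric.dot(A, S), numeric.mul(0.1, noise));

// Remove ~20% of the observations; missing entries are marked as NaN here,
// an assumption borrowed from the MATLAB toolbox convention
for (var i = 0; i < d; i++)
	for (var j = 0; j < n; j++)
		if (Math.random() < 0.2) X[i][j] = NaN;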

[Plots: \(\bold{S}\) and \(\bold{X}\)]

Train the PCA-MV Model

Now we train a variational Bayesian PCA model with two components for 30 iterations. The following plots show the evolution of the RMS error and the cost function.
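As a sketch, this training step corresponds to a call like the following, using the PCA.full interface described in the Usage section below (X is the data matrix generated above; any option not listed keeps its default value):

// Variational Bayesian PCA ('vb') with 2 components and at most 30 iterations
var result = PCA.full(X, 2, {
	algorithm: 'vb',
	maxiters: 30,
});
// result contains, among other fields, the estimates A, S and Mu used below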

[Plots: RMS Error and Cost Function]

The estimated matrix \(\bold{\hat{A}}\) is obtained up to a linear transformation of \(\bold{A}\). In the following plots, light blue is the linear transformation of \(\bold{A}\), while yellow is the estimated value.

[Plots: \(\bold{A}\) vs. \(\bold{\hat{A}}\)]

The first plot displays the found principal components (yellow) and confidence regions (yellow shade) as well as the original components (light blue).

The second plot displays the principal subspace, in which the found components are marked with dots and the original components are shown with crosses. Hover over the dots to see the estimated confidence regions.

[Plots: \(\bold{s}\) vs. \(\bold{\hat{s}}\)]

Reconstruction of Missing Data

Finally we can plot the reconstructions of the missing data (yellow) and the corresponding confidence intervals (yellow shade). The original noiseless data are shown in light blue.
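A minimal sketch of computing this reconstruction from the result object of the training step above, using numeric.js (the field names A, S and Mu follow the example in the Getting Started section):

// X_hat = A*S + Mu*1', i.e. the estimated mean Mu is added back to every sample
var nSamples = X[0].length;
var Xhat;
with (numeric) with (result)
	Xhat = add(dot(A, S), dot(Mu, rep([1, nSamples], 1)));
// At the entries that were removed from X, Xhat provides the reconstructed values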

[Plot: \(\bold{\hat{x}}\)]

Getting Started

The PCA-MV algorithm can take a significant amount of time to compute. To prevent freezing the UI while computing the PCA-MV, a WebWorker operation mode is provided.

Include the PCA-MV

To run the algorithm you will need the libraries numeric.js and paper.js. The library numeric.js is needed to deal with linear algebra, while paper.js is used to overload the JavaScript operators and simplify the code. You can grab both libraries from cdnjs.com:

<script src="https://cdnjs.cloudflare.com/ajax/libs/numeric/1.2.6/numeric.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/paper.js/0.10.2/paper-full.min.js"></script>

Then you need to download pcamv.js and include it in your website (see the download button at the top). Alternatively, you can try it directly from my website by including

<script src="//www.jprendes.com/pcamv/pcamv.js"></script>

Please keep in mind that this is not a CDN, and I may introduce changes to the code :-).

Usage

This interface exposes the object constructor PCA. The object constructor receives a list of options, including all the options in the original implementation. An example of instantiation of the PCA object with the default options is listed below:

var pca = new PCA({
	init: 'random',
	maxiters: 1000,
	bias: 1,
	uniquesv: 0,
	autosave: 600,
	filename: 'pca_f_autosave',
	minangle: 1e-8,
	algorithm: 'vb',
	niter_broadprior: 100,
	earlystop: 0,
	rmsstop: [100,1e-4,1e-3], // [] means no rms stop criteria
	cfstop: [], // [] means no cost stop criteria
	verbose: 1,
	xprobe: [],
	rotate2pca: 1,
	display: 0,
	itercallback: [],
});

The newly created pca object exposes only one function, pca.full, which computes the PCA-MV algorithm considering full covariance matrices. The function pca.full takes two arguments: the data matrix X (where each column is an observation), and the number of components ncomp.

var X = [[...],[...]]; // matrix of size KxN
var ncomp = 2; // number of components
var result = pca.full(X,ncomp);

Alternatively, the function PCA.full can be used to achieve the same result:

var result = PCA.full(X,ncomp,options);

Processing a large amount of data may take a long time and block the UI. In these cases (or in most cases, if compatibility with old browsers is not an issue), an asynchronous WebWorker interface can be used:

var promise = PCA.worker(X,ncomp,options);

This interface returns a promise, and maintains compatibility with all the callbacks available in the basic version (itercallback, display.rms, display.prms, display.cost and partialsol).
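For instance, a result obtained through the worker can be consumed as in the sketch below (the fields A and S on the result follow the example in the next section):

PCA.worker(X, ncomp, {algorithm: 'vb', maxiters: 30})
	.then(function (result) {
		// same result structure as PCA.full / pca.full
		console.log('estimated mixing matrix:', result.A);
		console.log('estimated components:', result.S);
	});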

Example

The following example computes the PCA-MV on a dataset consisting of 8 vectors, each of dimension 4.

var X = [[0,1,2,3,4,5,6,7],
         [0,2,2,4,4,6,6,8],
         [0,2,0,2,0,2,0,2],
         [2,0,2,0,2,0,2,0]];
var ncomp = 2;
var result = PCA.full(X, ncomp, {minangle: 1e-100, maxiters: 100});

// Reconstruct the data: X_hat = A*S + Mu*1', where Mu is the estimated mean
var reconstruction;
with (numeric) with (result)
	reconstruction = add(dot(A, S), dot(Mu, rep([1,8],1)));

Options Description

init

Structure containing the initialization of the algorithm. Alternatively, the string 'random', indicating that a random initialization should be generated.

Default value: 'random'

maxiters

Integer indicating the maximum number of iterations.

Default value: 1000

bias

Boolean value indicating whether the data has a non-zero mean. When set to true, the algorithm will estimate the mean of the data; otherwise, the data is assumed to be zero-mean.

Default value: true

uniquesv

Boolean value indicating whether each sample should have an individual variance for its \(\bold{s}\), or whether samples with the same missing values should share the same variance.

Default value: false