The textbook one-pass formula, which accumulates $\sum_i x_i$ and $\sum_i x_i^2$ and subtracts at the end, loses precision to cancellation. This is particularly bad if the standard deviation is small relative to the mean. Thus this algorithm should not be used in practice. A clever solution to this problem for streaming mean and variance computation was proposed by West in 1979. In his algorithm the summed quantities are controlled to be on average of comparable size.

Using the weighted mean

$$\text{mean}(X):=\frac{1}{\sum_i \omega_i}\sum_i \omega_i x_i$$

the "naive", non-corrected variance I'm using is this:

$$\text{var}(X):=\frac{1}{\sum_i \omega_i}\sum_i\omega_i(x_i - \text{mean}(X))^2$$

So I'm wondering whether the correct way of correcting bias is

A) $$\text{var}(X):=\frac{1}{\sum_i \omega_i - 1}\sum_i\omega_i(x_i - \text{mean}(X))^2$$

or

B) $$\text{var}(X):=\frac{n}{n-1}\frac{1}{\sum_i \omega_i}\sum_i\omega_i(x_i - \text{mean}(X))^2$$

or

C) $$\text{var}(X):=\frac{\sum_i \omega_i}{(\sum_i \omega_i)^2 - \sum_i \omega_i^2}\sum_i\omega_i(x_i - \text{mean}(X))^2$$

A) does not make sense to me when the weights are small: the normalization value could be 0 or even negative. In B), $n$ is the number of observations. The third, C), is my interpretation of the answer to this question: https://mathoverflow.net/questions/22203/unbiased-estimate-of-the-variance-of-an-unnormalised-weighted-mean

For C) I have just realized that the denominator looks a lot like $\text{var}(\Omega)$. I think it does not entirely align, but obviously there is the connection that we are trying to compute the variance.

All three of them seem to "survive" the sanity check of setting all $\omega_i=1$.

*Update:* whuber suggested to also do the sanity check with $\omega_1=\omega_2=.5$ and all remaining $\omega_i=\epsilon$ tiny. When you consider cases where the two largest weights are equal and all the rest become vanishingly small, both (A) and (B) drop from contention (because they disagree with the known results for $n=2$).
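Both sanity checks are easy to run numerically. The sketch below is only an illustration, not anyone's reference implementation: it assumes that A) divides the weighted sum of squares by $\sum_i\omega_i - 1$, that B) multiplies the naive estimate by $n/(n-1)$, and that C) uses the factor $\sum_i\omega_i / ((\sum_i\omega_i)^2 - \sum_i\omega_i^2)$. The function name `weighted_variances` and the sample data are made up for the example.

```python
import statistics


def weighted_variances(xs, ws):
    """Return (naive, A, B, C) for data xs with weights ws.

    The three corrections follow the reading of A), B), C) stated in the
    lead-in; that reading is an assumption of this sketch.
    """
    n = len(xs)
    w_sum = sum(ws)
    w_sq_sum = sum(w * w for w in ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / w_sum
    # Weighted sum of squared deviations, shared by all candidates.
    ss = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs))

    naive = ss / w_sum
    # A) breaks down when the weights sum to 1 or less.
    a = ss / (w_sum - 1) if w_sum > 1 else float("nan")
    b = n / (n - 1) * naive
    c = w_sum / (w_sum ** 2 - w_sq_sum) * ss
    return naive, a, b, c


# Check 1: all weights equal to 1 should reproduce the usual sample variance.
xs = [2.0, 4.0, 4.0, 5.0]
print(weighted_variances(xs, [1.0] * len(xs)), statistics.variance(xs))

# Check 2 (whuber): two weights of 0.5, the rest tiny. The result should
# approach the n = 2 sample variance of the first two points.
xs2 = [1.0, 3.0] + [10.0] * 8
ws2 = [0.5, 0.5] + [1e-9] * 8
print(weighted_variances(xs2, ws2), statistics.variance(xs2[:2]))
```

With unit weights all three candidates agree with `statistics.variance`. In the second check only C) stays close to the two-point variance, which matches the argument that (A) and (B) drop from contention.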

Let's say that at time $t$ you have $\bar x$, $\text$ and $s^2$, and a new observation $x_{t+1}$, and you want to have those three computed quantities at time $t+1$.
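One common way to carry out such an update is Welford's method, sketched below. It is offered as an illustration rather than as the method this answer had in mind: the class name `RunningStats` and the choice of the running sum of squared deviations `m2` as the third stored quantity are assumptions of the sketch.

```python
import statistics


class RunningStats:
    """Running count, mean and sum of squared deviations (Welford's update)."""

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the stored quantities."""
        self.n += 1
        delta = x - self.mean            # deviation from the old mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # uses old and new mean

    @property
    def variance(self):
        """Unbiased sample variance; undefined for fewer than two points."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


stats = RunningStats()
data = [2.0, 4.0, 4.0, 5.0]
for x in data:
    stats.update(x)
print(stats.mean, stats.variance)
print(statistics.mean(data), statistics.variance(data))
```

Only three numbers are kept between observations, so nothing has to be stored beyond `n`, `mean` and `m2`.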

Many possibilities exist, but because the computation is incremental, particular attention needs to be paid to numerical stability.

If we were to ignore numerical accuracy, we could use a simple derivation to obtain direct updates for the running mean and variance.
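To see why numerical accuracy cannot simply be ignored, here is a small sketch of the sum and sum-of-squares update, run on data whose standard deviation is small relative to the mean. The function name `naive_variance` and the sample values are made up for the example; `statistics.variance` from the standard library serves as the reference.

```python
import statistics


def naive_variance(xs):
    """One-pass variance from the running sum and running sum of squares."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in xs:
        n += 1
        total += x
        total_sq += x * x
    # Subtracting two large, nearly equal numbers is where precision is lost.
    return (total_sq - total * total / n) / (n - 1)


small = [4.0, 7.0, 13.0, 16.0]
shifted = [x + 1e9 for x in small]  # same spread, much larger mean

print(naive_variance(small), statistics.variance(small))
print(naive_variance(shifted), statistics.variance(shifted))
```

Shifting the data does not change the true variance, yet in double precision the naive result for the shifted data moves far away from the reference value.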

Because storing them all would mean storing gigabytes of data, I'd like to store only the quantities that would allow me to compute the global means and variances.

Do you mean that you wish to update the mean and variance? That is, given a mean and variance computed on the first $t-1$ observations, you want to compute them on $t$ observations?
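If the data arrive in chunks, one option is to keep a small summary per chunk and merge the summaries, using the pairwise combination formula usually attributed to Chan, Golub and LeVeque. The sketch below assumes univariate data and non-empty chunks; the helper names `summarize` and `merge` and the `(n, mean, m2)` tuple layout are choices made for the example.

```python
import statistics


def summarize(xs):
    """Summary (count, mean, sum of squared deviations) of one non-empty chunk."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs)
    return n, mean, m2


def merge(a, b):
    """Combine two chunk summaries into the summary of their union."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # The cross term accounts for the distance between the two chunk means.
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


chunk_1 = [2.0, 4.0]
chunk_2 = [4.0, 5.0, 9.0]
n, mean, m2 = merge(summarize(chunk_1), summarize(chunk_2))
print(mean, m2 / (n - 1))
print(statistics.mean(chunk_1 + chunk_2), statistics.variance(chunk_1 + chunk_2))
```

Each chunk can be discarded once its three numbers are stored, and the global mean and variance follow from the merged summary.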

Are these univariate variances, or variance-covariance matrices?

