# The mean and standard deviation of a set of $n_1$ observations are

Question:

The mean and standard deviation of a set of $n_1$ observations are $\bar{x}_1$ and $s_1$, respectively, while the mean and standard deviation of another set of $n_2$ observations are $\bar{x}_2$ and $s_2$, respectively. Show that the standard deviation of the combined set of $(n_1 + n_2)$ observations is given by

S.D. $=\sqrt{\frac{n_1 s_1^2+n_2 s_2^2}{n_1+n_2}+\frac{n_1 n_2\left(\bar{x}_1-\bar{x}_2\right)^2}{\left(n_1+n_2\right)^2}}$
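As a quick numerical sanity check (not part of the proof), the right-hand side can be evaluated from the six summary values and compared with the standard deviation of the pooled data. The sketch below assumes that $s_1$ and $s_2$ are population standard deviations (divisor $n$, as in the solution's definitions); the helper name `combined_sd` and both data sets are made up for illustration, and only Python's standard `math` and `statistics` modules are used.

```python
import math
import statistics


def combined_sd(n1, mean1, s1, n2, mean2, s2):
    """Combined (population) SD from the summary statistics of two groups.

    Hypothetical helper implementing the formula above; s1 and s2 must be
    population standard deviations (divisor n1 and n2).
    """
    n = n1 + n2
    within = (n1 * s1**2 + n2 * s2**2) / n            # weighted within-group part
    between = n1 * n2 * (mean1 - mean2) ** 2 / n**2   # spread between the two means
    return math.sqrt(within + between)


# Illustrative data (not from the source).
x = [2, 4, 4, 4, 5, 5, 7, 9]
y = [10, 12, 14]

formula = combined_sd(len(x), statistics.fmean(x), statistics.pstdev(x),
                      len(y), statistics.fmean(y), statistics.pstdev(y))
direct = statistics.pstdev(x + y)
print(formula, direct)  # equal up to floating-point rounding
```

With these particular numbers both routes give $\sigma^2 = 1616/121$, so $\sigma \approx 3.65$.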

Solution:

We are given that the mean and standard deviation of a set of $n_1$ observations are $\bar{x}_1$ and $s_1$, respectively, while the mean and standard deviation of another set of $n_2$ observations are $\bar{x}_2$ and $s_2$, respectively.

We have to show that the standard deviation of the combined set of $(n_1+n_2)$ observations is given by

S.D. $=\sqrt{\frac{n_1 s_1^2+n_2 s_2^2}{n_1+n_2}+\frac{n_1 n_2\left(\bar{x}_1-\bar{x}_2\right)^2}{\left(n_1+n_2\right)^2}}$

For the first set, let the observations be $x_i$, where $i=1,2,3,\ldots,n_1$.

For the second set, let the observations be $y_j$, where $j=1,2,3,\ldots,n_2$.

The means of the two sets are

$\bar{x}_1=\frac{1}{n_1} \sum_{i=1}^{n_1} x_i, \quad \bar{x}_2=\frac{1}{n_2} \sum_{j=1}^{n_2} y_j$

The mean of the combined set is therefore

$\bar{x}=\frac{1}{n_1+n_2}\left[\sum_{i=1}^{n_1} x_i+\sum_{j=1}^{n_2} y_j\right]=\frac{n_1 \bar{x}_1+n_2 \bar{x}_2}{n_1+n_2} \qquad \text{(i)}$
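Equation (i) says the combined mean is the size-weighted average of the two group means. A small check, using Python's `fractions.Fraction` so the comparison is exact rather than approximate; the function name `combined_mean` and the data are illustrative assumptions:

```python
from fractions import Fraction


def combined_mean(n1, mean1, n2, mean2):
    """Weighted mean of two group means, as in equation (i)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


# Illustrative data (not from the source); Fraction keeps the arithmetic exact.
x = [Fraction(v) for v in (3, 7, 8, 10)]
y = [Fraction(v) for v in (1, 2, 6, 9, 13)]

mean_x = sum(x) / len(x)   # xbar_1
mean_y = sum(y) / len(y)   # xbar_2
pooled = sum(x + y) / (len(x) + len(y))

assert combined_mean(len(x), mean_x, len(y), mean_y) == pooled
print(pooled)  # 59/9
```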

The variances (squares of the standard deviations) of the two sets, each measured about its own mean, are

$s_1^2=\frac{1}{n_1} \sum_{i=1}^{n_1}\left(x_i-\bar{x}_1\right)^2, \quad s_2^2=\frac{1}{n_2} \sum_{j=1}^{n_2}\left(y_j-\bar{x}_2\right)^2$

The variance $\sigma^2$ of the combined set is the mean of the squared deviations of all $(n_1+n_2)$ observations from the combined mean $\bar{x}$:

$\sigma^2=\frac{1}{n_1+n_2}\left[\sum_{i=1}^{n_1}\left(x_i-\bar{x}\right)^2+\sum_{j=1}^{n_2}\left(y_j-\bar{x}\right)^2\right] \qquad \text{(ii)}$
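Equation (ii) measures every observation about the combined mean $\bar{x}$, so $\sigma^2$ is simply the population variance of the pooled list. The sketch below (illustrative data, exact `Fraction` arithmetic, standard-library `statistics.pvariance`) confirms this and also shows that adding the two group variances gives a different value in general:

```python
import statistics
from fractions import Fraction

# Illustrative data (not from the source).
x = [Fraction(v) for v in (3, 7, 8, 10)]
y = [Fraction(v) for v in (1, 2, 6, 9, 13)]

n1, n2 = len(x), len(y)
xbar = sum(x + y) / (n1 + n2)  # combined mean, equation (i)

# Equation (ii): both groups' deviations are measured about the combined mean.
sigma_sq = (sum((xi - xbar) ** 2 for xi in x)
            + sum((yj - xbar) ** 2 for yj in y)) / (n1 + n2)

assert sigma_sq == statistics.pvariance(x + y)
# Adding the two group variances is a different quantity.
print(sigma_sq, statistics.pvariance(x) + statistics.pvariance(y))
```

For this data, $\sigma^2 = 1136/81$, whereas $s_1^2 + s_2^2 = 1313/50$.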

Now, writing $x_i-\bar{x}=\left(x_i-\bar{x}_1\right)+\left(\bar{x}_1-\bar{x}\right)$ and expanding the square,

$\sum_{i=1}^{n_1}\left(x_i-\bar{x}\right)^2=\sum_{i=1}^{n_1}\left(x_i-\bar{x}_1+\bar{x}_1-\bar{x}\right)^2$

$\sum_{i=1}^{n_1}\left(x_i-\bar{x}\right)^2=\sum_{i=1}^{n_1}\left(x_i-\bar{x}_1\right)^2+n_1\left(\bar{x}_1-\bar{x}\right)^2+2\left(\bar{x}_1-\bar{x}\right) \sum_{i=1}^{n_1}\left(x_i-\bar{x}_1\right)$

But the algebraic sum of the deviations of the values of the first set from their mean is zero:

$\sum_{i=1}^{n_1}\left(x_i-\bar{x}_1\right)=0$
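The cancellation that removes the cross term can be checked directly: the plain deviations from the group mean sum to zero, while the squared deviations do not (their sum is $n_1 s_1^2$). A minimal sketch with made-up data and exact `Fraction` arithmetic:

```python
from fractions import Fraction

x = [Fraction(v) for v in (3, 7, 8, 10)]  # illustrative data
mean_x = sum(x) / len(x)                  # xbar_1

# The deviations cancel exactly, so the cross term 2(xbar_1 - xbar) * sum(x_i - xbar_1) drops out.
assert sum(xi - mean_x for xi in x) == 0
# The squared deviations do not cancel: their sum is n1 * s1^2.
print(sum((xi - mean_x) ** 2 for xi in x))  # 26
```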

Also, by definition, $\sum_{i=1}^{n_1}\left(x_i-\bar{x}_1\right)^2=n_1 s_1^2$. Hence

$\sum_{i=1}^{n_1}\left(x_i-\bar{x}\right)^2=n_1 s_1^2+n_1\left(\bar{x}_1-\bar{x}\right)^2 \qquad \text{(iii)}$
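Equation (iii) is a special case of a general identity: for any reference point $c$, $\sum_{i=1}^{n_1}(x_i-c)^2 = n_1 s_1^2 + n_1(\bar{x}_1-c)^2$, and the solution applies it with $c=\bar{x}$. The sketch below checks the identity exactly for a few arbitrary values of $c$; the helper `sum_sq_about` and the data are illustrative, and `statistics.pvariance` supplies $s_1^2$ with divisor $n_1$.

```python
import statistics
from fractions import Fraction


def sum_sq_about(data, c):
    """Sum of squared deviations of data about an arbitrary point c."""
    return sum((v - c) ** 2 for v in data)


x = [Fraction(v) for v in (3, 7, 8, 10)]  # illustrative data
n1 = len(x)
mean_x = sum(x) / n1
s1_sq = statistics.pvariance(x)  # population variance, divisor n1

# The identity holds for any reference point, in particular the combined mean.
for c in (Fraction(0), Fraction(59, 9), Fraction(-5, 3), mean_x):
    assert sum_sq_about(x, c) == n1 * s1_sq + n1 * (mean_x - c) ** 2
```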

Now let $d_1=\bar{x}_1-\bar{x}$.

Substituting the value of $\bar{x}$ from equation (i), we get

$\bar{x}_1-\bar{x}=\bar{x}_1-\frac{n_1 \bar{x}_1+n_2 \bar{x}_2}{n_1+n_2}$

$\bar{x}_1-\bar{x}=\frac{\bar{x}_1\left(n_1+n_2\right)-\left(n_1 \bar{x}_1+n_2 \bar{x}_2\right)}{n_1+n_2}$

$\bar{x}_1-\bar{x}=\frac{\left(n_1 \bar{x}_1+n_2 \bar{x}_1\right)-\left(n_1 \bar{x}_1+n_2 \bar{x}_2\right)}{n_1+n_2}$

$\bar{x}_1-\bar{x}=\frac{n_2 \bar{x}_1-n_2 \bar{x}_2}{n_1+n_2}=\frac{n_2\left(\bar{x}_1-\bar{x}_2\right)}{n_1+n_2}$
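Because the simplification of $\bar{x}_1-\bar{x}$ is pure algebra, it can be spot-checked over a grid of group sizes and means. The sketch uses exact fractions; the function `check_d1` and the chosen means are hypothetical test values, not data from the problem.

```python
import itertools
from fractions import Fraction


def check_d1(n1, n2, m1, m2):
    """Compare both sides of xbar_1 - xbar = n2 (xbar_1 - xbar_2) / (n1 + n2)."""
    xbar = (n1 * m1 + n2 * m2) / (n1 + n2)  # equation (i)
    return m1 - xbar == n2 * (m1 - m2) / (n1 + n2)


means = [Fraction(-7, 2), Fraction(0), Fraction(5, 3), Fraction(11)]
assert all(check_d1(n1, n2, m1, m2)
           for n1, n2 in itertools.product(range(1, 6), repeat=2)
           for m1, m2 in itertools.product(means, repeat=2))
```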

Substituting this value in equation (iii), we get

$\sum_{i=1}^{n_1}\left(x_i-\bar{x}\right)^2=n_1 s_1^2+n_1\left(\frac{n_2\left(\bar{x}_1-\bar{x}_2\right)}{n_1+n_2}\right)^2$

$\sum_{i=1}^{n_1}\left(x_i-\bar{x}\right)^2=n_1 s_1^2+\frac{n_1 n_2^2\left(\bar{x}_1-\bar{x}_2\right)^2}{\left(n_1+n_2\right)^2} \qquad \text{(iv)}$

Similarly, for the second set we have

$\sum_{j=1}^{n_2}\left(y_j-\bar{x}\right)^2=\sum_{j=1}^{n_2}\left(y_j-\bar{x}_2+\bar{x}_2-\bar{x}\right)^2$

$\sum_{j=1}^{n_2}\left(y_j-\bar{x}\right)^2=\sum_{j=1}^{n_2}\left(y_j-\bar{x}_2\right)^2+n_2\left(\bar{x}_2-\bar{x}\right)^2+2\left(\bar{x}_2-\bar{x}\right) \sum_{j=1}^{n_2}\left(y_j-\bar{x}_2\right)$

But the algebraic sum of the deviations of the values of the second set from their mean is zero:

$\sum_{j=1}^{n_2}\left(y_j-\bar{x}_2\right)=0$

Also, by definition, $\sum_{j=1}^{n_2}\left(y_j-\bar{x}_2\right)^2=n_2 s_2^2$. Hence

$\sum_{j=1}^{n_2}\left(y_j-\bar{x}\right)^2=n_2 s_2^2+n_2\left(\bar{x}_2-\bar{x}\right)^2 \qquad \text{(v)}$

Now let $d_2=\bar{x}_2-\bar{x}$.

Substituting the value of $\bar{x}$ from equation (i), we get

$\bar{x}_2-\bar{x}=\bar{x}_2-\frac{n_1 \bar{x}_1+n_2 \bar{x}_2}{n_1+n_2}$

$\bar{x}_2-\bar{x}=\frac{\bar{x}_2\left(n_1+n_2\right)-\left(n_1 \bar{x}_1+n_2 \bar{x}_2\right)}{n_1+n_2}$

$\bar{x}_2-\bar{x}=\frac{\left(n_1 \bar{x}_2+n_2 \bar{x}_2\right)-\left(n_1 \bar{x}_1+n_2 \bar{x}_2\right)}{n_1+n_2}$

$\bar{x}_2-\bar{x}=\frac{n_1 \bar{x}_2-n_1 \bar{x}_1}{n_1+n_2}$

$\bar{x}_2-\bar{x}=\frac{n_1\left(\bar{x}_2-\bar{x}_1\right)}{n_1+n_2}$
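The same kind of spot check works for $\bar{x}_2-\bar{x}$. It also confirms a consequence of equation (i): the weighted deviations $n_1(\bar{x}_1-\bar{x})$ and $n_2(\bar{x}_2-\bar{x})$ always cancel. The helper `deviations` and the test values are illustrative:

```python
import itertools
from fractions import Fraction


def deviations(n1, n2, m1, m2):
    """Return (d1, d2): each group mean minus the combined mean."""
    xbar = (n1 * m1 + n2 * m2) / (n1 + n2)  # equation (i)
    return m1 - xbar, m2 - xbar


means = [Fraction(-7, 2), Fraction(0), Fraction(5, 3), Fraction(11)]
for n1, n2 in itertools.product(range(1, 6), repeat=2):
    for m1, m2 in itertools.product(means, repeat=2):
        d1, d2 = deviations(n1, n2, m1, m2)
        assert d2 == n1 * (m2 - m1) / (n1 + n2)  # result derived above
        assert n1 * d1 + n2 * d2 == 0            # weighted deviations cancel
```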

Substituting this value in equation (v), we get

$\sum_{j=1}^{n_2}\left(y_j-\bar{x}\right)^2=n_2 s_2^2+n_2\left(\frac{n_1\left(\bar{x}_2-\bar{x}_1\right)}{n_1+n_2}\right)^2$

$\sum_{j=1}^{n_2}\left(y_j-\bar{x}\right)^2=n_2 s_2^2+\frac{n_2 n_1^2\left(\bar{x}_2-\bar{x}_1\right)^2}{\left(n_1+n_2\right)^2} \qquad \text{(vi)}$
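With concrete numbers, equations (iv) and (vi) can both be checked against the raw sums of squared deviations about the combined mean. A minimal sketch with made-up data, exact fractions and the standard `statistics` module (population variances, divisor $n$):

```python
import statistics
from fractions import Fraction

x = [Fraction(v) for v in (3, 7, 8, 10)]      # illustrative data
y = [Fraction(v) for v in (1, 2, 6, 9, 13)]
n1, n2 = len(x), len(y)
m1, m2 = statistics.mean(x), statistics.mean(y)
xbar = (n1 * m1 + n2 * m2) / (n1 + n2)        # equation (i)

# Equation (iv): first group, deviations about the combined mean.
lhs_iv = sum((xi - xbar) ** 2 for xi in x)
rhs_iv = n1 * statistics.pvariance(x) + n1 * n2**2 * (m1 - m2) ** 2 / (n1 + n2) ** 2

# Equation (vi): second group, deviations about the combined mean.
lhs_vi = sum((yj - xbar) ** 2 for yj in y)
rhs_vi = n2 * statistics.pvariance(y) + n2 * n1**2 * (m2 - m1) ** 2 / (n1 + n2) ** 2

assert lhs_iv == rhs_iv and lhs_vi == rhs_vi
```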

Substituting equations (iv) and (vi) in equation (ii), we get

$\sigma^2=\frac{1}{n_1+n_2}\left[n_1 s_1^2+\frac{n_1 n_2^2\left(\bar{x}_1-\bar{x}_2\right)^2}{\left(n_1+n_2\right)^2}+n_2 s_2^2+\frac{n_2 n_1^2\left(\bar{x}_2-\bar{x}_1\right)^2}{\left(n_1+n_2\right)^2}\right]$

Since $\left(\bar{x}_2-\bar{x}_1\right)^2=\left(\bar{x}_1-\bar{x}_2\right)^2$,

$\sigma^2=\frac{1}{n_1+n_2}\left[n_1 s_1^2+n_2 s_2^2+\frac{n_1 n_2^2\left(\bar{x}_1-\bar{x}_2\right)^2}{\left(n_1+n_2\right)^2}+\frac{n_2 n_1^2\left(\bar{x}_1-\bar{x}_2\right)^2}{\left(n_1+n_2\right)^2}\right]$

$\sigma^2=\frac{1}{n_1+n_2}\left[n_1 s_1^2+n_2 s_2^2+\frac{n_1 n_2\left(n_1+n_2\right)\left(\bar{x}_1-\bar{x}_2\right)^2}{\left(n_1+n_2\right)^2}\right]$
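The factoring step uses $n_1 n_2^2 + n_2 n_1^2 = n_1 n_2 (n_1+n_2)$, so the two mean-difference terms inside the bracket reduce to $n_1 n_2(\bar{x}_1-\bar{x}_2)^2/(n_1+n_2)$. An exact spot check over a grid of sizes and means (the helper `between_terms` and the values are illustrative):

```python
import itertools
from fractions import Fraction


def between_terms(n1, n2, m1, m2):
    """The two mean-difference terms from (iv) and (vi), unfactored and factored."""
    diff_sq = (m1 - m2) ** 2
    unfactored = (n1 * n2**2 * diff_sq + n2 * n1**2 * diff_sq) / (n1 + n2) ** 2
    factored = n1 * n2 * diff_sq / (n1 + n2)
    return unfactored, factored


means = [Fraction(-7, 2), Fraction(0), Fraction(5, 3), Fraction(11)]
for n1, n2 in itertools.product(range(1, 8), repeat=2):
    for m1, m2 in itertools.product(means, repeat=2):
        unfactored, factored = between_terms(n1, n2, m1, m2)
        assert unfactored == factored
```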

Dividing each term by $(n_1+n_2)$,

$\sigma^2=\frac{n_1 s_1^2+n_2 s_2^2}{n_1+n_2}+\frac{n_1 n_2\left(n_1+n_2\right)\left(\bar{x}_1-\bar{x}_2\right)^2}{\left(n_1+n_2\right)^3}$

$\sigma^2=\frac{n_1 s_1^2+n_2 s_2^2}{n_1+n_2}+\frac{n_1 n_2\left(\bar{x}_1-\bar{x}_2\right)^2}{\left(n_1+n_2\right)^2}$

Taking the positive square root, the combined standard deviation is

$\text{S.D.}\ (\sigma)=\sqrt{\frac{n_1 s_1^2+n_2 s_2^2}{n_1+n_2}+\frac{n_1 n_2\left(\bar{x}_1-\bar{x}_2\right)^2}{\left(n_1+n_2\right)^2}}$

Hence proved.
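One caveat worth checking numerically: the derivation defines $s_1^2$ and $s_2^2$ with divisors $n_1$ and $n_2$ (population variances). If sample variances with divisor $n-1$ are plugged into the same formula, the result does not in general equal the sample variance of the pooled data. A sketch with made-up data and exact fractions, using the standard `statistics` functions `pvariance` (divisor $n$) and `variance` (divisor $n-1$):

```python
import statistics
from fractions import Fraction

x = [Fraction(v) for v in (3, 7, 8, 10)]      # illustrative data
y = [Fraction(v) for v in (1, 2, 6, 9, 13)]
n1, n2 = len(x), len(y)
m1, m2 = statistics.mean(x), statistics.mean(y)
gap = n1 * n2 * (m1 - m2) ** 2 / (n1 + n2) ** 2   # second term under the root

# Population variances (divisor n), as in the derivation: exact agreement.
pop = (n1 * statistics.pvariance(x) + n2 * statistics.pvariance(y)) / (n1 + n2) + gap
assert pop == statistics.pvariance(x + y)

# Sample variances (divisor n - 1) plugged into the same formula do not match.
samp = (n1 * statistics.variance(x) + n2 * statistics.variance(y)) / (n1 + n2) + gap
print(samp == statistics.variance(x + y))  # False
```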