## Judging statistical significance from the overlap of confidence intervals

### March 21, 2011

My good friend Dr. Cameron (who is also a statistics instructor at UBC) recommended an article about the misconceptions of judging statistical significance from the overlap of interval estimates.  The article is "Interval estimates for statistical communication: Problems and possible solutions" by Cumming and Fidler (2005), and you can follow the link to read it.

The authors came up with a ‘rule of eye’ for judging significance for two independent means with 95% confidence intervals or standard errors.  First, we should distinguish between interval estimates based on the confidence interval and those based on the standard error.  Both intervals use the standard error, which is calculated by dividing the standard deviation by the square root of the sample size.  The interval estimate based on the standard error is simply the mean plus or minus one standard error.  The 95% confidence interval is the mean plus or minus 1.96 times the standard error, so this confidence interval will be almost twice as wide as the interval based on the standard error.

A definition of the 95% confidence interval is as follows, “were this procedure to be repeated on multiple samples, the calculated confidence interval (which would differ for each sample) would encompass the true population parameter 95% of the time.”

The standard error of the mean is the standard deviation of the sampling distribution of the mean.  The standard error is smaller than the standard deviation because sample means do not vary as much as individuals do.  For example, it is easy to find one person who is taller than 6 feet, but it is unlikely that the mean height of 25 people will exceed 6 feet.
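To make the two interval estimates concrete, here is a minimal sketch in Python.  The function name `interval_estimates` and the height data are made up for illustration, and the 1.96 multiplier follows the description above (for small samples a t-based multiplier would normally be used instead).

```python
import math
import statistics

def interval_estimates(sample, z=1.96):
    """Return the mean, standard error, SE interval, and 95% CI for a sample."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)               # standard error of the mean
    se_interval = (mean - se, mean + se)         # mean plus or minus one SE
    ci95 = (mean - z * se, mean + z * se)        # mean plus or minus 1.96 SE
    return mean, se, se_interval, ci95

# Hypothetical heights in inches (illustration only, not real data)
heights = [66, 68, 70, 71, 69, 67, 72, 70, 68, 69]

mean, se, se_interval, ci95 = interval_estimates(heights)
print(f"mean = {mean:.2f}, SE = {se:.2f}")
print(f"SE interval = ({se_interval[0]:.2f}, {se_interval[1]:.2f})")
print(f"95% CI      = ({ci95[0]:.2f}, {ci95[1]:.2f})")
```

Whatever the data, the 95% confidence interval comes out 1.96 times as wide as the standard error interval, which is why the two kinds of error bars need different rules of eye.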

#### Rule of eye for 95% confidence intervals and two independent means

p is less than or equal to .05 when the bottom of one 95% confidence interval overlaps the top of the other by no more than about 50 percent of one arm of the interval (that is, half of the margin of error).  This is shown in the following figure from the article.  When will p be less than or equal to .01?  That occurs when the 95% confidence intervals do not overlap (or are just touching).

#### Rule of eye for standard errors and two independent means

p is less than or equal to .05 when the gap between the standard error bars is at least about the size of one standard error.  This is shown in the following figure, which is also from the article.

Most of the research I do involves within-participant comparisons, or dependent means.  In this case, unfortunately, interval estimates around the individual means cannot be used to judge statistical significance.  Perhaps this is a discussion for a future post.
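As a rough check on the two rules of eye for independent means, the sketch below computes the two-sided p value at each boundary case.  It rests on assumptions that are mine rather than the article's: both groups have the same standard error (set to a hypothetical 1.0), a normal (z) test is used rather than a t test, and "50 percent overlap" is read as half of one arm of the 95% confidence interval.  The helper `two_sided_p` is an illustrative name.

```python
import math
from statistics import NormalDist

def two_sided_p(mean_diff, se1, se2):
    """Two-sided p value from a z test on the difference of two independent means."""
    z = abs(mean_diff) / math.sqrt(se1 ** 2 + se2 ** 2)
    return 2 * (1 - NormalDist().cdf(z))

se = 1.0             # hypothetical common standard error (assumption)
margin = 1.96 * se   # half-width (one arm) of each 95% confidence interval

# 95% CIs overlapping by half of one arm: the means are 2*margin - 0.5*margin apart
p_half_overlap = two_sided_p(1.5 * margin, se, se)

# 95% CIs just touching: the means are 2*margin apart
p_touching = two_sided_p(2 * margin, se, se)

# SE bars separated by a gap of one SE: the means are 2*se + 1*se apart
p_se_gap = two_sided_p(3 * se, se, se)

print(f"95% CIs, half-arm overlap: p = {p_half_overlap:.3f}")
print(f"95% CIs, just touching:    p = {p_touching:.3f}")
print(f"SE bars, gap of one SE:    p = {p_se_gap:.3f}")
```

Under these assumptions the half-arm overlap and the one-SE gap both give p a little under .05, and touching intervals give p under .01, which is consistent with the rules as stated.  With unequal standard errors or small samples the correspondence is only approximate.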

Update (April 21): I’ve written about dependent means in the following post, Confidence intervals in within-participant designs.

### 3 Responses to “Judging statistical significance from the overlap of confidence intervals”

1. GY Zou Says:

Do you know that the confidence interval for a difference between two parameters can be easily obtained from confidence limits for each parameter?

Zou and Donner (2008 Statistics in Medicine Vol 27: 1639 to 1702) have presented a simple formula for this problem. Specifically, suppose we are given confidence limits around est1 as low1 and upp1, and that around est2 as low2 and upp2. We have

Lower limit for the difference = est1 - est2 - sqrt[ (est1 - low1)^2 + (upp2 - est2)^2 ]

Upper limit for the difference = est1 - est2 + sqrt[ (upp1 - est1)^2 + (est2 - low2)^2 ]

These expressions are derived using the method of variance estimates recovery (MOVER). This is statistically valid because the confidence limits for each parameter contain variance estimates. All we need to do is to recover them for the purpose of constructing confidence limits for the difference.

Thus, there is no need to calculate the percent of overlap to judge significance, because a confidence interval for the difference can not only tell you if the difference is statistically significant but also show you the magnitude of the true difference.

The formula can be extended readily to situations where the comparison groups are dependent (see Zou 2008 American Journal of Epidemiology 168: 212-224).
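The two limits given in the comment above for independent groups can be sketched in a few lines of Python.  The function name `mover_difference` and the numbers in the example are hypothetical, chosen only to illustrate the arithmetic.

```python
import math

def mover_difference(est1, low1, upp1, est2, low2, upp2):
    """Confidence limits for est1 - est2, recovered from each estimate's own limits."""
    diff = est1 - est2
    lower = diff - math.sqrt((est1 - low1) ** 2 + (upp2 - est2) ** 2)
    upper = diff + math.sqrt((upp1 - est1) ** 2 + (est2 - low2) ** 2)
    return lower, upper

# Hypothetical example: est1 = 10 with limits (8, 12), est2 = 7 with limits (5, 9)
lower, upper = mover_difference(10, 8, 12, 7, 5, 9)
print(f"difference = 3, limits = ({lower:.2f}, {upper:.2f})")
```

In this made-up example the two intervals overlap (from 8 to 9), yet the interval for the difference lies entirely above zero, so the overlap alone would not have settled the question.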

2. […] month I wrote about judging statistical significance from the overlap of confidence intervals.  At the end of the article I noted that these rules of eye cannot be used in within-participant […]

3. […] the confidence intervals overestimate the error terms in the ANOVA and prevent us from using a rule of eye to determine statistical […]