
Question about Statistics and Means

posted on Jun, 2 2007 @ 09:43 PM
Hi all,

I've got a quick question about statistics, with respect to means and standard deviations. This is not conspiracy related, but there are a lot of knowledgeable folks here and I thought someone might have an answer to this question. Mods, please feel free to move this if there's a better location for it.

Let's say you are trying to calculate a percent increase in a particular variable over time between a control group and an experimental group.

For example on day one of some experiment a particular variable is measured in a control group, and a value of 118.26 ± 8.75 is obtained. One week later, the same variable is measured, and a value of 126.27 ± 3.8 is obtained.

Similarly in an experimental group, on day one of this experiment the same variable is measured and a value of 115.26 ± 4.77 is obtained. Again, one week later, the same variable is measured, and a value of 207.82 ± 3.2 is obtained.

Now suppose you want to calculate the percent change over that one week period. The percent change between the two mean values is 6.77% ((126.27 - 118.26)/118.26) and 80.31% ((207.82 - 115.26)/115.26) for the control and experimental groups, respectively. Of course this doesn't take the SD into account.
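For concreteness, here's a minimal Python sketch of the percent-change arithmetic above (the numbers are just the example values from this post):

# Percent change between the two mean values, ignoring the SDs for now
control_day1, control_week1 = 118.26, 126.27
exp_day1, exp_week1 = 115.26, 207.82

def pct_change(before, after):
    return 100.0 * (after - before) / before

print(pct_change(control_day1, control_week1))  # ~6.77 (control group)
print(pct_change(exp_day1, exp_week1))          # ~80.31 (experimental group)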

If one wants to account for the standard deviation, what's the proper way to do this? One could simply do four different calculations to account for the SDs. For example, considering the experimental group:

115.26 - 4.77 = 110.49 and 115.26 + 4.77 = 120.03

and

207.82 - 3.2 = 204.62 and 207.82 + 3.2 = 211.02

Using these values, the following percent changes can be calculated:

((204.62 - 110.49)/110.49) = .852 or 85.2%
((204.62 - 120.03)/120.03) = .705 or 70.5%
((211.02 - 110.49)/110.49) = .910 or 91.0%
((211.02 - 120.03)/120.03) = .758 or 75.8%

This yields values that are roughly 10 percentage points away from the 80.31% calculated using the means only (about 10.7 above and 9.8 below). However, that doesn't reduce to a single symmetric ± value like the ones in the original data. What is the correct way to handle these numbers? The high and low values could be averaged: (91.0 + 70.5)/2 = 80.75, and then 91.0 - 80.75 = 10.25 and 80.75 - 70.5 = 10.25.
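The same brute-force idea in Python, enumerating the mean ± SD extremes for the experimental group (again just a sketch of the calculation described above, not a rigorous error analysis):

# Take the mean +/- SD extremes on both days and look at every combination
day1_mean, day1_sd = 115.26, 4.77
week1_mean, week1_sd = 207.82, 3.2

day1_extremes = (day1_mean - day1_sd, day1_mean + day1_sd)       # (110.49, 120.03)
week1_extremes = (week1_mean - week1_sd, week1_mean + week1_sd)  # (204.62, 211.02)

changes = [100.0 * (after - before) / before
           for before in day1_extremes
           for after in week1_extremes]

low, high = min(changes), max(changes)
print(low, high)                           # ~70.5 and ~91.0
print((high + low) / 2, (high - low) / 2)  # ~80.7 and ~10.3 (the 80.75 +/- 10.25 above uses the rounded extremes)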

So then, is it proper to report the mean change for the experimental group over the one-week period as 80.75 ± 10.25?

It makes sense to me, but it's been a loooong time since I've taken statistics. Any help/advice is greatly appreciated.



posted on Jun, 3 2007 @ 12:06 PM
Lemme check my stats book... if I'm remembering correctly, you don't perform a mean or standard deviation on the rates of change. You use something else (I don't use stats regularly, so I can't fish it out of the top of my head!)



posted on Jun, 3 2007 @ 02:19 PM

Originally posted by Byrd
Lemme check my stats book... if I'm remembering correctly, you don't perform a mean or standard deviation on the rates of change. You use something else (I don't use stats regularly, so I can't fish it out of the top of my head!)


Thanks... I appreciate your effort. I've been looking through my elementary stats book for a couple of days and have not found the answer.

One problem is that I don't have access to the raw data, only the means and SDs.

Stats are not my thing either... I'm okay once they've been calculated, and the data is there, but I don't regularly use 'em either.

Thanks again for your effort.

MK



posted on Jun, 3 2007 @ 06:39 PM
I think what you want is inferential statistics (I'm sorry, this is going to be a weak post because I'm tired and brain-dead at the moment) and the ANOVA in particular:
en.wikipedia.org...

Not my favorite stat. They do have programs to make this easier... of course, it's easy to run numbers through and get nonsense results. I'll think about it more when I'm not so fried.



posted on Jun, 3 2007 @ 06:42 PM

Originally posted by kallikak
One problem is that I don't have access to the raw data, only the means and SDs.


Ergh. That may be a huge problem there. Try the one way ANOVA, because it is run on standard deviations. But without some access to the raw data and experimental setup for confirmation, I would not be 100% sure of the results.

Ah hates One Way ANOVAs, BTW. Just in case it wasn't clear.



posted on Jun, 4 2007 @ 02:58 PM
Byrd,

Thanks for the links and advice. It's all just one huge reminder of why I don't like stats... calculating 'em I mean. I'm okay with data interpretation after the fact, but I hate messing around with the statistics.

As I mentioned, this data isn't even mine, so I'm kind of doing some meta-analysis, and don't have the raw data. If I did, all this would be much easier.

In any case, thanks for your efforts.

MK



posted on Jun, 4 2007 @ 04:45 PM
Kallikak, Byrd's on the right lines.

If you are looking to test for significance, you need to run a mixed-design ANOVA: one between-subjects variable (control vs experimental), one within-subjects variable (time point 1 vs time point 2). It actually uses mean squares as the variance comparison, though. So I assume what you want to know is whether there is a significant difference between conditions 1 and 2 across the two time points, and whether there is an interaction?

For the data you have it looks like a mixed 2x2 ANOVA to me. Do you have SPSS?
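For what it's worth, if the raw per-subject numbers were available, the 2x2 mixed ANOVA is only a few lines in most packages. Here's a rough sketch in Python using the pingouin package (SPSS would do the same job; pingouin is just one option I'm assuming, and the per-subject values below are simulated from the reported means/SDs purely so the call has something to run on):

import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
n = 10  # made-up sample size per group; the real n isn't given in the thread

# Simulate per-subject values from the summary stats, in long format
rows = []
groups = {"control":      (118.26, 8.75, 126.27, 3.8),
          "experimental": (115.26, 4.77, 207.82, 3.2)}
for group, (m1, s1, m2, s2) in groups.items():
    for i in range(n):
        subj = f"{group}_{i}"
        rows.append({"subject": subj, "group": group, "time": "day1",
                     "value": rng.normal(m1, s1)})
        rows.append({"subject": subj, "group": group, "time": "week1",
                     "value": rng.normal(m2, s2)})
df = pd.DataFrame(rows)

# One between-subjects factor (group) and one within-subjects factor (time)
aov = pg.mixed_anova(data=df, dv="value", within="time",
                     between="group", subject="subject")
print(aov)  # main effects of time and group, plus the time x group interaction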

ABE: aah, I see you only have means & SDs. Oh well.

You could maybe do something with this. The mean plus or minus 1.64 SDs marks the one-tailed 95% cut-offs of a normal distribution. So what you could do is calculate that range for each group and just see whether the two distributions lie outside each other's boundaries. It's not really statistical hypothesis testing, but it's better than nowt.

So if you have two groups:

Group 1. Mean = 15.05 SD = 3.03
Group 2. Mean = 10.18 SD = 2.75

Then if you multiply the SDs by 1.64 and add/subtract the result from each mean, you can see whether the two distributions lie outside each other's 95% cut-offs. This is what I'd normally do for single-case neuropsych studies.

So

Group 1: 1.64 x SD = 4.969
Group 2: 1.64 x SD = 4.510

So the range between the lower and upper cut-offs is

Group 1: 10.08 to 20.02.

Group 2: 5.67 to 14.69.

So this example shows quite a bit of overlap of populations. If you happened to find there was no overlap, then this would be suggestive of a significant result.
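In code, that range-and-overlap check is something like this (just a sketch of the rough method above; as said, it's not real hypothesis testing):

def interval(mean, sd, z=1.64):
    # mean +/- z*SD; z = 1.64 gives the one-tailed 95% cut-offs
    return mean - z * sd, mean + z * sd

g1 = interval(15.05, 3.03)   # roughly (10.08, 20.02)
g2 = interval(10.18, 2.75)   # roughly (5.67, 14.69)

overlap = g1[0] <= g2[1] and g2[0] <= g1[1]
print(g1, g2, overlap)       # True here: plenty of overlap between the two ranges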

You might be able to use some type of t-test as well, but you'd have familywise error building up across the multiple comparisons (which is why ANOVA is the most effective method for this design).

[edit on 4-6-2007 by melatonin]



posted on Jun, 4 2007 @ 05:00 PM
link   

Originally posted by melatonin
Kallikak, Byrd's on the right lines.

If you are looking to test for significance, you need to run a mixed-design ANOVA: one between-subjects variable (control vs experimental), one within-subjects variable (time point 1 vs time point 2). It actually uses mean squares as the variance comparison, though. So I assume what you want to know is whether there is a significant difference between conditions 1 and 2 across the two time points, and whether there is an interaction?

For the data you have it looks like a mixed 2x2 ANOVA to me. Do you have SPSS?


YAY!!!! Thank you so much for the answer, Melatonin! It's been two years since I ran stats (which means I need to go do more papers and include math!). It's a relief to have that guess confirmed!


I'd better go hunt up my stats books. I've got some good ones around here from a few years ago.



posted on Jun, 4 2007 @ 07:19 PM
OK, what you need to do is a t-test:

You'll need to know the SD, mean (M), and sample size (n) for each group/set of data (4 groups, I guess).

A1 A2
B1 B2

Thus you can compare A1 vs A2 (within), B1 vs B2 (within), A1 vs B1 (between), A2 vs B2 (between), A1 vs B2 (between), and B1 vs A2 (between) - 6 in total.

For each comparison:

Pool the variance.

sp^2 = [(n1 - 1)*SD1^2 + (n2 - 1)*SD2^2] / (n1 + n2 - 2)

Then calculate the standard error of the difference between the means:

SE(x1-x2) = sqrt(sp^2/n1 + sp^2/n2)

Then finally t value:

t = (M2 - M1)/SE(x1-x2)

Then calculate the degrees of freedom [(n1 + n2) - 2], look up the t value in a t-statistic table, and check for significance.

That will work for comparisons between unrelated (independent) groups. For the within-subjects comparisons, you should really use a paired t-test if the sample is matched in any way (which is slightly different).

But this is not the optimal method of analysis for various complicated statistical reasons that ANOVA & post-hoc tests are able to ameliorate.
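As a sketch, the whole recipe from summary statistics looks like this in Python (the sample sizes are made up, since the thread never gives n; scipy's stats.ttest_ind_from_stats does the same computation and is a handy cross-check):

import math
from scipy import stats

def t_from_summary(m1, sd1, n1, m2, sd2, n2):
    # Pooled variance
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    # Standard error of the difference between the means
    se_diff = math.sqrt(sp2 / n1 + sp2 / n2)
    t = (m2 - m1) / se_diff
    df = n1 + n2 - 2
    p = 2 * stats.t.sf(abs(t), df)  # two-tailed p value
    return t, df, p

# e.g. control vs experimental on day one, pretending n = 10 in each group
print(t_from_summary(118.26, 8.75, 10, 115.26, 4.77, 10))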

ABE:

If you just want to present the data and take account of propagating errors, then if you have this set of data:

A = 10.34 SE = 4.5
B = 19.56 SE = 3.9

then you just take the difference between means = 9.22

Then calculate the combined standard error = sqrt(3.9^2 + 4.5^2) = 5.955

So you'd have 9.22 +/- 5.955
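In Python, that last propagation step is just (same example numbers):

import math

m_a, se_a = 10.34, 4.5
m_b, se_b = 19.56, 3.9

diff = m_b - m_a                        # 9.22
se_diff = math.sqrt(se_a**2 + se_b**2)  # ~5.955, the two errors combined in quadrature
print(f"{diff:.2f} +/- {se_diff:.3f}")  # 9.22 +/- 5.955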

[edit on 5-6-2007 by melatonin]



posted on Jun, 5 2007 @ 08:56 AM
Melatonin,

Thanks for the help. Reading through your posts confirms that Chemistry/Biochemistry was the correct path for me.


I appreciate your help, and I knew that there'd be someone knowledgeable with respect to statistics on this board. Thanks again.

The next time you need help isolating protein or with biochemistry in general, I'm your man. I owe you one.

MK



posted on Jun, 5 2007 @ 09:02 AM
No problem, glad to be able to help. You need to thank my D. C. Howell book for the t-test maths though, heh.



posted on Jun, 5 2007 @ 01:48 PM
(g) I know who I'm going to bother about some of the research! (just kidding)

Thanks so much for enlightening us. It's a wonderful tool, but if you only use it once every 5-6 years, it's just darn confusing.



posted on Jun, 5 2007 @ 02:19 PM
Yeah, unless you use it a lot it can become pretty confusing.

ANOVA is bread and butter in psychology, but don't ask me about structural equation modelling, heh.


