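Chi-square test of individuals and vegetation types in R

Time: 2014-02-04 19:49:20

Tags: r statistics

I have the percent cover of n different vegetation types for 7 different individuals. I want to test whether the proportion of each vegetation type differs among individuals, i.e. whether the vegetation composition differs for each animal.

My data are here: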

Data <- structure(list(IndID = structure(1:7, .Label = c("P06", "P07", 
"P08", "P09", "P10", "P12", "P13"), class = "factor"), Veg_V5 = c(0.045766316507, 
0.047303689688, 0.056893590139, 0.084802462906, 0.014449872446, 
0.09738444453, 0.064187261724), Veg_V9 = c(0.027242512682, 0.01079714987, 
0.012227657897, 0.026879196141, 0.021744009456, 0.029065982461, 
0.024820709696), Veg_V10 = c(0.002943062934, 0, 0, 0.001453619133, 
0.008588378756, 0.002336001225, 0.002511397265), Veg_V22 = c(0.003658113678, 
0.045570323716, 0.014352618087, 0.016906270086, 2.6184082e-05, 
0.006985026615, 0.020037857581), Veg_V30 = c(0.044989888016, 
0.157895085047, 0.098651407329, 0.049046292964, 0.016522155474, 
0.023712327193, 0.033410648111), Veg_V36 = c(0.301082396555, 
0.168989950447, 0.237744931683, 0.183522585412, 0.549495395342, 
0.162585685291, 0.113300807806), Veg_V38 = c(0.001445445925, 
0.02179277676, 0.008123926425, 0.003347045588, 0.000179547988, 
0.00297935894, 0.005683969181), Veg_V42 = c(0.063734651727, 0.127944902779, 
0.157800483468, 0.088456921086, 0.042171333667, 0.09721594608, 
0.20011730518), Veg_V46 = c(0.145959349519, 0.007588438052, 0.014483143897, 
0.171957277954, 0.06263606371, 0.186129514035, 0.120301794236
), Veg_V48 = c(0.110133159021, 0.020085874391, 0.064156046217, 
0.083554457713, 0.157755350904, 0.090943208364, 0.045370444427
), Veg_V50 = c(0.001423963713, 5.2927205e-05, 0.000297598847, 
0.000977251683, 0.001642115973, 0.003239765634, 0.000373243755
), Veg_V58 = c(0.016387858254, 0, 0.000125304778, 0.007978368047, 
0.020808863686, 0.014659365067, 0.009144471993), Veg_V62 = c(0.008344304605, 
0.018028329287, 0.016555893762, 0.039039847447, 0.001369053408, 
0.040562172098, 0.074840704897), Veg_V71 = c(0.002301665485, 
0.003784295175, 0.004443098578, 0.002252495537, 0.000834150027, 
0.001799869797, 0.000538537418), Veg_V79 = c(0.006720863217, 
0.003810758778, 0.003988868759, 0.008653984679, 0.000796744196, 
0.023176195765, 0.013575408569), Veg_V82 = c(0.035850741597, 
0.004082010705, 0.008061274036, 0.026926321733, 0.016492230809, 
0.030038678053, 0.021861419926), Veg_V86 = c(0.000564675266, 
0.004366494433, 0.003498091713, 0.000838080086, 0, 9.9567265e-05, 
0.00046922072), Veg_V114 = c(0.065990283903, 0.009679062659, 
0.014723311388, 0.065269827484, 0.015530900957, 0.05101673496, 
0.04312031779), Veg_V118 = c(0.003670389227, 0, 0, 0.000790739684, 
0.0007518572, 0.002083253552, 0.002596710123)), .Names = c("IndID", 
"Veg_V5", "Veg_V9", "Veg_V10", "Veg_V22", "Veg_V30", "Veg_V36", 
"Veg_V38", "Veg_V42", "Veg_V46", "Veg_V48", "Veg_V50", "Veg_V58", 
"Veg_V62", "Veg_V71", "Veg_V79", "Veg_V82", "Veg_V86", "Veg_V114", 
"Veg_V118"), class = "data.frame", row.names = c(NA, -7L))

Which looks like this:

  IndID     Veg_V5     Veg_V9     Veg_V10      Veg_V22    Veg_V30   Veg_V36
1   P06 0.04576632 0.02724251 0.002943063 3.658114e-03 0.04498989 0.3010824
2   P07 0.04730369 0.01079715 0.000000000 4.557032e-02 0.15789509 0.1689900
3   P08 0.05689359 0.01222766 0.000000000 1.435262e-02 0.09865141 0.2377449
4   P09 0.08480246 0.02687920 0.001453619 1.690627e-02 0.04904629 0.1835226
5   P10 0.01444987 0.02174401 0.008588379 2.618408e-05 0.01652216 0.5494954
6   P12 0.09738444 0.02906598 0.002336001 6.985027e-03 0.02371233 0.1625857
      Veg_V38    Veg_V42     Veg_V46    Veg_V48      Veg_V50      Veg_V58
1 0.001445446 0.06373465 0.145959350 0.11013316 0.0014239637 0.0163878583
2 0.021792777 0.12794490 0.007588438 0.02008587 0.0000529272 0.0000000000
3 0.008123926 0.15780048 0.014483144 0.06415605 0.0002975988 0.0001253048
4 0.003347046 0.08845692 0.171957278 0.08355446 0.0009772517 0.0079783680
5 0.000179548 0.04217133 0.062636064 0.15775535 0.0016421160 0.0208088637
6 0.002979359 0.09721595 0.186129514 0.09094321 0.0032397656 0.0146593651
      Veg_V62     Veg_V71      Veg_V79     Veg_V82      Veg_V86    Veg_V114
1 0.008344305 0.002301665 0.0067208632 0.035850742 5.646753e-04 0.065990284
2 0.018028329 0.003784295 0.0038107588 0.004082011 4.366494e-03 0.009679063
3 0.016555894 0.004443099 0.0039888688 0.008061274 3.498092e-03 0.014723311
4 0.039039847 0.002252496 0.0086539847 0.026926322 8.380801e-04 0.065269827
5 0.001369053 0.000834150 0.0007967442 0.016492231 0.000000e+00 0.015530901
6 0.040562172 0.001799870 0.0231761958 0.030038678 9.956727e-05 0.051016735
      Veg_V118
1 0.0036703892
2 0.0000000000
3 0.0000000000
4 0.0007907397
5 0.0007518572
6 0.0020832536

Because I want to test for differences in vegetation type as a function of individual, I do not think the simple code below is correct.

chisq.test(Data[,c(2:20)])

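Any thoughts or suggestions would be appreciated. Also, since this post has a statistical theme rather than a purely technical one, I am not sure whether I should post it here or on Stack Exchange.

EDIT: changed data format

Using the melt() function I can restructure the data as: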

library(reshape)
test <- melt(Data, id.vars = c(1), measure.vars = c(2:20))
test <- test[, c(2, 3)]

> head(test)
  variable      value
1   Veg_V5 0.04576632
2   Veg_V5 0.04730369
3   Veg_V5 0.05689359
4   Veg_V5 0.08480246
5   Veg_V5 0.01444987
6   Veg_V5 0.09738444

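Now that IndID is excluded, can I test for differences in test$value across the levels of test$variable? I am not interested in the variation within a factor level, so a chi-square test seems more appropriate than an ANOVA.

test$variable is a factor with 19 levels, and each level has 7 observations.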
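A quick way to check that structure, sketched on a small made-up data frame rather than the real Data (the `toy` object and its values are assumptions for illustration, and base R's stack() stands in for melt()):

```r
# Toy stand-in for Data: 3 individuals, 2 vegetation types (made-up values)
toy <- data.frame(IndID  = factor(c("P06", "P07", "P08")),
                  Veg_V5 = c(0.05, 0.04, 0.06),
                  Veg_V9 = c(0.03, 0.01, 0.02))

# Base R counterpart of the melt() step: stack the measurement columns
long <- stack(toy[, -1])
names(long) <- c("value", "variable")

nlevels(long$variable)   # number of vegetation types
table(long$variable)     # number of observations per vegetation type
```

On the real data the same two calls on test$variable should report 19 levels with 7 observations each.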

Now, with multiple observations for each factor level, how can I test for differences among the factor levels?

Thanks

1 Answer:

Answer 0: (score: 1)

In this case, chisq.test() will treat Data as a contingency table, so you will not be answering the question you are interested in.
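To see why, here is a minimal sketch on a small made-up table of proportions (the matrix `props` is hypothetical, not your data). Because chisq.test() reads the entries as counts, the statistic depends on the implied sample size, so the same composition gives a very different result once it is rescaled:

```r
# Hypothetical 2 x 3 table of proportions
# (rows = individuals, columns = vegetation types; each row sums to 1)
props <- matrix(c(0.2, 0.3, 0.5,
                  0.4, 0.4, 0.2),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("A", "B"), c("V1", "V2", "V3")))

# chisq.test() treats the entries as counts in a contingency table.
# Warnings about small expected values are suppressed for the demo.
res_props  <- suppressWarnings(chisq.test(props))

# Same composition, expressed as if 100 points had been sampled per individual
res_counts <- suppressWarnings(chisq.test(props * 100))

res_props$statistic    # small, because the implied total count is only 2
res_counts$statistic   # 100 times larger for identical proportions
```

The proportions alone do not carry the sample size, so a p-value computed from them is not meaningful.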

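As I understand your question, you are comparing one individual's measurement with another individual's measurement. The problem is that you have only one measurement per individual, so you cannot estimate the variance of the estimated proportions. I suggest you think about your hypothesis and find the test that best fits it given your limited data.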
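As a small illustration of the variance problem (the `repeated` values below are made up and only show what replicate measurements would allow):

```r
# The single coverage value for individual P06 and Veg_V5, taken from the question's Data
single <- 0.045766316507
var(single)      # NA: one observation gives no estimate of variance

# Hypothetical repeated measurements for the same individual (made-up values)
repeated <- c(0.046, 0.051, 0.043)
var(repeated)    # a variance can be estimated once there are replicates
```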