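I have a three-way contingency table, and I am trying to compute a percentage for each cell, first relative to its row total and then relative to its column total. This is the data I want to compute percentages for: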
"A2"
                 0_15m 15_30m 30_>40m
<35yrs  0_4cm      217     30       3
        20_80cm    282     42      14
        4_20cm     315    182      82
>=35yrs 0_4cm      334     63       3
        20_80cm    310     75      23
        4_20cm     433    110      95
dput(A2)
structure(c(217L, 282L, 315L, 334L, 310L, 433L, 30L, 42L, 182L,
63L, 75L, 110L, 3L, 14L, 82L, 3L, 23L, 95L), .Dim = c(6L, 3L), class = "ftable", row.vars = structure(list(
c("<35yrs", ">=35yrs"), c("0_4cm", "20_80cm", "4_20cm")), .Names = c("",
"")), col.vars = structure(list(c("0_15m", "15_30m", "30_>40m"
)), .Names = ""))
I tried the colPercent function as well as a manual calculation (see the example below), where A2 is the table shown above:
rpc <- A2 / rowSums(A2) * 100
cpc <- A2 / colSums(A2) * 100
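As you can see, the row percentages are computed correctly (every row sums to 100), but the column percentages exceed 100% in some cells, so they are clearly wrong.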
"Row percentages"
                     0_15m    15_30m   30_>40m
<35yrs  0_4cm    86.800000 12.000000  1.200000
        20_80cm  83.431953 12.426036  4.142012
        4_20cm   54.404145 31.433506 14.162349
>=35yrs 0_4cm    83.500000 15.750000  0.750000
        20_80cm  75.980392 18.382353  5.637255
        4_20cm   67.868339 17.241379 14.890282
"Column Percentages"
                      0_15m      15_30m     30_>40m
<35yrs  0_4cm    11.4754098   1.5864622   0.1586462
        20_80cm  56.1752988   8.3665339   2.7888446
        4_20cm  143.1818182  82.7272727  37.2727273
>=35yrs 0_4cm    17.6626124   3.3315706   0.1586462
        20_80cm  61.7529880  14.9402390   4.5816733
        4_20cm  196.8181818  50.0000000  43.1818182
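Answer 0 (score: 2)
The division is happening row-wise rather than column-wise as needed here. R recycles the length-3 vector colSums(A2) down each column of the 6 x 3 table, so the divisors cycle through the three column totals from one row to the next instead of staying fixed within a column. (rowSums(A2) works because its length equals the number of rows.) For colSums to work properly, you can transpose, divide, and then transpose again: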
t(t(as.matrix(A2))/colSums(A2)) * 100
#                 0_15m 15_30m 30_>40m
#
#<35yrs  0_4cm    11.48   5.98    1.36
#        20_80cm  14.91   8.37    6.36
#        4_20cm   16.66  36.25   37.27
#>=35yrs 0_4cm    17.66  12.55    1.36
#        20_80cm  16.39  14.94   10.45
#        4_20cm   22.90  21.91   43.18
Another option is to index the column totals by each cell's column number:
A2 / colSums(A2)[col(A2)] * 100
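To make the recycling behaviour concrete, here is a minimal sketch on a toy matrix. The matrix `m` and its values are hypothetical, chosen only for easy arithmetic; `sweep()` is an additional base R alternative that the answer itself does not mention.

```r
# Toy 2 x 3 count matrix (hypothetical values, not the poster's data)
m <- matrix(c(1, 3, 4, 6, 5, 15), nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
cs <- colSums(m)   # a = 4, b = 10, c = 20

# Naive division: cs is recycled in column-major order, so the six cells
# are divided by 4, 10, 20, 4, 10, 20 as R walks down each column.
naive <- m / cs * 100

# Fix 1: transpose so the columns become rows, divide, transpose back.
fix_t <- t(t(m) / cs) * 100

# Fix 2: expand cs to one divisor per cell using the column index.
fix_idx <- m / cs[col(m)] * 100

# Fix 3 (alternative): sweep() divides along the stated margin (2 = columns).
fix_sweep <- sweep(m, 2, cs, "/") * 100

colSums(naive)   # no longer 100 per column
colSums(fix_t)   # 100 for every column
```

All three fixes should give the same matrix. Which one to use is mostly a matter of taste, although `sweep()` states the intended margin explicitly.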
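Answer 1 (score: 1)
I'm not sure why colSums doesn't work. It may have to do with the structure of the table, but the apply approach seems to work (df here presumably holds the same counts as a matrix or data frame):
apply(df, 2, function(i) i * 100 / sum(i))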
#                     0_15m    15_30m   30_>40m
# <35yrs_0_4cm     11.47541  5.976096  1.363636
# <35yrs_20_80cm   14.91274  8.366534  6.363636
# <35yrs_4_20cm    16.65785 36.254980 37.272727
# >=35yrs_0_4cm    17.66261 12.549801  1.363636
# >=35yrs_20_80cm  16.39344 14.940239 10.454545
# >=35yrs_4_20cm   22.89794 21.912351 43.181818
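As a cross-check, the apply call can be compared with base R's `prop.table()`, which takes the margin as an argument. This is only a sketch: it rebuilds the counts from the question as a plain matrix (an assumption, since the answer's `df` object is not shown), and the name `counts` and the joined row labels are illustrative.

```r
# Counts from the question, entered column by column as in the dput() output.
counts <- matrix(
  c(217, 282, 315, 334, 310, 433,   # 0_15m
     30,  42, 182,  63,  75, 110,   # 15_30m
      3,  14,  82,   3,  23,  95),  # 30_>40m
  nrow = 6,
  dimnames = list(
    c("<35yrs_0_4cm", "<35yrs_20_80cm", "<35yrs_4_20cm",
      ">=35yrs_0_4cm", ">=35yrs_20_80cm", ">=35yrs_4_20cm"),
    c("0_15m", "15_30m", "30_>40m")))

# Column percentages two ways: the apply() approach and prop.table().
col_pct_apply <- apply(counts, 2, function(x) x * 100 / sum(x))
col_pct_prop  <- prop.table(counts, margin = 2) * 100

# Row percentages use margin = 1.
row_pct_prop  <- prop.table(counts, margin = 1) * 100

round(col_pct_prop, 2)
```

Working on a plain matrix sidesteps any question about how a particular function treats the ftable class.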
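Answer 2 (score: 0)
I found that CrossTable (from the gmodels package) automatically computes everything I needed, including row and column percentages and standardized cell residuals.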
library(gmodels)
CrossTable(data_tf$Height_fac, data_tf$Dia_fac, digits=2, expected=TRUE, prop.r=TRUE, prop.c=TRUE, prop.t=FALSE, prop.chisq=TRUE, sresid=TRUE, format=c("SPSS"), dnn = c("Height","Diameter"))
Cell Contents
|-------------------------|
| Count |
| Expected Values |
| Chi-square contribution |
| Row Percent |
| Column Percent |
| Std Residual |
|-------------------------|
Total Observations in Table: 2613
| Diameter
Height | 0_4cm | 20_80cm | 4_20cm | Row Total |
-------------|-----------|-----------|-----------|-----------|
0_15m | 551 | 592 | 748 | 1891 |
| 470.40 | 539.87 | 880.73 | |
| 13.81 | 5.03 | 20.00 | |
| 29.14% | 31.31% | 39.56% | 72.37% |
| 84.77% | 79.36% | 61.46% | |
| 3.72 | 2.24 | -4.47 | |
-------------|-----------|-----------|-----------|-----------|
15_30m | 93 | 117 | 292 | 502 |
| 124.88 | 143.32 | 233.81 | |
| 8.14 | 4.83 | 14.48 | |
| 18.53% | 23.31% | 58.17% | 19.21% |
| 14.31% | 15.68% | 23.99% | |
| -2.85 | -2.20 | 3.81 | |
-------------|-----------|-----------|-----------|-----------|
30_>40m | 6 | 37 | 177 | 220 |
| 54.73 | 62.81 | 102.46 | |
| 43.38 | 10.61 | 54.22 | |
| 2.73% | 16.82% | 80.45% | 8.42% |
| 0.92% | 4.96% | 14.54% | |
| -6.59 | -3.26 | 7.36 | |
-------------|-----------|-----------|-----------|-----------|
Column Total | 650 | 746 | 1217 | 2613 |
| 24.88% | 28.55% | 46.57% | |
-------------|-----------|-----------|-----------|-----------|
Statistics for All Table Factors
Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 174.51 d.f. = 4 p = 1.125591e-36
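For readers without gmodels, the main figures in this output can probably be reproduced in base R. The sketch below re-enters the counts from the printed table (the `data_tf` data frame itself is not shown, so `tab` is a hand-built stand-in) and assumes that the "Std Residual" lines are Pearson residuals, which is what the printed values appear to match.

```r
# Counts copied from the CrossTable output above (rows = Height, cols = Diameter).
tab <- matrix(
  c(551, 592, 748,
     93, 117, 292,
      6,  37, 177),
  nrow = 3, byrow = TRUE,
  dimnames = list(Height   = c("0_15m", "15_30m", "30_>40m"),
                  Diameter = c("0_4cm", "20_80cm", "4_20cm")))

row_pct <- prop.table(tab, 1) * 100   # compare with the "Row Percent" lines
col_pct <- prop.table(tab, 2) * 100   # compare with the "Column Percent" lines

test      <- chisq.test(tab)
expected  <- test$expected            # compare with "Expected Values"
pearson   <- test$residuals           # (observed - expected) / sqrt(expected)
chi_contr <- pearson^2                # compare with "Chi-square contribution"

round(row_pct, 2)
round(col_pct, 2)
round(pearson, 2)
```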