How do I calculate row and column percentages in a cross-tabulation?

Date: 2019-05-22 11:50:05

Tags: r matrix percentage crosstab

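I have a three-way contingency table and am trying to compute a percentage for every cell, first as a share of its row total and then as a share of its column total. This is the data I want percentages for: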

"A2"
                 0_15m 15_30m 30_>40m

<35yrs  0_4cm      217     30       3
        20_80cm    282     42      14
        4_20cm     315    182      82
>=35yrs 0_4cm      334     63       3
        20_80cm    310     75      23
        4_20cm     433    110      95

dput(A2)

structure(c(217L, 282L, 315L, 334L, 310L, 433L, 30L, 42L, 182L, 
63L, 75L, 110L, 3L, 14L, 82L, 3L, 23L, 95L), .Dim = c(6L, 3L), class = "ftable", row.vars = structure(list(
    c("<35yrs", ">=35yrs"), c("0_4cm", "20_80cm", "4_20cm")), .Names = c("", 
"")), col.vars = structure(list(c("0_15m", "15_30m", "30_>40m"
)), .Names = ""))

I tried the colPercent function as well as a manual calculation (see the example), where A2 is the table shown above:

rpc <- A2 / rowSums(A2) * 100
cpc <- A2 / colSums(A2) * 100


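As you can see, the row percentages are computed correctly (every row sums to 100), but some of the column percentages exceed 100%, so they are clearly wrong.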


"Row percentages"
                     0_15m    15_30m   30_>40m

<35yrs  0_4cm    86.800000 12.000000  1.200000
        20_80cm  83.431953 12.426036  4.142012
        4_20cm   54.404145 31.433506 14.162349
>=35yrs 0_4cm    83.500000 15.750000  0.750000
        20_80cm  75.980392 18.382353  5.637255
        4_20cm   67.868339 17.241379 14.890282

 "Column Percentages"
                       0_15m      15_30m     30_>40m

<35yrs  0_4cm     11.4754098   1.5864622   0.1586462
        20_80cm   56.1752988   8.3665339   2.7888446
        4_20cm   143.1818182  82.7272727  37.2727273
>=35yrs 0_4cm     17.6626124   3.3315706   0.1586462
        20_80cm   61.7529880  14.9402390   4.5816733
        4_20cm   196.8181818  50.0000000  43.1818182

3 answers:

Answer 0: (score: 2)

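The division here is applied along the rows rather than along the columns: R recycles the colSums vector down each column, so each column total ends up paired with a row. To make colSums work, you can transpose, divide, and then transpose back: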

t(t(as.matrix(A2))/colSums(A2)) * 100

#                 0_15m 15_30m 30_>40m
#                                     
#<35yrs  0_4cm    11.48   5.98    1.36
#        20_80cm  14.91   8.37    6.36
#        4_20cm   16.66  36.25   37.27
#>=35yrs 0_4cm    17.66  12.55    1.36
#        20_80cm  16.39  14.94   10.45
#        4_20cm   22.90  21.91   43.18

Another option is:

A2 / colSums(A2)[col(A2)] * 100
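
Base R also provides prop.table(), which takes the margin to normalise over, and sweep(), which makes the column-wise division explicit. The sketch below is an illustration rather than part of the original answer: it rebuilds the counts as a plain matrix called counts (a name assumed here, with flattened row labels) so that it runs without the ftable object.

```r
# Counts from the question, rebuilt as a plain matrix.
# The object name and the flattened row labels are assumptions for this sketch.
counts <- matrix(
  c(217, 282, 315, 334, 310, 433,
    30, 42, 182, 63, 75, 110,
    3, 14, 82, 3, 23, 95),
  nrow = 6,
  dimnames = list(
    c("<35yrs_0_4cm", "<35yrs_20_80cm", "<35yrs_4_20cm",
      ">=35yrs_0_4cm", ">=35yrs_20_80cm", ">=35yrs_4_20cm"),
    c("0_15m", "15_30m", "30_>40m")
  )
)

# Row percentages: each cell divided by its row total.
row_pct <- prop.table(counts, margin = 1) * 100

# Column percentages: each cell divided by its column total.
col_pct <- prop.table(counts, margin = 2) * 100

# sweep() spells out the same column-wise division.
col_pct_sweep <- sweep(counts, 2, colSums(counts), "/") * 100

round(col_pct, 2)
```

With margin = 2 every column of col_pct sums to 100, which is a quick sanity check on any of these approaches.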

Answer 1: (score: 1)

I'm not sure why colSums doesn't work. It may have something to do with the structure of the table, but the apply approach seems to work:

apply(df, 2, function(i)i*100 / sum(i))

#                    0_15m    15_30m   30_>40m
#  <35yrs_0_4cm    11.47541  5.976096  1.363636
#  <35yrs_20_80cm  14.91274  8.366534  6.363636
#  <35yrs_4_20cm   16.65785 36.254980 37.272727
#  >=35yrs_0_4cm   17.66261 12.549801  1.363636
#  >=35yrs_20_80cm 16.39344 14.940239 10.454545
#  >=35yrs_4_20cm  22.89794 21.912351 43.181818
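
A likely explanation for the failure of the plain division is R's recycling rule: a vector is recycled down the columns of a matrix, so each column total ends up paired with a row instead of a column. The toy sketch below (the matrix m and all variable names are made up for illustration) contrasts the two pairings and shows that apply over columns matches the correct one.

```r
# A 2 x 2 toy matrix; columns are (1, 3) and (2, 4).
m <- matrix(c(1, 3, 2, 4), nrow = 2)
col_totals <- colSums(m)            # 4 and 6

# Recycling runs down each column, so cell [i, j] is divided by col_totals[i].
wrong <- m / col_totals

# Indexing with col(m) pairs cell [i, j] with col_totals[j].
right <- m / col_totals[col(m)]

# apply() over columns divides each column by its own sum.
via_apply <- apply(m, 2, function(x) x / sum(x))

colSums(wrong)  # 0.75 and about 1.17, so not valid proportions
colSums(right)  # 1 and 1
```

Row percentages happen to work with a plain division because the rowSums vector has one entry per row, which is exactly how recycling lines it up.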

Answer 2: (score: 0)

I found that CrossTable computes everything I need automatically, including row and column percentages and standardized cell residuals.


library(gmodels)
CrossTable(data_tf$Height_fac, data_tf$Dia_fac,
           digits = 2, expected = TRUE,
           prop.r = TRUE, prop.c = TRUE, prop.t = FALSE,
           prop.chisq = TRUE, sresid = TRUE,
           format = "SPSS", dnn = c("Height", "Diameter"))


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  2613 

             | Diameter 
      Height |    0_4cm  |  20_80cm  |   4_20cm  | Row Total | 
-------------|-----------|-----------|-----------|-----------|
       0_15m |      551  |      592  |      748  |     1891  | 
             |   470.40  |   539.87  |   880.73  |           | 
             |    13.81  |     5.03  |    20.00  |           | 
             |    29.14% |    31.31% |    39.56% |    72.37% | 
             |    84.77% |    79.36% |    61.46% |           | 
             |     3.72  |     2.24  |    -4.47  |           | 
-------------|-----------|-----------|-----------|-----------|
      15_30m |       93  |      117  |      292  |      502  | 
             |   124.88  |   143.32  |   233.81  |           | 
             |     8.14  |     4.83  |    14.48  |           | 
             |    18.53% |    23.31% |    58.17% |    19.21% | 
             |    14.31% |    15.68% |    23.99% |           | 
             |    -2.85  |    -2.20  |     3.81  |           | 
-------------|-----------|-----------|-----------|-----------|
     30_>40m |        6  |       37  |      177  |      220  | 
             |    54.73  |    62.81  |   102.46  |           | 
             |    43.38  |    10.61  |    54.22  |           | 
             |     2.73% |    16.82% |    80.45% |     8.42% | 
             |     0.92% |     4.96% |    14.54% |           | 
             |    -6.59  |    -3.26  |     7.36  |           | 
-------------|-----------|-----------|-----------|-----------|
Column Total |      650  |      746  |     1217  |     2613  | 
             |    24.88% |    28.55% |    46.57% |           | 
-------------|-----------|-----------|-----------|-----------|


Statistics for All Table Factors

Pearson's Chi-squared test 
------------------------------------------------------------
Chi^2 =  174.51     d.f. =  4     p =  1.125591e-36
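
For readers without gmodels, the same quantities can be approximated in base R. This is a sketch, not the answerer's code: the matrix tab is typed in by hand from the counts printed above, and the residuals shown are Pearson residuals, (observed - expected) / sqrt(expected), which appear to match the "Std Residual" rows of the output.

```r
# Height x Diameter counts copied from the CrossTable output above.
# The object names tab and res are assumptions for this sketch.
tab <- matrix(
  c(551, 592, 748,
    93, 117, 292,
    6, 37, 177),
  nrow = 3, byrow = TRUE,
  dimnames = list(Height = c("0_15m", "15_30m", "30_>40m"),
                  Diameter = c("0_4cm", "20_80cm", "4_20cm"))
)

res <- chisq.test(tab)

res$statistic                        # Pearson chi-squared statistic
round(res$expected, 2)               # expected counts under independence
round(res$residuals, 2)              # Pearson residuals
round(prop.table(tab, 1) * 100, 2)   # row percentages
round(prop.table(tab, 2) * 100, 2)   # column percentages
```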