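Emulating a Stata command in R

Time: 2012-10-24 05:28:24

Tags: r stata

I'm trying to get a two-way table in R similar to the one Stata produces. I tried using CrossTable from the gmodels package, but the table is not the same. Do you know how to do this in R?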

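At the very least, I would like to get the frequencies for cursmoke1 == "Yes" & cursmoke2 == "No", and the reverse.

In R, I only get the totals for Yes, No, and NA.

Here is the output: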

Stata

. tabulate cursmoke1 cursmoke2, cell column miss row


+-------------------+
| Key               |
|-------------------|
|     frequency     |
|  row percentage   |
| column percentage |
|  cell percentage  |
+-------------------+

   Current |
   smoker, |      Current smoker, exam 2
    exam 1 |        No        Yes          . |     Total
-----------+---------------------------------+----------
        No |     1,898        131        224 |     2,253 
           |     84.24       5.81       9.94 |    100.00 
           |     86.16       7.59      44.44 |     50.81 
           |     42.81       2.95       5.05 |     50.81 
-----------+---------------------------------+----------
       Yes |       305      1,596        280 |     2,181 
           |     13.98      73.18      12.84 |    100.00 
           |     13.84      92.41      55.56 |     49.19 
           |      6.88      35.99       6.31 |     49.19 
-----------+---------------------------------+----------
     Total |     2,203      1,727        504 |     4,434 
           |     49.68      38.95      11.37 |    100.00 
           |    100.00     100.00     100.00 |    100.00 
           |     49.68      38.95      11.37 |    100.00 

R

> CrossTable(cursmoke2, cursmoke1, missing.include = T, format="SAS")


   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|


Total Observations in Table:  4434 


             | cursmoke1 
   cursmoke2 |        No |       Yes |        NA | Row Total | 
-------------|-----------|-----------|-----------|-----------|
          No |      2203 |         0 |         0 |      2203 | 
             |  1122.544 |   858.047 |   250.409 |           | 
             |     1.000 |     0.000 |     0.000 |     0.497 | 
             |     1.000 |     0.000 |     0.000 |           | 
             |     0.497 |     0.000 |     0.000 |           | 
-------------|-----------|-----------|-----------|-----------|
         Yes |         0 |      1727 |         0 |      1727 | 
             |   858.047 |  1652.650 |   196.303 |           | 
             |     0.000 |     1.000 |     0.000 |     0.389 | 
             |     0.000 |     1.000 |     0.000 |           | 
             |     0.000 |     0.389 |     0.000 |           | 
-------------|-----------|-----------|-----------|-----------|
          NA |         0 |         0 |       504 |       504 | 
             |   250.409 |   196.303 |  3483.288 |           | 
             |     0.000 |     0.000 |     1.000 |     0.114 | 
             |     0.000 |     0.000 |     1.000 |           | 
             |     0.000 |     0.000 |     0.114 |           | 
-------------|-----------|-----------|-----------|-----------|
Column Total |      2203 |      1727 |       504 |      4434 | 
             |     0.497 |     0.389 |     0.114 |           | 
-------------|-----------|-----------|-----------|-----------|
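If only the two discordant cells asked about above are needed, base R can count them directly. This is a minimal sketch: the real dataset is not part of the post, so a stand-in `temp` is rebuilt from the counts in the Stata table (an assumption), and the logical test drops rows where cursmoke2 is missing.

```r
# Stand-in for the asker's data, rebuilt from the counts in the Stata table
# (assumption: the real dataset is not reproduced in the post)
counts <- data.frame(
  cursmoke1 = c("No", "No", "No", "Yes", "Yes", "Yes"),
  cursmoke2 = c("No", "Yes", NA, "No", "Yes", NA),
  n         = c(1898, 131, 224, 305, 1596, 280),
  stringsAsFactors = FALSE
)
# Repeat each combination n times to get one row per person (4434 rows)
temp <- counts[rep(seq_len(nrow(counts)), counts$n), c("cursmoke1", "cursmoke2")]

# The comparison is NA when cursmoke2 is missing, so na.rm = TRUE drops those rows
yes_then_no <- sum(temp$cursmoke1 == "Yes" & temp$cursmoke2 == "No", na.rm = TRUE)
no_then_yes <- sum(temp$cursmoke1 == "No" & temp$cursmoke2 == "Yes", na.rm = TRUE)

yes_then_no  # 305
no_then_yes  # 131
```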

1 answer:

Answer 0 (score: 7)

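Maybe I'm missing something here, but the default settings of CrossTable seem to give you essentially what you're looking for.

Here is CrossTable with minimal arguments. (I've loaded the dataset as "temp".) Note that the results are the same as the Stata output you posted; if you want them as percentages, just multiply by 100.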

library(gmodels)
with(temp, CrossTable(cursmoke1, cursmoke2, missing.include=TRUE))

   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table:  4434 

             | cursmoke2 
   cursmoke1 |        No |       Yes |        NA | Row Total | 
-------------|-----------|-----------|-----------|-----------|
          No |      1898 |       131 |       224 |      2253 | 
             |   541.582 |   635.078 |     4.022 |           | 
             |     0.842 |     0.058 |     0.099 |     0.508 | 
             |     0.862 |     0.076 |     0.444 |           | 
             |     0.428 |     0.030 |     0.051 |           | 
-------------|-----------|-----------|-----------|-----------|
         Yes |       305 |      1596 |       280 |      2181 | 
             |   559.461 |   656.043 |     4.154 |           | 
             |     0.140 |     0.732 |     0.128 |     0.492 | 
             |     0.138 |     0.924 |     0.556 |           | 
             |     0.069 |     0.360 |     0.063 |           | 
-------------|-----------|-----------|-----------|-----------|
Column Total |      2203 |      1727 |       504 |      4434 | 
             |     0.497 |     0.389 |     0.114 |           | 
-------------|-----------|-----------|-----------|-----------|
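To see the "multiply by 100" point concretely, the sketch below lays out one Stata-style cell block (frequency, row %, column %, cell %) in base R. The counts are typed in from the posted output as a stand-in for `with(temp, table(cursmoke1, cursmoke2, useNA = "ifany"))`, and `stata_block()` is a hypothetical helper, not part of gmodels or base R.

```r
# Counts typed in from the posted output (stand-in for the real data)
tab <- as.table(matrix(
  c(1898, 305, 131, 1596, 224, 280), nrow = 2,
  dimnames = list(cursmoke1 = c("No", "Yes"), cursmoke2 = c("No", "Yes", NA))
))

# Hypothetical helper: stack the four statistics Stata's key lists for one row
stata_block <- function(tab, level) {
  out <- rbind(
    frequency = as.vector(tab[level, ]),
    row_pct   = as.vector(100 * prop.table(tab, 1)[level, ]),
    col_pct   = as.vector(100 * prop.table(tab, 2)[level, ]),
    cell_pct  = as.vector(100 * prop.table(tab)[level, ])
  )
  colnames(out) <- colnames(tab)  # No, Yes, NA
  out
}

round(stata_block(tab, "No"), 2)
round(stata_block(tab, "Yes"), 2)
```

Rounded to two decimals, the "No" block reproduces Stata's figures for that row, e.g. 84.24, 5.81 and 9.94 for the row percentages.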

Alternatively, if you want the numbers displayed as percentages, you can use format="SPSS":

with(temp, CrossTable(cursmoke1, cursmoke2, missing.include=TRUE, format="SPSS"))

   Cell Contents
|-------------------------|
|                   Count |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|-------------------------|

Total Observations in Table:  4434 

             | cursmoke2 
   cursmoke1 |       No  |      Yes  |       NA  | Row Total | 
-------------|-----------|-----------|-----------|-----------|
          No |     1898  |      131  |      224  |     2253  | 
             |  541.582  |  635.078  |    4.022  |           | 
             |   84.243% |    5.814% |    9.942% |   50.812% | 
             |   86.155% |    7.585% |   44.444% |           | 
             |   42.806% |    2.954% |    5.052% |           | 
-------------|-----------|-----------|-----------|-----------|
         Yes |      305  |     1596  |      280  |     2181  | 
             |  559.461  |  656.043  |    4.154  |           | 
             |   13.984% |   73.177% |   12.838% |   49.188% | 
             |   13.845% |   92.415% |   55.556% |           | 
             |    6.879% |   35.995% |    6.315% |           | 
-------------|-----------|-----------|-----------|-----------|
Column Total |     2203  |     1727  |      504  |     4434  | 
             |   49.684% |   38.949% |   11.367% |           | 
-------------|-----------|-----------|-----------|-----------|
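A base-R approximation of the SPSS-style labels is to format the proportions with sprintf() while keeping the table's row and column names. The counts are again typed in from the output above as a stand-in for the real data, and the three-decimal format simply mirrors what CrossTable prints; it is a presentation choice, not something CrossTable exposes.

```r
# Counts typed in from the posted output (stand-in for the real data)
tab <- as.table(matrix(
  c(1898, 305, 131, 1596, 224, 280), nrow = 2,
  dimnames = list(cursmoke1 = c("No", "Yes"), cursmoke2 = c("No", "Yes", NA))
))

row_pct <- 100 * prop.table(tab, 1)

# as.vector() flattens column-major; matrix() refills in the same order
row_pct_labels <- matrix(
  sprintf("%.3f%%", as.vector(row_pct)),
  nrow = nrow(row_pct),
  dimnames = dimnames(row_pct)
)
print(row_pct_labels, quote = FALSE)
```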

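Update: prop.table()

FYI (and to save you some tedious work when building your own data.frame), you might also be interested in the prop.table() function.

Again, using the data you linked to and calling it "temp", the following gives you the underlying numbers for building a data.frame. You might also want to look at the functions margin.table() and addmargins().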

## Your basic table
CurSmoke <- with(temp, table(cursmoke1, cursmoke2, useNA = "ifany"))
CurSmoke
#          cursmoke2
# cursmoke1   No  Yes <NA>
#       No  1898  131  224
#       Yes  305 1596  280

## Row proportions
prop.table(CurSmoke, 1) # * 100 # If you so desire
#          cursmoke2
# cursmoke1         No        Yes       <NA>
#       No  0.84243231 0.05814470 0.09942299
#       Yes 0.13984411 0.73177442 0.12838148

## Column proportions
prop.table(CurSmoke, 2) # * 100 # If you so desire
#          cursmoke2
# cursmoke1         No        Yes       <NA>
#       No  0.86155243 0.07585408 0.44444444
#       Yes 0.13844757 0.92414592 0.55555556

## Cell proportions
prop.table(CurSmoke)    # * 100 # If you so desire
#          cursmoke2
# cursmoke1         No        Yes       <NA>
#       No  0.42805593 0.02954443 0.05051872
#       Yes 0.06878665 0.35994587 0.06314840
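The update mentions margin.table(), addmargins() and building a data.frame by hand. A rough sketch of all three follows, using counts typed in from the output above in place of the real `temp` (an assumption); the column names `row_pct`, `col_pct` and `cell_pct` are illustrative, not anything CrossTable or Stata produces.

```r
# Counts typed in from the posted output (stand-in for the real data)
tab <- as.table(matrix(
  c(1898, 305, 131, 1596, 224, 280), nrow = 2,
  dimnames = list(cursmoke1 = c("No", "Yes"), cursmoke2 = c("No", "Yes", NA))
))

# Marginal counts, i.e. Stata's "Total" column and row
margin.table(tab, 1)  # No 2253, Yes 2181
margin.table(tab, 2)  # No 2203, Yes 1727, <NA> 504

# Same table with a "Sum" row and column appended
with_totals <- addmargins(tab)
with_totals

# Long format: one row per cell (cursmoke1, cursmoke2, Freq)
cells <- as.data.frame(tab)

# c() flattens column-major, the same order as the Freq column
cells$row_pct  <- 100 * c(prop.table(tab, 1))
cells$col_pct  <- 100 * c(prop.table(tab, 2))
cells$cell_pct <- 100 * c(prop.table(tab))
cells
```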