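Nested tables with the R "tables" package: subgroup totals, frequencies, and percentages within columns

Three questions: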

1) Please explain why I get the following error: `Error in Percent("row") : Summary fn not allowed with Percent`

library(tables)
set.seed(123)
df <- data.frame(exposure = sample(LETTERS[1:5], 100, TRUE),
             Group = sample(c("GroupX","GroupY"), 100, TRUE),
             disease = as.integer(sample(c(0,1), 100, TRUE)))

num <- function(x) base::sum(x, na.rm=TRUE)
tabular(Factor(exposure)+1~
          Factor(Group)*
          (Heading()*num*Heading(One)*disease*
             ((Total=1)+Percent("row"))), 
        data=df)

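2) How can I create the ideal table below, with an additional column of within-group percentages after each Group*disease frequency? Note that people without the disease are not included in the table.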

          Group                        
          GroupX         GroupY        
                 num            num    
 exposure Total  disease Total  disease
 A         9      4      13      6     
 B        12      4       9      5     
 C         9      8       9      6     
 D         7      1       8      3     
 E         9      4      15     12     
 All      46     21      54     32

Here is a start:

tabular(Factor(exposure) + 1 ~ 
          Factor(Group) * 
            ((Total = 1) + num *  disease), data = df)

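3) The package provides `Percent()`. Why would one use a logical vector with `Percent()`? Can you give me an example? Would using a logical vector help me solve this problem?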

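This is similar to another question; however, the answer provided there calculates incorrect percentages, as an example with more than two columns shows.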

Answer 1

As of version 0.7.72, the tables package can calculate subgroup percentages. Credit for this answer belongs to the package maintainer, Duncan Murdoch.

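The updated source package is available via SVN from R-Forge. Install it by following the usual instructions for installing a source package. For usage, see the answer to question 2 below. A binary package may be available by the time you read this.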

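1) The tables package will only calculate one thing per column. Percent is effectively a "summary function", and by defining a new summary function num, I had asked it to calculate num in all columns and Percent in some of them. If I use my num function, I need to move it inside the parentheses so that it does not "multiply" (in the tables syntax sense) with Percent. The following code produces counts of people with the disease (i.e., disease == 1), and it produces row percentages (group total / row total * 100) rather than the desired cell / (subgroup row total). With versions of tables < 0.7.72, that is as far as we can get.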

library(tables) ## prior to 0.7.72

df <- data.frame(exposure = sample(LETTERS[1:5], 100, TRUE),
                 Group = sample(c("GroupX","GroupY"), 100, TRUE),
                 disease = as.integer(sample(c(0,1), 100, TRUE)))

num <- function(x) base::sum(x, na.rm=TRUE)
tabular(Factor(exposure)+1~
          Factor(Group)*
          (Heading("Group Total")*(1)+num*disease+Percent("row")),
        data=df)
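To make the difference between the two denominators concrete, both can be computed by hand in base R. This is only a cross-checking sketch, not what tables does internally; the object names `counts`, `row_pct`, `diseased`, and `subgroup_pct` are made up for illustration, and the seed is arbitrary.

```r
set.seed(123)
df <- data.frame(exposure = sample(LETTERS[1:5], 100, TRUE),
                 Group = sample(c("GroupX", "GroupY"), 100, TRUE),
                 disease = as.integer(sample(c(0, 1), 100, TRUE)))

# Number of people in each exposure x Group cell
counts <- table(df$exposure, df$Group)

# Row percent: cell count / row total * 100
row_pct <- 100 * prop.table(counts, margin = 1)

# Desired quantity: diseased count / subgroup (cell) total * 100
diseased <- tapply(df$disease, list(df$exposure, df$Group), sum)
subgroup_pct <- 100 * diseased / counts

round(row_pct, 2)
round(subgroup_pct, 2)
```

Each row of `row_pct` sums to 100 across the two groups, whereas each entry of `subgroup_pct` is a percentage of its own exposure-by-Group cell, which is the figure the question asks for.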

2) Version 0.7.72 of the tables package will calculate the desired subgroup percentages. It introduces a pseudo-function called Equal().

set.seed(100)
library(tables)
df <- data.frame(exposure = sample(LETTERS[1:5], 100, TRUE),
                 Group = sample(c("GroupX","GroupY"), 100, TRUE),
                 disease = as.integer(sample(c(0,1), 100, TRUE)))

myTable <- tabular(Factor(exposure)+1~
                     Factor(Group)*
                     (Heading("Group Total")*(1)+Factor(disease)*((n=1)+Heading("%")*Percent(Equal(exposure,Group)))),
                   data=df)

myTable

myTable produces the following output:

          Group                                                                
          GroupX                             GroupY                            
                      disease                            disease               
                      0             1                    0             1       
 exposure Group Total n       %     n  %     Group Total n       %     n  %    
 A         5           1      20.00  4 80.00  6           3      50.00  3 50.00
 B        17          12      70.59  5 29.41 10           3      30.00  7 70.00
 C        13           4      30.77  9 69.23 10           6      60.00  4 40.00
 D         8           2      25.00  6 75.00 13           7      53.85  6 46.15
 E         7           3      42.86  4 57.14 11           8      72.73  3 27.27
 All      50          22      44.00 28 56.00 50          27      54.00 23 46.00

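An explanation from Duncan:

"The general way to read the code above is 'show the value in the current cell as a percentage of the values in all cells with equal x and y'.

x and y are now treated as expressions; it effectively looks through the formula for the places where subsetting occurs, and ignores subsetting on other variables."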
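One way to check that reading of `Equal(exposure, Group)` is to reproduce the percentages without the tables package. The sketch below uses only base R (`xtabs`, `prop.table`, `ftable`); the names `tab` and `pct` are illustrative, and it omits the "All" row that `tabular()` adds.

```r
set.seed(100)
df <- data.frame(exposure = sample(LETTERS[1:5], 100, TRUE),
                 Group = sample(c("GroupX", "GroupY"), 100, TRUE),
                 disease = as.integer(sample(c(0, 1), 100, TRUE)))

# Three-way counts: exposure x Group x disease
tab <- xtabs(~ exposure + Group + disease, data = df)

# Percentage of each disease level among the cells that share the same
# exposure and the same Group (margins 1 and 2 are held fixed)
pct <- 100 * prop.table(tab, margin = c(1, 2))

# Flatten to a layout with exposure in rows and Group/disease in columns
ftable(round(pct, 2), row.vars = "exposure")
```

Within every exposure-by-Group combination the two disease percentages sum to 100, which is the "cells with equal exposure and Group" denominator described above.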

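The last step is to subset the table (as you would a matrix) to keep only the desired columns (and/or rows), as shown in the last example of the tabular() help file: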

myTable[,c(1,4,5,6,9,10)]

This gives the final result:

          Group                                              
          GroupX                    GroupY                   
                      disease                   disease      
                      1                         1            
 exposure Group Total n       %     Group Total n       %    
 A         5           4      80.00  6           3      50.00
 B        17           5      29.41 10           7      70.00
 C        13           9      69.23 10           4      40.00
 D         8           6      75.00 13           6      46.15
 E         7           4      57.14 11           3      27.27
 All      50          28      56.00 50          23      46.00
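The indices `c(1,4,5,6,9,10)` follow a regular pattern: each group contributes five columns (Group Total, then n and % for disease 0, then n and % for disease 1), and columns 1, 4, and 5 of each block are kept. If there were more groups, the index vector could be built rather than typed. The following is a sketch with hypothetical helper names (`n_groups`, `cols_per_group`, `keep_within`, `keep`); it assumes the column layout shown in the full table above.

```r
n_groups <- 2              # GroupX and GroupY
cols_per_group <- 5        # Group Total, n and % for disease 0, n and % for disease 1
keep_within <- c(1, 4, 5)  # Group Total, then n and % for disease == 1

# Shift the within-group positions by the starting offset of each group
offsets <- (seq_len(n_groups) - 1) * cols_per_group
keep <- as.vector(outer(keep_within, offsets, "+"))
keep
# 1 4 5 6 9 10

# With the table from above: myTable[, keep]
```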

Answer 2

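I may be barking up the wrong tree here, but in your second question above, are you trying to get the percentages for GroupX and GroupY within each category of exposure? If so, ddply or a similar approach in base R should work.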

set.seed(123)
df <- data.frame(exposure = sample(LETTERS[1:5], 100, TRUE),
             Group = sample(c("GroupX","GroupY"), 100, TRUE),
             disease = as.integer(sample(c(0,1), 100, TRUE)))

library(plyr)
foo <- ddply(df,
             .(exposure, Group),
             summarise,
             total = sum(disease))
foo
ddply(foo,
      .(exposure),
      summarise,
      group = Group,
      total = total,
      pct.group = total/sum(total))

This gives the following output:

> foo
   exposure  Group total
1         A GroupX     4
2         A GroupY     4
3         B GroupX     8
4         B GroupY     6
5         C GroupX     6
6         C GroupY     4
7         D GroupX     5
8         D GroupY     4
9         E GroupX     4
10        E GroupY     3
> ddply(foo,
+       .(exposure),
+       summarise,
+       group = Group,
+       total = total,
+       pct.group = total/sum(total))
   exposure  group total pct.group
1         A GroupX     4 0.5000000
2         A GroupY     4 0.5000000
3         B GroupX     8 0.5714286
4         B GroupY     6 0.4285714
5         C GroupX     6 0.6000000
6         C GroupY     4 0.4000000
7         D GroupX     5 0.5555556
8         D GroupY     4 0.4444444
9         E GroupX     4 0.5714286
10        E GroupY     3 0.4285714
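For the "similar approach in base R" mentioned above, one possible sketch uses `aggregate()` and `ave()` instead of plyr. The column names `total` and `pct.group` mirror the ddply code; the seed is arbitrary, and the exact numbers depend on the R version's `sample()` behaviour.

```r
set.seed(123)
df <- data.frame(exposure = sample(LETTERS[1:5], 100, TRUE),
                 Group = sample(c("GroupX", "GroupY"), 100, TRUE),
                 disease = as.integer(sample(c(0, 1), 100, TRUE)))

# Diseased count per exposure x Group (same as the first ddply call)
foo <- aggregate(disease ~ exposure + Group, data = df, FUN = sum)
names(foo)[names(foo) == "disease"] <- "total"

# Share of each group's diseased count within its exposure category
foo$pct.group <- foo$total / ave(foo$total, foo$exposure, FUN = sum)

foo[order(foo$exposure, foo$Group), ]
```

Note that `pct.group` here is each group's share of the diseased count within an exposure category, so the two groups sum to 1 per exposure. That is a different quantity from the within-subgroup disease percentage produced with `Equal()` in Answer 1.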
