Generate a cross-tabulation from a data frame of categorical survey variables

Time: 2014-12-31 04:17:50

Tags: r dataframe crosstab

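I have some survey results and I'm trying to do some basic crosstabs. Each column is a chemical, and the numbers 0:5 code the responses about its use.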

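I'm trying to come up with a nice table that gives both frequencies and percentages. With table or xtabs I can get separate results for each column, but I'd like a way to build one nice table, containing all the chemicals, that I can export to LaTeX.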

Thanks for any help you can provide.

Data frame:

df <- read.table(text = "   
V1 V2 V3 V4 V5 V6 V7
1  NA NA NA NA NA NA NA
2   0  0  0  0  0  0  0
3   0  0  0  0  0  0 NA
4  NA NA NA NA NA NA  5
5   0  0  0  0  0  2  0
6  NA  4 NA NA NA NA NA
7   0  0  0  0  0  0  0
8  NA NA NA NA NA  3 NA
9  NA  2 NA NA NA  3 NA
10 NA  4 NA NA NA NA NA
11  0  0  0  0  0  0  0
12  0  0  0  0  0  0  0
13  0  0  0  0  0  0  0
14 NA NA NA NA NA  2  3
15 NA  3 NA  3 NA NA NA
16 NA  4 NA NA NA NA NA
17  0  0  0  0  0  0  0
18 NA  5 NA  5 NA NA NA
19  0  0  0  0  0  0  0
20 NA  1 NA NA NA NA NA", header = TRUE)

Desired output (exact numbers for V1 and V2):

                     V1               V2          etc....
                   Freq Percent    Freq Percent
No                    9     100       9    56.2
Poor                  0       0       1     6.2
Somewhat effective    0       0       1     6.2
Good                  0       0       1     6.2
Very Good             0       0       3   18.75
NA                    0       0       1     6.2

2 answers:

Answer 0 (score: 3)

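Here we use lapply() to loop over the columns and table() to get the frequencies for each one. Each column is first converted to a factor with levels 0:5 (and the labels from the question), so that every table has the same six rows even when a category is empty. prop.table() turns the counts into proportions, which are multiplied by 100 and rounded; cbind() puts Freq and Percent side by side; do.call(cbind, ...) combines the resulting list of matrices into a single matrix; and finally the colnames are renamed so that each one carries its chemical's name.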
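Applied to a single column, the body of that function works as follows. This is a small sketch that hard-codes V2 from the question's data as a plain vector; the names v2, levs and out are only for illustration.

```r
# V2 from the question's data, rows 1 to 20
v2 <- c(NA, 0, 0, NA, 0, 4, 0, NA, 2, 4, 0, 0, 0, NA, 3, 4, 0, 5, 0, 1)
levs <- c("No", "Poor", "Somewhat Effective", "Good", "Very Good", "NA")

# Explicit levels 0:5 keep empty categories in the table;
# genuine NAs (no answer) are dropped by table() by default
x1 <- table(factor(v2, levels = 0:5, labels = levs))

# prop.table() divides each count by the column total (16 answers here)
out <- cbind(Freq = x1, Percent = round(100 * prop.table(x1), 2))
print(out)
```

This reproduces the V2 numbers from the desired output: 9, 1, 1, 1, 3 and 1 responses, or 56.25, 6.25, 6.25, 6.25, 18.75 and 6.25 percent of the 16 non-missing answers.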

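  res <- do.call(cbind, lapply(df, function(x) {
    ## Fixed levels: every column gets the same six rows, even empty ones
    x1 <- table(factor(x, levels = 0:5,
                       labels = c('No', 'Poor', 'Somewhat Effective',
                                  'Good', 'Very Good', 'NA')))
    ## Counts next to percentages of the non-missing answers in this column
    cbind(Freq = x1, Percent = round(100 * prop.table(x1), 2))
  }))
  colnames(res) <- paste(rep(names(df), each = 2), colnames(res), sep = ".")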

  head(res,2)
  #     V1.Freq V1.Percent V2.Freq V2.Percent V3.Freq V3.Percent V4.Freq
  #No         9        100       9      56.25       9        100       9
  #Poor       0          0       1       6.25       0          0       0
  #     V4.Percent V5.Freq V5.Percent V6.Freq V6.Percent V7.Freq V7.Percent
  #No        81.82       9        100       8      66.67       8         80
  #Poor       0.00       0          0       0       0.00       0          0

Answer 1 (score: 2)

I'm not a regular "dplyr" or "tidyr" user, so I'm not sure whether this is the best way to use those tools (but it seems to work):

library(dplyr)
library(tidyr)
df %>%
  gather(var, val, V1:V7) %>%             ## Make the data long
  na.omit() %>%                           ## We don't need the NAs
  ## Factor the "value" column
  mutate(val = factor(val, 0:5, c("No", "Poor", "Somewhat Effective", 
                                  "Good", "Very Good", "NA"))) %>%
  group_by(val, var) %>%                  ## Group by val and var
  summarise(Freq = n()) %>%               ## Get the count
  group_by(var) %>%                       ## Group just by var now
  mutate(Pct = Freq/sum(Freq) * 100) %>%  ## Calculate the percent
  ungroup() %>%                           ## Drop the grouping before reshaping
  gather(R1, R2, Freq:Pct) %>%            ## Go long again....
  unite(Var, var, R1) %>%                 ## Combine the var and R1 cols
  spread(Var, R2, fill = 0)               ## Go wide....
# Source: local data frame [6 x 15]
# 
#                  val V1_Freq V1_Pct V2_Freq V2_Pct V3_Freq V3_Pct V4_Freq
# 1                 No       9    100       9  56.25       9    100       9
# 2               Poor       0      0       1   6.25       0      0       0
# 3 Somewhat Effective       0      0       1   6.25       0      0       0
# 4               Good       0      0       1   6.25       0      0       1
# 5          Very Good       0      0       3  18.75       0      0       0
# 6                 NA       0      0       1   6.25       0      0       1
# Variables not shown: V4_Pct (dbl), V5_Freq (dbl), V5_Pct (dbl), V6_Freq
#   (dbl), V6_Pct (dbl), V7_Freq (dbl), V7_Pct (dbl)
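The same "make it long, then count by group" idea can be sketched in base R alone, with stack() to make the data long and xtabs() to count. This illustration uses only V1 and V2 from the question's data (df2, long, freq and pct are names chosen for this example). NA responses are left out because xtabs() omits missing values by default, which plays the role of the na.omit() step above.

```r
# Columns V1 and V2 from the question's data (rows 1 to 20)
df2 <- data.frame(
  V1 = c(NA, 0, 0, NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, NA, NA, NA, 0, NA, 0, NA),
  V2 = c(NA, 0, 0, NA, 0, 4, 0, NA, 2, 4, 0, 0, 0, NA, 3, 4, 0, 5, 0, 1)
)
levs <- c("No", "Poor", "Somewhat Effective", "Good", "Very Good", "NA")

# Make the data long: one row per (answer, chemical) pair
long <- stack(df2)                     # columns: values, ind
long$values <- factor(long$values, levels = 0:5, labels = levs)

# Count answers per chemical; unused levels are kept as zero rows
freq <- xtabs(~ values + ind, data = long)
# Percent within each chemical (column margin)
pct <- round(100 * prop.table(freq, margin = 2), 2)
print(freq)
print(pct)
```

freq and pct hold the two halves of the desired table; they still have to be interleaved column by column to get the Freq/Percent layout.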

The "data.table" approach is similar in terms of the sequence of steps you have to go through.

library(data.table)   ## melt() and dcast() methods for data.tables ship with data.table
levs <- c("No", "Poor", "Somewhat Effective", "Good", "Very Good", "NA")
DT <- melt(as.data.table(df, keep.rownames = TRUE), 
           id.vars = "rn", na.rm = TRUE)
DT <- DT[, value := factor(value, 0:5, levs)
         ][, list(Freq = .N), by = list(variable, value) 
           ][, Pct := Freq/sum(Freq) * 100, by = list(variable)]
## Name the new long columns explicitly so they don't clash with "variable"/"value"
dcast(melt(DT, id.vars = c("variable", "value"),
           variable.name = "stat", value.name = "num"),
      value ~ variable + stat, 
      value.var = "num", fill = 0)

OK, one more... (a variant of @akrun's answer)

library(gdata)      ## For "interleave"
levs <- c("No", "Poor", "Somewhat Effective", "Good", "Very Good", "NA")
x1 <- sapply(lapply(df, factor, 0:5, levs), table)    ## 6 x 7 matrix of counts
t(interleave(t(x1), t(prop.table(x1, 2) * 100)))     ## Freq and Percent columns alternate

### Or, skipping the transposing....
## library(SOfun)   ## For "Riffle" which is like "interleave"
## Riffle(x1, prop.table(x1, 2) * 100)
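If neither gdata nor SOfun is installed, the interleaving can also be done in base R by binding the two matrices side by side and reordering the columns. The following sketch again uses only V1 and V2 from the question's data (df2, pct, k and res are illustrative names).

```r
levs <- c("No", "Poor", "Somewhat Effective", "Good", "Very Good", "NA")
df2 <- data.frame(
  V1 = c(NA, 0, 0, NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, NA, NA, NA, 0, NA, 0, NA),
  V2 = c(NA, 0, 0, NA, 0, 4, 0, NA, 2, 4, 0, 0, 0, NA, 3, 4, 0, 5, 0, 1)
)

x1 <- sapply(lapply(df2, factor, 0:5, levs), table)  # 6 x 2 matrix of counts
pct <- 100 * prop.table(x1, 2)                       # column percentages
k <- ncol(x1)

# Side by side, then reorder columns as 1, k+1, 2, k+2, ...
res <- cbind(x1, pct)[, order(c(seq_len(k), seq_len(k)))]
colnames(res) <- paste(rep(colnames(x1), each = 2),
                       c("Freq", "Percent"), sep = ".")
print(res)
```

order() is stable, so ordering c(1:k, 1:k) gives the column sequence 1, k+1, 2, k+2, ..., which places each chemical's counts immediately before its percentages.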