Function for creating a contingency table of two categorical variables in R

Posted: 2018-02-08 10:40:52

Tags: r function dataframe crosstab

I am using the following code to create a cross table of two categorical variables:

library(dplyr)
library(reshape2)

T1.1<-table(data$Q7_1,data$Q9,exclude = NULL)
T1.1<-data.frame(T1.1)
T1.2<-dcast(T1.1, Var1~Var2)
T1.2<-T1.2%>%mutate(Industry=as.character(Var1),Total_responses=A+B+C)%>%select(Industry,A,B,C,Total_responses)
C<-c("Industry"="ALL", colSums(T1.2[,2:5]))
T1.2<-rbind(C,T1.2)

This gives the output:

                     Industry  A  B  C Total_responses
1                         ALL 20 18 18              56
2  Banking/Financial Services  2  2  2               6
3                   Chemicals  0  1  2               3
4              Consumer Goods  1  1  1               3
5                      Energy  2  1  0               3
6                   High Tech  6  0  2               8
7       Insurance/Reinsurance  0  2  0               2
8               Life Sciences  0  0  0               0
9                   Logistics  0  0  2               2
10            Mining & Metals  1  1  1               3
11        Other Manufacturing  1  2  0               3
12    Other Non-Manufacturing  3  2  2               7
13         Retail & Wholesale  1  1  0               2
14   Services (Non-Financial)  2  4  5              11
15   Transportation Equipment  1  1  1               3
16                       <NA>  0  0  0               0

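This output is fine, but the issue is that after calling table() I convert the result to a data frame and then use dcast() to get the table layout I want, and after dcast() there is an extra NA row that I don't want.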
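A minimal sketch on hypothetical toy vectors `x` and `y` suggests where the extra `<NA>` row comes from: `exclude = NULL` makes `table()` count missing values as a level of their own, and `dcast()` only reshapes what the table already contains. In this data Q7_1 and Q9 are missing in the same rows, so those counts land in the NA column that `select()` discards, which is why the `<NA>` row shows only zeros.

```r
# Hypothetical toy data: two factors with one incomplete pair
x <- factor(c("a", "b", "a", NA))
y <- factor(c("A", "B", "B", NA))

# exclude = NULL: NA becomes an extra level in both dimensions
with_na <- table(x, y, exclude = NULL)
print(dimnames(with_na))

# Default (useNA = "no"): pairs containing NA are not counted
without_na <- table(x, y)
print(dimnames(without_na))
```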

In addition, I want to wrap this whole calculation in a function that I can reuse for other factors with more levels.

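Q9 has 3 levels, A, B and C, and I don't want to compute Total_responses by adding the levels up by name like this. I want a function that works with any other factor, whatever its number of levels. Please suggest any other efficient approach.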

> dput(data)
structure(list(Q7_1 = structure(c(5L, 5L, 14L, 1L, 9L, 13L, 1L, 
3L, 13L, 13L, 13L, 12L, 2L, 11L, 13L, 10L, 11L, 1L, 4L, 5L, 5L, 
4L, 5L, 9L, 2L, 4L, 13L, 10L, 13L, 13L, 11L, 1L, 11L, 5L, NA, 
1L, 9L, 3L, 1L, 5L, NA, 2L, NA, 6L, 14L, NA, NA, 14L, 8L, 11L, 
8L, 12L, 13L, NA, 3L, 11L, 11L, NA, 10L, 6L, 5L, 13L, 13L), .Label = c("Banking/Financial Services", 
"Chemicals", "Consumer Goods", "Energy", "High Tech", "Insurance/Reinsurance", 
"Life Sciences", "Logistics", "Mining & Metals", "Other Manufacturing", 
"Other Non-Manufacturing", "Retail & Wholesale", "Services (Non-Financial)", 
"Transportation Equipment"), class = "factor"), Q9 = structure(c(1L, 
3L, 3L, 3L, 3L, 1L, 1L, 3L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 1L, 
3L, 1L, 1L, 1L, 2L, 1L, 2L, 3L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 
1L, NA, 1L, 1L, 2L, 2L, 1L, NA, 2L, NA, 2L, 2L, NA, NA, 1L, 3L, 
1L, 3L, 1L, 3L, NA, 1L, 3L, 1L, NA, 2L, 2L, 3L, 3L, 2L), .Label = c("A", 
"B", "C"), class = "factor")), class = c("data.table", "data.frame"
), row.names = c(NA, -63L),  .Names = c("Q7_1", 
"Q9"))


1 Answer:

Answer 0 (score: 1)

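To convert the table to a data frame while keeping its two-way layout, we can use as.data.frame.matrix(), so the dcast() step is no longer needed. Because table() is called without exclude = NULL, the <NA> row does not appear either.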
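A small sketch on hypothetical toy factors shows the difference: `as.data.frame()` on a table returns the long format (one row per cell plus a `Freq` column), which is why the question needed a `dcast()` step, whereas `as.data.frame.matrix()` keeps the rows and columns of the table as they are. The function that follows applies the same conversion to the survey data.

```r
# Hypothetical toy data
x <- factor(c("a", "b", "a", "b", "a"))
y <- factor(c("A", "B", "B", "A", "A"))
tab <- table(x, y)

# Long format: one row per cell, columns x, y and Freq
long <- as.data.frame(tab)
print(long)

# Wide format: rows are levels of x, columns are levels of y
wide <- as.data.frame.matrix(tab)
print(wide)
```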

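crossCalc <- function(data){
  # Count each Q7_1 x Q9 combination; table() drops NA values by default
  t <- table(data$Q7_1, data$Q9)
  # Keep the wide layout: one column per level of Q9
  t <- as.data.frame.matrix(t)
  # Row totals (this line assumes the levels of Q9 are exactly A, B and C)
  Total_responses <- with(t, A + B + C)
  t <- cbind(t, Total_responses)
  # Prepend the column totals as a row named "ALL"
  t <- rbind(ALL = colSums(t), t)
  return(t)
}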

crossCalc(data)
#                             A  B  C Total_responses
# ALL                        20 18 18              56
# Banking/Financial Services  2  2  2               6
# Chemicals                   0  1  2               3
# Consumer Goods              1  1  1               3
# Energy                      2  1  0               3
# High Tech                   6  0  2               8
# Insurance/Reinsurance       0  2  0               2
# Life Sciences               0  0  0               0
# Logistics                   0  0  2               2
# Mining & Metals             1  1  1               3
# Other Manufacturing         1  2  0               3
# Other Non-Manufacturing     3  2  2               7
# Retail & Wholesale          1  1  0               2
# Services (Non-Financial)    2  4  5              11
# Transportation Equipment    1  1  1               3

Is this what you're looking for?
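For the request to handle any factor with any number of levels, here is a sketch of a more general variant. `cross_tab()` and its arguments `row_var`, `col_var` and `all_label` are hypothetical names, not part of the answer above: the two columns are passed by name instead of being fixed as Q7_1 and Q9, and the totals use `rowSums()`/`colSums()`, so nothing depends on the levels being A, B and C.

```r
# Hypothetical generalisation of crossCalc(): column names are arguments and
# totals are computed with rowSums()/colSums(), so any number of levels works.
cross_tab <- function(data, row_var, col_var, all_label = "ALL") {
  # Counts per combination; table() drops NA values by default
  tab <- table(data[[row_var]], data[[col_var]])
  # Wide data frame: one row per row_var level, one column per col_var level
  out <- as.data.frame.matrix(tab)
  # Row totals, whatever the number of col_var levels
  out$Total_responses <- rowSums(out)
  # Column totals as a one-row data frame labelled all_label
  totals <- as.data.frame(t(colSums(out)))
  rownames(totals) <- all_label
  # Put the totals row first, as in the question's output
  rbind(totals, out)
}

# Usage on hypothetical toy data with four answer levels and one NA
toy <- data.frame(
  industry = factor(c("x", "y", "x", NA, "y")),
  answer   = factor(c("A", "B", "C", "D", "D"))
)
res <- cross_tab(toy, "industry", "answer")
print(res)
```

With the question's data, `cross_tab(data, "Q7_1", "Q9")` should give the same counts as `crossCalc(data)`.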