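I have a large dataset for which I need to generate multiple cross-tabulations. Specifically, these are two-dimensional tables that show frequencies along with the mean and SD.
For example, I have the following data: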
City <- c("A","B","A","A","B","C","D","A","D","C")
Q1 <- c("Agree","Agree","Agree","Agree","Agree","Neither","Neither","Disagree","Agree","Agree")
df <- data.frame(City,Q1)
Given this data, I would like to generate a crosstab that includes the mean:
          City
          A    B    C    D
Agree     3    2    1    1
Neither             1    1
Disagree  1
Total     4    2    2    2
Mean      2.5  3    2.5  2.5
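When computing the mean, Agree is given a weight of 3, Neither a weight of 2, and Disagree a weight of 1. In the crosstab output, the mean should appear just below the Total row. It would also be good to have grid lines between every column and row.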
Can you suggest how to achieve this in R?
Answer 0 (score: 0)
Here is a solution:
x <- table(df$Q1, df$City) # build the basic crosstab
# weights, listed in the table's (alphabetical) row order
weights <- c("Agree" = 3, "Disagree" = 1, "Neither" = 2)
# weighted mean for each city (column)
weightedmean <- apply(x, 2, function(col) sum(col * weights) / sum(col))
# append the Total and Mean rows
x <- rbind(x,
           apply(x, 2, sum), # column totals
           weightedmean)
rownames(x)[4:5] <- c("Total", "Mean")
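One caveat with the code above: the multiplication by weights is positional, so it gives the right answer only because table() sorts the rows alphabetically (Agree, Disagree, Neither) and the weights happen to be listed in that same order. A minimal sketch of a variant that looks the weights up by row name instead, assuming the same df as in the question; the names tab, wts, w, total, wmean, and out are introduced here purely for illustration:

```r
City <- c("A","B","A","A","B","C","D","A","D","C")
Q1 <- c("Agree","Agree","Agree","Agree","Agree","Neither","Neither","Disagree","Agree","Agree")
df <- data.frame(City, Q1)

tab <- table(df$Q1, df$City)                    # basic crosstab
wts <- c(Agree = 3, Neither = 2, Disagree = 1)  # weights in any order
w <- wts[rownames(tab)]                         # reorder to match the table's rows

total <- colSums(tab)                           # responses per city
wmean <- colSums(tab * w) / total               # w recycles down each column

# naming the rbind() arguments sets the row names directly
out <- rbind(tab, Total = total, Mean = wmean)
```

Because the extra rows are named in the rbind() call, no separate rownames() assignment is needed, and adding a fourth response level would not silently shift the weights.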
Answer 1 (score: 0)
Here is a possible solution using addmargins, which lets you pass predefined functions to be applied to a table result:
wm <- function(x) sum(x * c(3, 1, 2)) / sum(x)
addmargins(table(df[2:1]), 1, list(list(Total = sum, Mean = wm)))
#           City
# Q1           A   B   C   D
#   Agree    3.0 2.0 1.0 1.0
#   Disagree 1.0 0.0 0.0 0.0
#   Neither  0.0 0.0 1.0 1.0
#   Total    4.0 2.0 2.0 2.0
#   Mean     2.5 3.0 2.5 2.5
If you also want the SD, just add , SD = sd to the function list.
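Note that sd passed this way is applied to each column's cell counts (for city A, the values 3, 1, 0), not to the coded responses. If what is wanted is the SD of the weighted scores, one possible sketch follows. It assumes the same weights and row order as wm above and the usual sample SD (n - 1 denominator); wsd and res are hypothetical names, not part of the original answer:

```r
City <- c("A","B","A","A","B","C","D","A","D","C")
Q1 <- c("Agree","Agree","Agree","Agree","Agree","Neither","Neither","Disagree","Agree","Agree")
df <- data.frame(City, Q1)

w <- c(3, 1, 2)  # Agree, Disagree, Neither (the row order table() produces)

wm <- function(x) sum(x * w) / sum(x)
# expand the counts back into individual scores, then take their SD
wsd <- function(x) sd(rep(w, times = x))

res <- addmargins(table(df[2:1]), 1,
                  list(list(Total = sum, Mean = wm, SD = wsd)))
```

With this helper a city with a single response would get NA for its SD, since the sample SD is undefined for one observation.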