This is my first post here, and I am new to both programming and R, so please forgive any silly mistakes.
I have the following data frame:
a <- data.frame("sickness1" = c(1,1,2,3,3,5,6, 4, 4, 4),
"sickness2" = c(NA, NA, 3, 3, 4, 6, 1, 2, 5, 6),
"sickness3" = c(NA, NA, 3, 4, 4, 6, 1, 2, 5, 6),
"sickness4" = c(NA, NA, 6, 3, 4, 6, 1, 2, 5, 6))
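Each row represents one case. Each column is an ordered categorical variable. I converted the variables to factors like this (using a tip I found on Stack Overflow!):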
a[] <- lapply(a, factor,
levels = c(1:6),
labels = c(3, 25, 50, 75, 97, 100))
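To make the conversion concrete, here is a minimal sketch on a made-up vector (`codes`, `f` and `f_ord` are illustrative names, not part of the original data). It shows how `levels` and `labels` pair up, and that `factor()` on its own returns an unordered factor; passing `ordered = TRUE` is an assumption about what one might want if the ordering is needed later.

```r
# Hypothetical toy vector of codes, including a missing value
codes <- c(1, 6, NA, 2)

# levels = the codes that may occur, labels = what each code is renamed to
f <- factor(codes, levels = 1:6, labels = c(3, 25, 50, 75, 97, 100))

print(as.character(f))   # the labels replace the original codes
print(is.ordered(f))     # factor() alone does not create an ordered factor

# Assumption: if the ordering matters, ordered = TRUE can be passed as well
f_ord <- factor(codes, levels = 1:6, labels = c(3, 25, 50, 75, 97, 100),
                ordered = TRUE)
print(f_ord[1] < f_ord[2])   # comparisons only work on ordered factors
```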
I would like to get the following output:
  percent sickness1 sickness2 sickness3 sickness4
1       3         2         1         1         1
2      25         1         1         1         1
3      50         2         2         1         1
4      75         3         1         2         1
5      97         1         1         1         1
6     100         1         2         2         3
I have already found a very long-winded solution:
library(plyr)      # ldply(), count()
library(reshape2)  # dcast()

# count the occurrences of each level in every column
ab <- ldply(lapply(a, count))
# reshape to one row per level and one column per variable
ab2 <- dcast(
  data = ab,
  formula = x ~ .id,
  value.var = "freq")
# rename the first column
colnames(ab2)[1] <- "percent"
# drop row 7, which holds the NA counts that I don't want
ab2 <- ab2[-7, ]
ab2
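Is there a faster, simpler way? Perhaps using ddply somehow? The output of summary(a) is too messy, and I don't know how to manipulate it into the shape I want. The real data I am working with is also much larger, and I have to do this many times...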
Answer 0 (score: 1)
OK, so I found that there are two possible solutions:
No. 1, by akrun:
un1 <- as.character(sort(unique(unlist(a, use.names=FALSE))))
data.frame(percent=un1,do.call(cbind,
lapply(a, function(x) table(factor(x, levels=un1)))))
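The step that makes this work is re-declaring each column as a factor with the full set of levels: `table()` then reports a zero for levels that never occur and drops `NA`, so every column yields a vector of the same length and `cbind` can line them up. A minimal sketch on a toy vector (`lv` and `x` are made-up names for illustration):

```r
# Toy data: the level "50" never occurs and one value is missing
lv <- c("3", "25", "50")
x  <- c("3", "3", "25", NA)

# Without fixed levels, table() only reports the values it actually sees
counts_seen <- table(x)

# With fixed levels, absent levels are counted as 0 and NA is dropped
counts_full <- table(factor(x, levels = lv))

print(counts_seen)
print(counts_full)
```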
No. 2, by alexis_laz:
Given that I can easily make the data look like this (it is just the data frame above with an added column for the institution):
a <- data.frame("institution" = c(1:10), "sickness1" = c(1,1,2,3,3,5,6, 4, 4, 4),
"sickness2" = c(NA, NA, 3, 3, 4, 6, 1, 2, 5, 6),
"sickness3" = c(NA, NA, 3, 4, 4, 6, 1, 2, 5, 6),
"sickness4" = c(NA, NA, 6, 3, 4, 6, 1, 2, 5, 6))
a[-1] <- lapply(a[-1], factor,
levels = c(1:6),
labels = c("0 to 3%","4-25%", "25-50%", "51-75%","76-97%","97-100%"))
I can then convert this wide format to long format like so:
library(reshape2)  # melt()
b2 <- melt(a, id.vars = "institution")
After that, the plain table function works:
table(b2[[3]], b2[[2]])
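Note that the order of the arguments matters: the first one supplies the rows and the second one the columns.
Thanks a lot, guys!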
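To see the effect of the argument order, here is a small sketch with made-up vectors (`value` and `variable` are hypothetical, not the columns of `b2`). Swapping the two arguments of `table()` transposes the result.

```r
# Hypothetical toy vectors, not taken from the question's data
value    <- c("low", "high", "low", "low")
variable <- c("s1", "s1", "s2", "s2")

by_value    <- table(value, variable)   # rows = value, columns = variable
by_variable <- table(variable, value)   # rows = variable, columns = value

print(by_value)
print(by_variable)
```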
Answer 1 (score: 1)
This is mostly a variation on the same theme. Use stack and table together, like this:
as.data.frame.matrix( ## converts the output to a data.frame
table( ## does the actual tabulation
stack( ## stack makes your data.frame long
lapply(a, as.character)), ## but won't work with factors; convert to char
useNA = "no") ## we don't want NA values
)[levels(a[[1]]), ] ## We want our rows in a nicer order
#     sickness1 sickness2 sickness3 sickness4
# 3           2         1         1         1
# 25          1         1         1         1
# 50          2         2         1         1
# 75          3         1         2         1
# 97          1         1         1         1
# 100         1         2         2         3
Alternatively, here is a "dplyr" + "tidyr" approach:
library(dplyr)
library(tidyr)
a %>% gather(var, val, sickness1:sickness4) %>% ## make the data long
mutate(val = factor(val, levels(unlist(a)))) %>% ## refactor "val" column
rev %>% ## reverse the order of val and var
table %>% ## make your table
as.data.frame.matrix ## convert it to a data.frame
#     sickness1 sickness2 sickness3 sickness4
# 3           2         1         1         1
# 25          1         1         1         1
# 50          2         2         1         1
# 75          3         1         2         1
# 97          1         1         1         1
# 100         1         2         2         3
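As a cross-check on these counts, here is a base R sketch that is not part of the answer itself. It relies on the assumption that every column shares the same factor levels, in which case `sapply(a, table)` already returns one aligned column of counts per variable. The data frame from the question is rebuilt so the block runs on its own; `counts` and `result` are illustrative names.

```r
# Rebuild the question's data frame so the block is self-contained
a <- data.frame(sickness1 = c(1, 1, 2, 3, 3, 5, 6, 4, 4, 4),
                sickness2 = c(NA, NA, 3, 3, 4, 6, 1, 2, 5, 6),
                sickness3 = c(NA, NA, 3, 4, 4, 6, 1, 2, 5, 6),
                sickness4 = c(NA, NA, 6, 3, 4, 6, 1, 2, 5, 6))
a[] <- lapply(a, factor, levels = 1:6, labels = c(3, 25, 50, 75, 97, 100))

# table() drops NA by default; because all columns share the same levels,
# each call returns six counts and sapply() binds them into a matrix
counts <- sapply(a, table)

# Move the level labels from the row names into a "percent" column
result <- data.frame(percent = rownames(counts), counts, row.names = NULL)
print(result)
```

If the columns had different levels, `sapply()` would return a list rather than a matrix, so this shortcut would not apply and one of the approaches above that fixes the levels explicitly would be the safer choice.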
Answer 2 (score: 0)
You can try:
un1 <- as.character(sort(unique(unlist(a, use.names=FALSE))))
data.frame(percent=un1,do.call(cbind,
lapply(a, function(x) table(factor(x, levels=un1)))))