Here is some data:
dta <- data.frame(
  id = 1:10,
  code1 = as.factor(sample(c("male", "female"), 10, replace = TRUE)),
  code2 = as.factor(sample(c("yes", "no", "maybe"), 10, replace = TRUE)),
  code3 = as.factor(sample(c("yes", "no"), 10, replace = TRUE))
)
I would like a nicely formatted frequency table for each of the code variables.
codes <- c("code1", "code2", "code3")
For example, we can run the built-in table command.
> sapply(dta[, codes], table)
$code1
female male
4 6
$code2
maybe no yes
5 2 3
$code3
no yes
4 6
All the information is there, but it would be nicer to have it as a proper table (a data frame):
library(plyr)
ddply(dta, .(code1), summarize, n1 = length(code1))
code1 n1
1 female 4
2 male 6
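I want this for all three variables. The result can be separate data frames or a single one.
How can we loop over the variables? Any other approach is welcome too.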
Answer 0 (score: 1)
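You can use lapply together with as.data.frame(table(...)):
codes <- c("code1", "code2", "code3")
# table() has to be called on each column inside a function;
# the column names are set explicitly to match the output shown here
tbl <- lapply(dta[, codes], function(x) setNames(as.data.frame(table(x)), c("value.Var1", "value.Freq")))
which will give you: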
tbl
$code1
value.Var1 value.Freq
1 female 6
2 male 4
$code2
value.Var1 value.Freq
1 maybe 4
2 no 5
3 yes 1
$code3
value.Var1 value.Freq
1 no 4
2 yes 6
You can then access each data frame with tbl$code1, tbl$code2, and so on. For example:
tbl$code1
value.Var1 value.Freq
1 female 6
2 male 4
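The question also allows for a single data frame instead of three separate ones. The following is a minimal base-R sketch of that variant, not part of the original answer. The names `per_var` and `freq_long`, the output column names, and the fixed seed are assumptions made for illustration.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
dta <- data.frame(
  id = 1:10,
  code1 = as.factor(sample(c("male", "female"), 10, replace = TRUE)),
  code2 = as.factor(sample(c("yes", "no", "maybe"), 10, replace = TRUE)),
  code3 = as.factor(sample(c("yes", "no"), 10, replace = TRUE))
)
codes <- c("code1", "code2", "code3")

# Build one small data frame per variable, tagged with the variable name
per_var <- lapply(codes, function(v) {
  tab <- table(dta[[v]])
  data.frame(
    variable = v,
    value = names(tab),
    n = as.vector(tab),
    stringsAsFactors = FALSE
  )
})

# Stack the pieces into one long data frame
freq_long <- do.call(rbind, per_var)
print(freq_long)
```

The long format keeps every variable in the same three columns, so it can be filtered with `freq_long[freq_long$variable == "code2", ]` when a single table is needed.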
Answer 1 (score: 1)
Does this work? It is generic, since it does not need to know the "codes" in advance.
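library(plyr)     # provides count() and .()
library(reshape)  # provides melt()
dta <- data.frame(
  id = 1:10,
  code1 = as.factor(sample(c("male", "female"), 10, replace = TRUE)),
  code2 = as.factor(sample(c("yes", "no", "maybe"), 10, replace = TRUE)),
  code3 = as.factor(sample(c("yes", "no"), 10, replace = TRUE))
)
# Reshape to long format: one row per (id, variable, value)
d1 <- melt(dta, "id")
# Count the rows for each variable/value combination
d2 <- count(d1, .(variable, value))
# Split by variable and rename the columns to <variable name> and n1
d3 <- by(d2, d2$variable, function(x) {
  v <- as.character(x[1, ]$variable)
  y <- x[, 2:3]
  colnames(y) <- c(v, "n1")
  return(y)
})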
d3
## d2$variable: code1
## code1 n1
## 1 female 6
## 2 male 4
## -----------------------------------------------------------------------
## d2$variable: code2
## code2 n1
## 3 maybe 2
## 4 no 5
## 5 yes 3
## -----------------------------------------------------------------------
## d2$variable: code3
## code3 n1
## 6 no 4
## 7 yes 6
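For comparison, a similar result can be sketched in base R without plyr or reshape. This is an alternative under stated assumptions rather than the answer's own method: it treats every factor column as a code variable, and the names `factor_cols` and `freq_tables` as well as the fixed seed are hypothetical.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
dta <- data.frame(
  id = 1:10,
  code1 = as.factor(sample(c("male", "female"), 10, replace = TRUE)),
  code2 = as.factor(sample(c("yes", "no", "maybe"), 10, replace = TRUE)),
  code3 = as.factor(sample(c("yes", "no"), 10, replace = TRUE))
)

# Detect the code variables: here, every column that is a factor
factor_cols <- names(dta)[vapply(dta, is.factor, logical(1))]

# One data frame per variable, with columns <variable name> and n1
freq_tables <- Map(function(x, v) {
  out <- as.data.frame(table(x), stringsAsFactors = FALSE)
  names(out) <- c(v, "n1")
  out
}, dta[factor_cols], factor_cols)

print(freq_tables)
```

The result is a named list, so `freq_tables$code1` returns the table for the first variable.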