I have a sample dataset, shown below:
a <- structure(list(Occ = c(1, 2, 3, 4, 4, 5, 6, 4, 8, 5),
Type = c("A", "B", "C", "A", "A", "A", "B", "C", "C", "B"),
Alc = c("A", "B", "N", "A", "N", "N", "N", "A", "B", "B")),
.Names = c("Occ", "Type", "Alc"), row.names = c(NA, -10L), class = "data.frame")
a
Occ Type Alc
1 1 A A
2 2 B B
3 3 C N
4 4 A A
5 4 A N
6 5 A N
7 6 B N
8 4 C A
9 8 C B
10 5 B B
I use lapply to find the count of each category in every variable.
lapply(a, table)
$Occ
1 2 3 4 5 6 8
1 1 1 3 2 1 1
$Type
A B C
4 3 3
$Alc
A B N
3 3 4
I would like to get the percentages in data frame format, like this:
Occ
1: 10%
2: 10%
3: 10%
4: 30%
5: 20%
6: 10%
8: 10%
Type
A: 40%
B: 30%
C: 30%
Alc
A: 30%
B: 30%
N: 40%
Answer 0 (score: 0)
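Creating a data frame to hold this output is problematic in a few ways. First, as a single column, the headings would produce a confusing variable that mixes titles with unordered values and may have duplicated row names. Second, the row names would not be meaningful, because they depend on the unique values of each column. Third, as a data frame with multiple columns, NAs would be created for the shorter columns, and the row names still would not match up.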
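The third problem can be seen in a small sketch. The names `tabs`, `n`, and `wide` are illustrative and not part of the original post; the approach simply pads each proportion vector to the length of the longest one:

```r
# Rebuild the sample data from the question
a <- data.frame(
  Occ  = c(1, 2, 3, 4, 4, 5, 6, 4, 8, 5),
  Type = c("A", "B", "C", "A", "A", "A", "B", "C", "C", "B"),
  Alc  = c("A", "B", "N", "A", "N", "N", "N", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Proportions per variable; each table has a different length
tabs <- lapply(a, function(x) prop.table(table(x)))
n <- max(lengths(tabs))

# Force every column to the same length by padding with NA
wide <- as.data.frame(lapply(tabs, function(t) {
  v <- as.vector(t)
  length(v) <- n  # extending a vector this way fills with NA
  v
}))
wide
```

Here `Type` and `Alc` each have only three categories, so their columns are padded with NA, and row `i` no longer refers to the same category across columns.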
A list makes the most sense:
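lst <- lapply(a, function(x) {
  # proportion of each category within the column
  tbl <- prop.table(table(x))
  # convert to percentage strings such as "30%"
  res <- paste0(round(tbl * 100, 2), "%")
  # paste0() drops names, so restore the category labels
  names(res) <- names(tbl)
  res
})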
lst
# $Occ
# 1 2 3 4 5 6 8
# "10%" "10%" "10%" "30%" "20%" "10%" "10%"
#
# $Type
# A B C
# "40%" "30%" "30%"
#
# $Alc
# A B N
# "30%" "30%" "40%"
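If a data frame is still required, one possible alternative is a long format with one row per variable/category pair, which sidesteps the row-name and NA issues. This is only a sketch, not the answer's method; `pct_long` and the column names `Variable`, `Category`, and `Percent` are assumptions chosen for illustration:

```r
# Rebuild the sample data from the question
a <- data.frame(
  Occ  = c(1, 2, 3, 4, 4, 5, 6, 4, 8, 5),
  Type = c("A", "B", "C", "A", "A", "A", "B", "C", "C", "B"),
  Alc  = c("A", "B", "N", "A", "N", "N", "N", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Build one small data frame per variable, then stack them
pct_long <- do.call(rbind, lapply(names(a), function(v) {
  tbl <- prop.table(table(a[[v]]))
  data.frame(
    Variable = v,                 # which column the row describes
    Category = names(tbl),        # category label within that column
    Percent  = paste0(round(as.vector(tbl) * 100, 2), "%"),
    stringsAsFactors = FALSE
  )
}))
pct_long
```

Each row identifies its own variable and category, so nothing depends on row names and no padding is needed. Note that `Percent` is a character column; keeping the numeric proportion in a separate column would be preferable if further calculation is planned.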