I have a list of values:
Value
AAA
BBB
CCC
.
.
.
ZZZ
I also have a data frame in which each row has 15 columns that can contain these values:
ID V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
1 AAA
2 AAA BBB
3 CCC BBB
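Basically, for each value in that list I want to count the rows of the data frame in which it appears; the value can be in any of the 15 columns:
Desired output: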
Value Count
AAA 2
BBB 2
CCC 1
.
.
.
ZZZ 0
I tried using sapply and apply as shown below, but neither seems to work:
apply(mylist$values, 2, function(x) { length(which(df[,2:16] %in% x)) } )
or
sapply(mylist$values, function(x) { length(which(x %in% df[,2:16])) })
I'd appreciate any ideas!
Thanks,
Answer 0 (score: 1)
How about using table?
# Generate some sample data
set.seed(2017);
df <- as.data.frame(matrix(
sapply(sample(LETTERS[1:5], 45, replace = TRUE), function(x) paste(rep(x, 3), collapse = "")),
ncol = 15));
df;
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
#1 EEE BBB AAA BBB AAA BBB EEE BBB BBB BBB EEE BBB AAA CCC AAA
#2 CCC DDD CCC DDD CCC BBB EEE EEE EEE AAA AAA EEE AAA CCC AAA
#3 CCC DDD CCC AAA CCC DDD DDD DDD DDD BBB BBB DDD AAA CCC EEE
# Your list of values
Values <- list(sapply(LETTERS[1:6], function(x) paste(rep(x, 3), collapse = "")))
Values;
#[[1]]
# A B C D E F
#"AAA" "BBB" "CCC" "DDD" "EEE" "FFF"
# Summarise counts as table
table(factor(unlist(df), levels = unique(unlist(Values))));
# As dataframe
df.table <- as.data.frame(table(factor(unlist(df), levels = unique(unlist(Values)))));
df.table[order(as.character(df.table$Var1)), ];
# Var1 Freq
#1 AAA 10
#2 BBB 10
#3 CCC 9
#4 DDD 8
#5 EEE 8
#6 FFF 0
Note the 0 count for FFF, which does not occur in df but is listed in Values.
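One caveat: table() counts every matching cell, so a value that repeats within a single row is counted more than once. If you need the number of rows that contain each value, something like the following sketch may be closer to it. The names vals and dat and the data in them are made up for illustration and are not taken from the question.

```r
# Illustrative data (assumed, not from the question): "AAA" appears twice in row 1
vals <- c("AAA", "BBB", "CCC", "ZZZ")
dat <- data.frame(
  ID = 1:3,
  V1 = c("AAA", "AAA", "CCC"),
  V2 = c("AAA", "BBB", "BBB"),
  stringsAsFactors = FALSE
)

# Occurrences: every matching cell is counted (ID column dropped with dat[-1])
occurrences <- table(factor(unlist(dat[-1]), levels = vals))

# Rows: each row is counted at most once per value, however often the value repeats in it
row_counts <- sapply(vals, function(v) sum(rowSums(dat[-1] == v, na.rm = TRUE) > 0))

print(occurrences)
print(row_counts)
```

Here "AAA" has 3 occurrences but is present in only 2 rows; when no value repeats within a row, the two counts agree.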
Answer 1 (score: 0)
This might work for you:
sapply(df1$Value, function(x) sum(df2 == x, na.rm=TRUE))
# AAA BBB CCC ZZZ
# 2 2 1 0
Data
df1 <- structure(list(Value = c("AAA", "BBB", "CCC", "ZZZ")), .Names = "Value", class = "data.frame", row.names = c(NA,
-4L))
df2 <- structure(list(ID = 1:3, V1 = c("AAA", "AAA", "CCC"), V2 = c(NA,
"BBB", "BBB"), V3 = c(NA, NA, NA), V4 = c(NA, NA, NA), V5 = c(NA,
NA, NA), V6 = c(NA, NA, NA), V7 = c(NA, NA, NA), V8 = c(NA, NA,
NA), V9 = c(NA, NA, NA), V10 = c(NA, NA, NA), V11 = c(NA, NA,
NA), V12 = c(NA, NA, NA), V13 = c(NA, NA, NA), V14 = c(NA, NA,
NA), V15 = c(NA, NA, NA)), class = "data.frame", .Names = c("ID",
"V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10",
"V11", "V12", "V13", "V14", "V15"), row.names = c(NA, -3L))
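If you want the result in the two-column Value/Count shape from the question instead of a named vector, a sketch like this might do. It uses a cut-down copy of the data (values and wide are illustrative names, and only three of the 15 columns are reproduced) and drops the ID column before comparing so that IDs can never match a value by accident.

```r
# Cut-down stand-ins for df1$Value and df2 (assumed names, fewer columns)
values <- c("AAA", "BBB", "CCC", "ZZZ")
wide <- data.frame(
  ID = 1:3,
  V1 = c("AAA", "AAA", "CCC"),
  V2 = c(NA, "BBB", "BBB"),
  V3 = NA_character_,
  stringsAsFactors = FALSE
)

# Count matching cells per value; NA cells are ignored via na.rm = TRUE
counts <- vapply(values, function(x) sum(wide[-1] == x, na.rm = TRUE), integer(1))

# Reshape the named vector into a Value/Count data frame
result <- data.frame(Value = names(counts), Count = unname(counts), row.names = NULL)
print(result)
```

vapply is used here instead of sapply only so that the return type is checked; sapply would give the same numbers.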