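Count the rows of a data frame for each value in a list, where every row has multiple columns that may contain the value

Time: 2017-12-18 20:30:46

Tags: r

So I have a list of values: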

Value
AAA
BBB
CCC
.
.
.
ZZZ

Now I have a data frame in which each row has 15 columns that can contain these values:

ID V1   V2   V3   V4   V5   V6   V7   V8   V9   V10   V11   V12   V13   V14  V15
1  AAA
2  AAA  BBB
3  CCC  BBB

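Basically, for each value in the list I want to count the number of rows in the data frame that contain it; the value can appear in any of the 15 columns:

Desired output: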

Value  Count
AAA     2
BBB     2
CCC     1
.
.
.
ZZZ     0

I tried using sapply and apply as shown below, but neither seems to work:

apply(mylist$values, 2, function(x) { length(which(df[,2:16] %in% x)) } )

sapply(mylist$values, function(x) { length(which(x %in% df[,2:16])) })

I'd appreciate any ideas!

Thanks,

2 answers:

Answer 0: (score: 1)

Use table

Something like this:
# Generate some sample data
# (the output below is from the original post; exact values may differ with the R version's RNG)
set.seed(2017)
df <- as.data.frame(matrix(
    sapply(sample(LETTERS[1:5], 45, replace = TRUE), function(x) paste(rep(x, 3), collapse = "")),
    ncol = 15))
df
#   V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15
#1 EEE BBB AAA BBB AAA BBB EEE BBB BBB BBB EEE BBB AAA CCC AAA
#2 CCC DDD CCC DDD CCC BBB EEE EEE EEE AAA AAA EEE AAA CCC AAA
#3 CCC DDD CCC AAA CCC DDD DDD DDD DDD BBB BBB DDD AAA CCC EEE

# Your list of values
Values <- list(sapply(LETTERS[1:6], function(x) paste(rep(x, 3), collapse = "")))
Values
#[[1]]
#    A     B     C     D     E     F
#"AAA" "BBB" "CCC" "DDD" "EEE" "FFF"

# Summarise counts as table; levels come from Values, so absent values get a count of 0
table(factor(unlist(df), levels = unique(unlist(Values))))

# As dataframe
df.table <- as.data.frame(table(factor(unlist(df), levels = unique(unlist(Values)))))
df.table[order(as.character(df.table$Var1)), ]
#  Var1 Freq
#1  AAA   10
#2  BBB   10
#3  CCC    9
#4  DDD    8
#5  EEE    8
#6  FFF    0

Note the count of 0 for FFF, which does not occur in df but is listed in Values.
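One caveat: `table(unlist(df))` tallies every matching cell, so a value that occurs twice in the same row is counted twice. If the goal is the number of rows containing each value, as the question asks, one possible adaptation is to de-duplicate within each row before tabulating. This is only a sketch; the toy data and the names `cells`, `per_row` and `row_counts` are made up for illustration.

```r
# Toy data (hypothetical): row 2 contains "AAA" twice
df <- data.frame(
  ID = 1:3,
  V1 = c("AAA", "AAA", "CCC"),
  V2 = c(NA, "BBB", "BBB"),
  V3 = c(NA, "AAA", NA),
  stringsAsFactors = FALSE
)
values <- c("AAA", "BBB", "CCC", "ZZZ")

# Work on the value columns only (drop ID)
cells <- as.matrix(df[, -1])

# Distinct values per row, so a repeated value counts once per row
per_row <- lapply(seq_len(nrow(cells)), function(i) unique(cells[i, ]))

# Tabulate against the full list of values; NA is dropped by table()
row_counts <- table(factor(unlist(per_row), levels = values))
row_counts
# AAA BBB CCC ZZZ
#   2   2   1   0
```

Without the `unique()` step, AAA would be reported as 3 here, because it fills three cells but only two rows.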

Answer 1: (score: 0)

This might work for you:

sapply(df1$Value, function(x) sum(df2 == x, na.rm=TRUE))
# AAA BBB CCC ZZZ
#   2   2   1   0

Data

df1 <- structure(list(Value = c("AAA", "BBB", "CCC", "ZZZ")), .Names = "Value", class = "data.frame", row.names = c(NA, 
-4L))

df2 <- structure(list(ID = 1:3, V1 = c("AAA", "AAA", "CCC"), V2 = c(NA, 
"BBB", "BBB"), V3 = c(NA, NA, NA), V4 = c(NA, NA, NA), V5 = c(NA, 
NA, NA), V6 = c(NA, NA, NA), V7 = c(NA, NA, NA), V8 = c(NA, NA, 
NA), V9 = c(NA, NA, NA), V10 = c(NA, NA, NA), V11 = c(NA, NA, 
NA), V12 = c(NA, NA, NA), V13 = c(NA, NA, NA), V14 = c(NA, NA, 
NA), V15 = c(NA, NA, NA)), class = "data.frame", .Names = c("ID", 
"V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", 
"V11", "V12", "V13", "V14", "V15"), row.names = c(NA, -3L))
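For context, `df2 == x` compares every cell of the data frame (including the `ID` column) with `x`, and `sum()` then counts the matching cells. If each row should count at most once, the comparison can probably be collapsed per row with `rowSums()` first. The sketch below uses a small hypothetical data frame and made-up names (`cells`, `count_rows`), not the poster's data.

```r
values <- c("AAA", "BBB", "CCC", "ZZZ")

# Hypothetical data: row 2 contains "AAA" in two columns
df2 <- data.frame(
  ID = 1:3,
  V1 = c("AAA", "AAA", "CCC"),
  V2 = c(NA, "BBB", "BBB"),
  V3 = c(NA, "AAA", NA),
  stringsAsFactors = FALSE
)

# Compare only the value columns, not ID
cells <- as.matrix(df2[, -1])

# Cell-level count: every match adds one
cell_counts <- sapply(values, function(v) sum(cells == v, na.rm = TRUE))

# Row-level count: a row adds one if any of its cells matches
count_rows <- function(v) sum(rowSums(cells == v, na.rm = TRUE) > 0)
row_counts <- sapply(values, count_rows)

cell_counts
# AAA BBB CCC ZZZ
#   3   2   1   0
row_counts
# AAA BBB CCC ZZZ
#   2   2   1   0
```

The two results differ only when a value repeats within a row, so which one is right depends on whether rows or occurrences are being counted.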