I am trying to find the IDs that contain the most of a set of user-supplied values. A small sample dataset is shown below:
ID Val1 Val2 Time
1 A B 12:00
1 A C 13:10
1 C D 13:19
2 L O 14:00
2 A C 15:00
2 A M 15:00
3 P J 16:00
Search vector:
Vc = c("A","B","C","I","T")
The values in the search vector may appear in either Val1 or Val2. The result I am looking for is:
ID Match
1 3
2 2
Answer 0 (score: 1)
(Assumption: the values in Vc are unique.)
Using data.table:
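library("data.table")
setDT(D) # D is the data frame defined further down; converted by reference
D[, sum(Vc %in% c(Val1, Val2)), ID] # number of Vc values found per ID
D[, sum(Vc %in% c(Val1, Val2)), ID][V1>0] # without zero counts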
Alternative code (same logic):
D[, sum(unique(c(Val1, Val2)) %in% Vc), ID][V1>0]
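The uniqueness assumption matters for the first form, sum(Vc %in% ...), because a repeated value in Vc would be counted once per repetition. The following is a minimal base R sketch of that effect, not part of the original answer; Vc_dup and count_matches are hypothetical names introduced only for illustration, and the data is copied from the question.

```r
# Toy data copied from the question.
D <- read.table(header = TRUE, stringsAsFactors = FALSE, text =
"ID Val1 Val2 Time
1 A B 12:00
1 A C 13:10
1 C D 13:19
2 L O 14:00
2 A C 15:00
2 A M 15:00
3 P J 16:00")

# Hypothetical search vector in which "A" appears twice.
Vc_dup <- c("A", "A", "B", "C", "I", "T")

# For each ID, count how many elements of vc occur in Val1 or Val2.
count_matches <- function(d, vc) {
  sapply(split(d, d$ID), function(g) sum(vc %in% c(g$Val1, g$Val2)))
}

naive <- count_matches(D, Vc_dup)          # the repeated "A" is counted twice
safe  <- count_matches(D, unique(Vc_dup))  # each distinct value counted once

print(naive)
print(safe)
```

Wrapping the search vector in unique() is one way to drop the assumption. The alternative form, which counts the distinct values of each ID that occur in Vc, is not affected by duplicates in Vc.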
Data:
D <- read.table(header=TRUE, stringsAsFactors = FALSE, text=
"ID Val1 Val2 Time
1 A B 12:00
1 A C 13:10
1 C D 13:19
2 L O 14:00
2 A C 15:00
2 A M 15:00
3 P J 16:00")
Vc = c("A", "B", "C", "I", "T")
Here is another solution with data.table:
library("data.table")
D <- fread(
"ID Val1 Val2 Time
1 A B 12:00
1 A C 13:10
1 C D 13:19
2 L O 14:00
2 A C 15:00
2 A M 15:00
3 P J 16:00")
Vc <- data.table(V1=c("A", "B", "C", "I", "T"))
D[, .(c(Val1, Val2), ID)][Vc, on="V1", length(unique(V1)), ID]
D[, .(c(Val1, Val2), ID)][Vc, on="V1", length(unique(V1)), ID, nomatch=0] # without the NA
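The join-based idea above (stack Val1 and Val2 into one column, join against the search values, then count distinct matches per ID) can also be sketched in base R. This is an illustrative equivalent under the same toy data, not the answerer's code; the names long, hits, and res are assumptions made for this example.

```r
# Toy data copied from the question.
D <- read.table(header = TRUE, stringsAsFactors = FALSE, text =
"ID Val1 Val2 Time
1 A B 12:00
1 A C 13:10
1 C D 13:19
2 L O 14:00
2 A C 15:00
2 A M 15:00
3 P J 16:00")
Vc <- c("A", "B", "C", "I", "T")

# Stack Val1 and Val2 into a single column V1, repeating ID to match.
long <- data.frame(ID = rep(D$ID, 2), V1 = c(D$Val1, D$Val2),
                   stringsAsFactors = FALSE)

# Inner join on V1 keeps only the distinct (ID, value) pairs found in Vc.
hits <- merge(unique(long), data.frame(V1 = Vc, stringsAsFactors = FALSE),
              by = "V1")

# Count the matched values per ID; IDs without any match do not appear.
res <- aggregate(V1 ~ ID, data = hits, FUN = length)
names(res)[2] <- "Match"
print(res)
```

Because merge performs an inner join by default, search values that match nothing (here "I" and "T") simply drop out, which corresponds to the nomatch=0 variant above.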
Answer 2 (score: 0)
You can also convert the data frame to long format and then count:
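library(tidyverse)
# df is the data frame from the question; Vc is the search vector
df %>%
gather(k, v, Val1:Val2) %>% # stack Val1 and Val2 into one column v
distinct(ID, v) %>% # keep each value once per ID
group_by(ID) %>%
summarize(Match = sum(v %in% Vc)) %>% # count values found in Vc
filter(Match > 0) # drop IDs without any match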
Result:
# A tibble: 2 x 2
ID Match
<int> <int>
1 1 3
2 2 2