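I have a data frame dt with thousands of repeated events, each of which may have occurred at only one location or at both locations. How can I count the number of events that occurred at both locations? For example, in the sample dt below, two events (ev2 and ev3) occurred at both the Higher and the Lower location, so the count is 2.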
dt<-structure(list(event = c("ev1", "ev1", "ev2", "ev2", "ev2", "ev2",
"ev2", "ev3", "ev3", "ev3", "ev3", "ev3", "ev3", "ev3", "ev3",
"ev3", "ev3", "ev3", "ev3", "ev6", "ev6", "ev6", "ev6", "ev6",
"ev8", "ev8", "ev8", "ev11", "ev11", "ev17"), location = c("Lower",
"Lower", "Lower", "Lower", "Higher", "Higher", "Higher", "Lower",
"Higher", "Higher", "Lower", "Lower", "Lower", "Lower", "Lower",
"Lower", "Lower", "Lower", "Lower", "Lower", "Lower", "Lower",
"Lower", "Lower", "Higher", "Higher", "Higher", "Lower", "Lower",
"Lower")), .Names = c("event", "location"), row.names = c(NA,
-30L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(
cols = structure(list(event = structure(list(), class = c("collector_character",
"collector")), location = structure(list(), class = c("collector_character",
"collector"))), .Names = c("event", "location")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
Answer 0: (score: 1)
We can group by event and keep only the events whose location values include both "Lower" and "Higher":
library(dplyr)
dt %>%
group_by(event) %>%
filter(all(c("Lower", "Higher") %in% location)) %>%
pull(event) %>% unique()
#[1] "ev2" "ev3"
Or, if you want the count:
dt %>%
group_by(event) %>%
filter(all(c("Lower", "Higher") %in% location)) %>%
pull(event) %>% n_distinct()
#[1] 2
In base R, we can use aggregate:
df1 <- aggregate(location~event, dt, function(x) all(c("Lower", "Higher") %in% x))
df1$event[df1$location]
#[1] "ev2" "ev3"
length(df1$event[df1$location])
#[1] 2
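The same per-event check can be sketched with tapply, which may make the logic easier to follow. This is a minimal illustration on a small hypothetical data frame named toy (not the original dt); the names required, has_both, both_events and n_both are assumptions introduced here.

```r
# Hypothetical toy data with the same two columns as dt
toy <- data.frame(
  event    = c("ev1", "ev1", "ev2", "ev2", "ev3", "ev3", "ev3"),
  location = c("Lower", "Lower", "Lower", "Higher", "Higher", "Lower", "Lower"),
  stringsAsFactors = FALSE
)

# Locations that an event must have been seen at to be counted
required <- c("Lower", "Higher")

# For each event, TRUE if every required location appears at least once
has_both <- tapply(toy$location, toy$event, function(x) all(required %in% x))

# Names of the qualifying events, and how many there are
both_events <- names(has_both)[has_both]
n_both <- sum(has_both)

print(both_events)
print(n_both)
```

Keeping the required locations in a vector means the same check extends to more than two locations without changing the rest of the code.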
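Answer 1: (score: 1)
Ronak's approach is more robust, lol, but you could also drop the rows that are duplicated on both columns and then look for duplicates in the event column. Note that this relies on there being only two distinct locations: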
temp_df <- dt[!duplicated(dt[c("event","location")]),]
sum(duplicated(temp_df$event))
#[1] 2
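To illustrate why the duplicate-based count depends on having exactly two locations, here is a small sketch on hypothetical data with a third location, "Middle". The data frame toy and the names naive_count, sub and safe_count are assumptions made for this example.

```r
# Hypothetical data: ev1 occurs at three locations, ev2 only at "Lower"
toy <- data.frame(
  event    = c("ev1", "ev1", "ev1", "ev2", "ev2"),
  location = c("Lower", "Higher", "Middle", "Lower", "Lower"),
  stringsAsFactors = FALSE
)

# Keep one row per (event, location) pair
temp <- toy[!duplicated(toy[c("event", "location")]), ]

# Counting duplicated events now counts ev1 twice (once per extra location)
naive_count <- sum(duplicated(temp$event))

# Restricting to the two locations of interest first avoids the overcount
sub <- temp[temp$location %in% c("Lower", "Higher"), ]
safe_count <- sum(table(sub$event) == 2)

print(naive_count)
print(safe_count)
```

With only "Lower" and "Higher" in the data, as in the question, both counts agree.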
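Answer 2: (score: 0)
You can also paste each unique row into a single string and use regexpr to extract the event number from it; an event number that appears more than once belongs to an event seen at both locations.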
sum(table(regmatches(v <- unique(apply(dt, 1, paste, collapse="")), regexpr("\\d+", v))) > 1)
# [1] 2
Answer 3: (score: 0)
We can use data.table:
library(data.table)
nrow(setDT(dt)[, .GRP[sum(c("Lower", "Higher") %in% location) == 2], event])
#[1] 2
Or with dplyr:
library(dplyr)
dt %>%
filter(location %in% c("Lower", "Higher")) %>%
distinct %>%
count(event) %>%
filter(n == 2) %>%
nrow
#[1] 2
Or using base R:
sum(rowSums(table(unique(dt))) == 2)
#[1] 2
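The base R one-liner works because table applied to the unique rows gives an event-by-location matrix of 0/1 presence flags, so a row sum of 2 means both locations were seen. A minimal sketch on hypothetical data (toy, presence and n_both are names assumed for this example):

```r
# Hypothetical toy data: ev1 only at "Lower", ev2 at both locations
toy <- data.frame(
  event    = c("ev1", "ev1", "ev2", "ev2", "ev2"),
  location = c("Lower", "Lower", "Lower", "Higher", "Higher"),
  stringsAsFactors = FALSE
)

# One row per (event, location) pair, cross-tabulated into a 0/1 matrix
presence <- table(unique(toy))
print(presence)

# Events whose row sums to 2 were observed at both locations
n_both <- sum(rowSums(presence) == 2)
print(n_both)
```

As with the duplicate-based answer, the comparison against 2 assumes the data contain only two distinct locations.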