I have a dataset structured as follows:
dat <- data.frame(
  event  = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"),
  place  = c("p1", "p1", "p2", "p3", "p3", "p3", "p4", "p4", "p4", "p5"),
  day    = c("May 1", "May 2", "May 3", "May 4", "May 5", "May 6", "May 7",
             "May 8", "May 9", "May 1"),
  visits = c(2, 1, 4, 1, 2, 4, 8, 2, 3, 1))
For each event, I want to identify the place with the most visits across 2 days (or at least 2 days).
My desired result:
event place visits
A p1 3
B p3 7
C p4 5
Answer 0 (score: 1)
If I understand your question correctly, you could do the following:
library(tidyverse)
dat %>%
  group_by(event, place) %>%
  summarise(different_days = n_distinct(day),
            visits = sum(visits)) %>%
  filter(different_days >= 2) %>%
  select(-different_days) # Only to match desired result exactly
This produces:
# A tibble: 3 x 3
# Groups: event [3]
event place visits
<fct> <fct> <dbl>
1 A p1 3
2 B p3 7
3 C p4 5
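Note that the pipeline above keeps every place seen on at least two distinct days; it matches the desired result because each event in the sample data has exactly one such place. If an event could have several qualifying places, one extra step would be needed to pick the one with the most visits. The following is a minimal sketch of that step, assuming dplyr 1.0.0 or later (for slice_max() and the .groups argument); the name result and the tie handling via with_ties = FALSE are illustrative choices, not part of the original answer.

```r
library(dplyr)

dat <- data.frame(
  event  = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"),
  place  = c("p1", "p1", "p2", "p3", "p3", "p3", "p4", "p4", "p4", "p5"),
  day    = c("May 1", "May 2", "May 3", "May 4", "May 5", "May 6", "May 7",
             "May 8", "May 9", "May 1"),
  visits = c(2, 1, 4, 1, 2, 4, 8, 2, 3, 1))

result <- dat %>%
  group_by(event, place) %>%
  # Count distinct days and total visits per event/place pair
  summarise(different_days = n_distinct(day),
            visits = sum(visits),
            .groups = "drop") %>%
  # Keep only places observed on at least two distinct days
  filter(different_days >= 2) %>%
  # Among the remaining places, keep the single busiest one per event
  group_by(event) %>%
  slice_max(visits, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-different_days)

print(result)
```

On the sample data this returns the same three rows as before. Setting with_ties = TRUE instead would keep all places that share the maximum.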
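Answer 1 (score: 0)
Another approach is to first add a count for each group, keep the groups whose count is greater than 1, and then select the maximum visits for each event.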
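The code for this answer is not shown here, so the following is only a sketch of how the described steps might look with dplyr, not the answerer's actual code. It assumes the "group" is an event/place pair and that each row is a separate day, so the row count stands in for the number of days; n_rows and result are hypothetical names.

```r
library(dplyr)

dat <- data.frame(
  event  = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"),
  place  = c("p1", "p1", "p2", "p3", "p3", "p3", "p4", "p4", "p4", "p5"),
  day    = c("May 1", "May 2", "May 3", "May 4", "May 5", "May 6", "May 7",
             "May 8", "May 9", "May 1"),
  visits = c(2, 1, 4, 1, 2, 4, 8, 2, 3, 1))

result <- dat %>%
  # Step 1: attach the number of rows in each event/place group
  add_count(event, place, name = "n_rows") %>%
  # Step 2: keep groups with more than one row (assumed: one row per day)
  filter(n_rows > 1) %>%
  # Total the visits of each remaining event/place pair
  group_by(event, place) %>%
  summarise(visits = sum(visits), .groups = "drop") %>%
  # Step 3: keep the place(s) with the maximum total per event
  group_by(event) %>%
  filter(visits == max(visits)) %>%
  ungroup()

print(result)
```

Unlike slice_max() with with_ties = FALSE, the final filter() keeps every place tied for the maximum.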
Answer 2 (score: 0)
You can aggregate() over "event" and "place", using length() and sum() inside the function passed as the FUN argument.
(a <- do.call(cbind.data.frame,
aggregate(visits ~ event + place, dat, FUN=function(a) c(length(a), sum(a)))))
# event place visits.1 visits.2
# 1 A p1 2 3
# 2 A p2 1 4
# 3 B p3 3 7
# 4 B p4 1 8
# 5 C p4 2 5
# 6 C p5 1 1
Subsetting then gives you what you want:
a[a[3] > 1, -3]
# event place visits.2
# 1 A p1 3
# 3 B p3 7
# 5 C p4 5
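The subset above relies on column positions and, like the first answer, returns every place with more than one row rather than the single busiest one. As a hedged variation in base R, the sketch below gives the aggregated columns explicit names and then uses ave() to keep the place with the highest total per event. The names agg, flat, eligible, best, n_days and visits are illustrative, and the row count is assumed to equal the number of days.

```r
dat <- data.frame(
  event  = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"),
  place  = c("p1", "p1", "p2", "p3", "p3", "p3", "p4", "p4", "p4", "p5"),
  day    = c("May 1", "May 2", "May 3", "May 4", "May 5", "May 6", "May 7",
             "May 8", "May 9", "May 1"),
  visits = c(2, 1, 4, 1, 2, 4, 8, 2, 3, 1))

# Row count and total visits per event/place pair (returned as a matrix column)
agg <- aggregate(visits ~ event + place, dat,
                 FUN = function(v) c(length(v), sum(v)))

# Flatten the matrix column into ordinary columns and name them explicitly
flat <- do.call(cbind.data.frame, agg)
names(flat) <- c("event", "place", "n_days", "visits")

# Keep pairs with more than one row (assumed: one row per day)
eligible <- flat[flat$n_days > 1, ]

# Within each event, keep the row(s) whose total equals the event maximum
is_max <- eligible$visits == ave(eligible$visits, eligible$event, FUN = max)
best <- eligible[is_max, c("event", "place", "visits")]

print(best)
```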