How to subset a data frame using multiple conditions involving the days and summed values of two factor variables

Time: 2019-11-03 21:12:33

Tags: r

I have a dataset structured as shown below.

dat <- data.frame(
   event = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"), 
   place =c("p1", "p1", "p2", "p3", "p3", "p3", "p4","p4","p4", "p5"), 
   day = c("May 1","May 2","May 3", "May 4", "May 5", "May 6", "May 7", "May 8", 
           "May 9", "May 1"),     
   visits = c(2,1,4,1,2,4,8,2,3,1))

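For each event, I want to identify the place with the highest total number of visits, counting only places that were visited on 2 days (or at least 2 days).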

My desired result:

event place visits
A     p1     3
B     p3     7
C     p4     5

3 Answers:

Answer 0: (score: 1)

If I understand your question correctly, you can do the following:

library(tidyverse)

dat %>% 
  group_by(event, place) %>% 
  summarise(different_days = n_distinct(day), 
            visits = sum(visits)) %>% 
  filter(different_days >= 2) %>%
  select(-different_days) # Only to match desired result exactly

which produces:

# A tibble: 3 x 3
# Groups:   event [3]
  event place visits
  <fct> <fct>  <dbl>
1 A     p1         3
2 B     p3         7
3 C     p4         5

Answer 1: (score: 0)

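Another approach is to first add the count for each group, keep the groups whose count is greater than 1, and then select the maximum visits for each event.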
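The code that accompanied this answer did not survive on this page. The following is only a sketch of the steps described, not the answerer's original code. It assumes dplyr 1.0.0 or later (for `slice_max()` and the `.groups` argument), and the column name `n_days` and the result name `res` are illustrative choices.

```r
library(dplyr)

dat <- data.frame(
  event  = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"),
  place  = c("p1", "p1", "p2", "p3", "p3", "p3", "p4", "p4", "p4", "p5"),
  day    = c("May 1", "May 2", "May 3", "May 4", "May 5", "May 6", "May 7",
             "May 8", "May 9", "May 1"),
  visits = c(2, 1, 4, 1, 2, 4, 8, 2, 3, 1))

res <- dat %>%
  group_by(event, place) %>%
  # count the distinct days and total the visits per event/place pair
  summarise(n_days = n_distinct(day), visits = sum(visits), .groups = "drop") %>%
  # keep only places visited on more than one day
  filter(n_days > 1) %>%
  # within each event, keep the place with the largest total
  group_by(event) %>%
  slice_max(visits, n = 1) %>%
  ungroup() %>%
  select(-n_days)

res
```

Unlike filtering on the day count alone, the final `slice_max()` step still returns one place per event when several places in the same event qualify (ties aside).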

Answer 2: (score: 0)

You can aggregate() over "event" and "place", using length() and sum() inside the function passed as FUN.

(a <- do.call(cbind.data.frame, 
             aggregate(visits ~ event + place, dat, FUN=function(a) c(length(a), sum(a)))))

#   event place visits.1 visits.2
# 1     A    p1        2        3
# 2     A    p2        1        4
# 3     B    p3        3        7
# 4     B    p4        1        8
# 5     C    p4        2        5
# 6     C    p5        1        1

Subsetting then gives you what you want:

a[a[3] > 1, -3]
#   event place visits.2
# 1     A    p1        3
# 3     B    p3        7
# 5     C    p4        5
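As a possible variation on the same base R idea, the values returned by FUN can be named, so the subset no longer relies on column positions. This is a sketch under that assumption; the names `n_days`, `total`, `agg`, `keep` and `res_base` are illustrative and do not come from the answer.

```r
dat <- data.frame(
  event  = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"),
  place  = c("p1", "p1", "p2", "p3", "p3", "p3", "p4", "p4", "p4", "p5"),
  day    = c("May 1", "May 2", "May 3", "May 4", "May 5", "May 6", "May 7",
             "May 8", "May 9", "May 1"),
  visits = c(2, 1, 4, 1, 2, 4, 8, 2, 3, 1))

# aggregate() stores the two values returned by FUN as a matrix column
# named "visits"; naming them gives that matrix column names
agg <- aggregate(visits ~ event + place, dat,
                 FUN = function(v) c(n_days = length(v), total = sum(v)))

# rows where the place has at least two records (one record per day here)
keep <- agg$visits[, "n_days"] >= 2

res_base <- data.frame(agg[keep, c("event", "place")],
                       visits = agg$visits[keep, "total"])
res_base
```

Like the answer above, this counts rows per event/place pair, which equals the number of days only because each row in dat is a distinct day.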