Question

我有一些数据可以查看一群人及其随时间吃的水果。我想用dplyr来看每个人，直到他们吃香蕉并总结他们吃的所有水果，直到他们吃掉他们的第一个香蕉。

数据：

data <-  structure(list(user = c(1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 
    1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 9584L, 9584L, 9584L, 
    9584L, 9584L, 9584L, 9584L, 9584L, 9584L, 4758L, 4758L, 4758L, 
    4758L, 4758L, 4758L), site = structure(c(1L, 6L, 1L, 1L, 6L, 
    5L, 5L, 3L, 4L, 1L, 2L, 6L, 1L, 6L, 5L, 5L, 3L, 2L, 6L, 6L, 6L, 
    4L, 2L, 5L, 5L, 4L, 2L), .Label = c("apple", "banana", "lemon", 
    "lime", "orange", "pear"), class = "factor"), time = c(1L, 2L, 
    3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 8L, 9L, 5L, 6L, 7L, 8L, 9L, 10L), int = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
    1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L), .Label = c("banana", 
    "other"), class = "factor")), .Names = c("user", "site", "time", 
    "int"), row.names = c(NA, -27L), class = "data.frame")

我最初的想法是将数据分组以找到每个用户吃香蕉的第一个实例：

data <- data %>% transform(var = ifelse(site=="banana", 'banana','other'))

data_ban <- data %>% 
    filter(var=='banana') %>% 
    group_by(user, var, time) %>%
    group_by(user) %>%
    summarise(first_banana = min(time))

但现在我仍然坚持如何将其实际应用于原始数据＆＃34;数据＆＃34;数据框，并设置一个过滤器，其中说明：对于每个用户，只包括数据直到＆＃34; data_ban＆＃34;中给出的时间。有任何想法吗？

Answer 1

您可以尝试slice

data %>%
     group_by(user) %>% 
     slice(1:(which(int=='banana')[1L]))

Answer 2

这样的事情：按user分组并过滤time低于他们第一次吃香蕉。

> data %>% group_by(user) %>% filter( time <= which(site=="banana")[1] )
Source: local data frame [17 x 4]
Groups: user

   user   site time    int
1  1234  apple    1  other
2  1234   pear    2  other
3  1234  apple    3  other
4  1234  apple    4  other
5  1234   pear    5  other
6  1234 orange    6  other
7  1234 orange    7  other
8  1234  lemon    8  other
9  1234   lime    9  other
10 1234  apple   10  other
11 1234 banana   11 banana
12 9584  apple    1  other
13 9584   pear    2  other
14 9584 orange    3  other
15 9584 orange    4  other
16 9584  lemon    5  other
17 9584 banana    6 banana

否则可能是anti_join。

r + dplyr过滤掉时间序列

2 个答案: