Question

I have the following data:

 household   person   time       mode
 1           1        07:45:00   non-car
 1           1        09:05:00   car
 1           2        08:10:00   non-car
 1           3        22:45:00   non-car
 1           4        08:30:00   car
 1           5        22:00:00   car
 2           1        07:45:00   non-car
 2           2        16:45:00   car
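
To make the example reproducible, the table above could be built in R roughly as follows. This is only a sketch: the name `df` is chosen to match the answer's code, and storing `time` as an `hms` value is an assumption about how the data were read in.

```r
library(tibble)
library(hms)

# Sample data from the question. Column types are assumptions:
# integer ids, `time` parsed into hms, `mode` kept as character.
df <- tibble(
  household = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L),
  person    = c(1L, 1L, 2L, 3L, 4L, 5L, 1L, 2L),
  time      = parse_hms(c("07:45:00", "09:05:00", "08:10:00", "22:45:00",
                          "08:30:00", "22:00:00", "07:45:00", "16:45:00")),
  mode      = c("non-car", "car", "non-car", "non-car",
                "car", "car", "non-car", "car")
)
```

With `time` stored this way, subtracting two times gives a difference in seconds, which is what a 1-hour (3600-second) threshold needs.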

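I want to create a column that shows, within each household, whether a non-car trip starts at most 1 hour before a car trip.

The column should hold the index (the person number) of the other person whose time overlaps in this way.

In the example above, in the first household, the first person's non-car time is less than 1 hour before the fourth person's car time, so the new column holds 4 for the first person and 1 for the fourth person. Expected output: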

 household   person   time       mode      overlap
 1           1        07:45:00   non-car   4
 1           1        09:05:00   car       2
 1           2        08:10:00   non-car   4,1
 1           3        22:45:00   non-car   0
 1           4        08:30:00   car       1,2
 1           5        22:00:00   car       0
 2           1        07:45:00   non-car   0
 2           2        16:45:00   car       0

Rows with no overlap with another household member should get 0, or something like NA.

Answer 1

Here is a dplyr approach that produces these matches.

library(dplyr); library(hms)
df %>%
  # Join the table to itself by household, so every row is paired with
  #   every row (including itself) in the same household. Columns from the
  #   original data get the suffix .x and columns from the joined copy get
  #   .y, so we can compare them below.
  left_join(df, by = c("household")) %>%
  # Find the difference in time, in seconds
  mutate(time_dif = abs(time.y - time.x)) %>%
  filter(time_dif < 3600,       # Keep if <1hr difference
         person.x != person.y,  # Keep if different person
         mode.x != mode.y) %>%  # Keep if different mode

  # We have the answers now, everything below is for formatting

  # Rename and hide some variables we don't need any more
  select(household, person = person.x, time = time.x, 
         mode = mode.x, other = person.y) %>%
  # Combine each person's overlaps into one row
  group_by(household, person, time) %>%
  summarise(overlaps = paste(other, collapse = ","), times = length(other)) %>%
  # Add back all original rows, even if no overlaps
  right_join(df) %>%
  ungroup()


## A tibble: 7 x 6
#  household person time   overlaps times mode   
#      <int>  <int> <time> <chr>    <int> <chr>  
#1         1      1 07:45  4            1 non-car
#2         1      1 09:05  2            1 car    
#3         1      2 08:10  1,4          2 non-car
#4         1      3 22:45  NA          NA non-car
#5         1      4 08:30  1,2          2 car    
#6         2      1 07:45  NA          NA non-car
#7         2      2 16:45  NA          NA car
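
The pipeline above compares times with `abs()`, so it matches a car and a non-car trip in either order. The question's wording asks only for non-car trips that start at most 1 hour before a car trip. A directional variant might look like the sketch below. It assumes the eight-row sample from the question with `time` stored as `hms`; the names `pairs`, `both`, `gap`, and `result` are hypothetical helpers, not part of the original answer.

```r
library(dplyr)
library(hms)

# Sample data from the question (column types are assumptions).
df <- tibble(
  household = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L),
  person    = c(1L, 1L, 2L, 3L, 4L, 5L, 1L, 2L),
  time      = parse_hms(c("07:45:00", "09:05:00", "08:10:00", "22:45:00",
                          "08:30:00", "22:00:00", "07:45:00", "16:45:00")),
  mode      = c("non-car", "car", "non-car", "non-car",
                "car", "car", "non-car", "car")
)

# Pair every non-car row (.x) with every car row (.y) in the same household.
# Newer dplyr versions may warn about a many-to-many join; that is expected
# for a self-join like this one.
pairs <- df %>%
  inner_join(df, by = "household", suffix = c(".x", ".y")) %>%
  mutate(gap = as.numeric(time.y) - as.numeric(time.x)) %>%  # seconds
  filter(person.x != person.y,
         mode.x == "non-car", mode.y == "car",
         gap >= 0, gap <= 3600)  # non-car starts 0 to 60 min before car

# Report each match from both sides, then collapse to one row per trip.
both <- bind_rows(
  transmute(pairs, household, person = person.x, time = time.x, other = person.y),
  transmute(pairs, household, person = person.y, time = time.y, other = person.x)
)

result <- both %>%
  group_by(household, person, time) %>%
  summarise(overlap = paste(sort(other), collapse = ","), .groups = "drop") %>%
  right_join(df, by = c("household", "person", "time")) %>%
  arrange(household, person, time)
```

Under this rule the 22:45 non-car trip and the 22:00 car trip in household 1 are not matched, because the non-car trip is the later one; both rows keep `NA` in `overlap`, which corresponds to the 0 in the question's expected output.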

Find a time difference of at most 1 hour between car and non-car modes
