Extract grouped values from a data frame based on a range of grouped values from another data frame using tidyverse

Date: 2018-03-22 07:26:31

Tags: r tidyverse

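I am trying to extract grouped index values from a data frame (df1) whose rows represent grouped time ranges (start to end), picking for each grouped time given in another data frame (df2) the range that contains it. The output I need is df3.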

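library(tidyverse)

df1 %>% 
  select(from = start, to = end) %>% 
  pmap(seq) %>%            # one sequence per start-end range
  do.call(cbind, .) %>%    # matrix with one column per range
  list(.) %>%
  mutate(df2, new = ., 
         # column number of the range whose sequence contains each time
         ind = map2(time, new, ~ which(.x == .y, arr.ind = TRUE)[,2])) %>%
  select(-new)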

A previous related question I posted received this tidy pipeline solution for ungrouped data; it is the pipeline shown above.

Can it be modified to group by the 'group' column in df1 and df2 so that it gives the output df3?

2 answers:

Answer 0: (score: 2)

Using group_by, we can nest and then do a join:

library(tidyverse)

df1 %>% 
  group_by(group) %>%
  nest(-group) %>%
  # per group, build a matrix with one column per start-end range
  mutate(new = map(data, ~ .x %>% 
    select(from = start, to = end) %>%
    pmap(seq) %>% 
    do.call(cbind, .) %>% 
    list(.))) %>%
  right_join(df2, by = "group") %>%
  # first the column position of the matching range, then that row's index
  mutate(ind = map2_int(time, new, ~ which(.x == .y[[1]], arr.ind = TRUE)[,2]),
         ind = map2_dbl(ind, data, ~ .y$index[.x])) %>%
  select(time, ind)
# A tibble: 6 x 2
#   time   ind
#  <dbl> <dbl>
#1 11.0   2.00
#2 17.0   7.00
#3 24.0   8.00
#4  5.00  9.00
#5  5.00  1.00
#6 22.0  12.0 
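
For comparison, the same grouped lookup can be sketched without nesting: join every df2 row to all df1 ranges of its group, then keep the range that contains the time. This is a swapped-in alternative, not the approach of the answer above. The sample data is the data given in the data.table answer in this thread, and the name df3 for the result is taken from the question. Recent dplyr versions may warn about a many-to-many relationship in the join, which is expected here.

```r
library(dplyr)

# Sample data, same values as in the data.table answer
df1 <- tibble(
  group = rep(c("A", "B", "C"), each = 4),
  index = 1:12,
  start = rep(c(5, 10, 15, 20), times = 3),
  end   = rep(c(10, 15, 20, 25), times = 3)
)
df2 <- tibble(
  group = c("A", "B", "B", "C", "A", "C"),
  time  = c(11, 17, 24, 5, 5, 22)
)

df3 <- df2 %>%
  inner_join(df1, by = "group") %>%        # pair each time with every range of its group
  filter(time >= start, time <= end) %>%   # keep the range that contains the time
  select(group, time, index)
```

Both ends of each range are treated as inclusive, which mirrors seq(start, end) in the pipeline above. A time that falls exactly on a shared boundary (for example 10) would therefore match two ranges and produce two rows.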

Answer 1: (score: 1)

This is where data.table comes in handy:
library(data.table)

df1 <- data.table(group = c("A","A","A","A","B","B","B","B","C","C","C","C"),
                  index = c(1,2,3,4,5,6,7,8,9,10,11,12),
                  start = c(5,10,15,20,5,10,15,20,5,10,15,20),
                  end   = c(10,15,20,25,10,15,20,25,10,15,20,25))
df2 <- data.table(group = c("A","B","B","C","A","C"), time = c(11,17,24,5,5,22))

# non-equi join: same group and start <= time <= end; start then holds the time value
df1[df2, on = .(group, start <= time, end >= time)][, c("start", "index")]


   start index
1:    11     2
2:    17     7
3:    24     8
4:     5     9
5:     5     1
6:    22    12

You can then rename the start column to time, and I think that gives you the answer.
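
The rename step could look like the following minimal sketch. It repeats the join from this answer on the same data and uses data.table's setnames; the variable name res is only illustrative.

```r
library(data.table)

df1 <- data.table(
  group = rep(c("A", "B", "C"), each = 4),
  index = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
  start = rep(c(5, 10, 15, 20), times = 3),
  end   = rep(c(10, 15, 20, 25), times = 3)
)
df2 <- data.table(
  group = c("A", "B", "B", "C", "A", "C"),
  time  = c(11, 17, 24, 5, 5, 22)
)

# Non-equi join as in the answer; `res` is a hypothetical name for the result
res <- df1[df2, on = .(group, start <= time, end >= time)][, c("start", "index")]

# After the join, `start` carries the values of df2$time, so rename it in place
setnames(res, "start", "time")
```

setnames modifies res by reference, so no reassignment is needed.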