Question

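I am trying to extract grouped index values from a data frame (df1) that holds grouped time ranges (start - end), picking the range that contains each grouped time given in another data frame (df2). The output I need is df3.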

library(tidyverse)
df1 %>% 
    select(from = start, to = end) %>% 
    pmap(seq) %>% 
    do.call(cbind, .) %>% 
    list(.) %>%
    mutate(df2, new = ., 
                ind = map2(time, new, ~ which(.x == .y, arr.ind = TRUE)[,2])) %>%
    select(-new)

The pipeline above is the tidy solution I got for a previous related question, and it only handles ungrouped data.

Can it be modified to group by the 'group' column in df1 and df2 so that it gives the output df3?

Answer 1

After group_by, we can nest the ranges per group and then join with df2:

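library(tidyverse)
df1 %>% 
  group_by(group) %>%
  nest(-group)  %>%
  # per group: expand each start-end range with seq() and bind the
  # sequences as columns of a matrix (one column per range)
  mutate(new = map(data, ~.x %>% 
  select(from = start, to = end) %>%
  pmap(seq) %>% 
  do.call(cbind, .) %>% 
  list(.))) %>%
  # attach the times to look up; 'group' is the only shared column
  right_join(df2, by = "group") %>%
  # column position of the matching range, then its value in 'index'
  mutate(ind = map2_int(time, new, ~ which(.x == .y[[1]], arr.ind = TRUE)[,2]),
          ind = map2_dbl(ind, data, ~ .y$index[.x])) %>%
  select(time, ind)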
# A tibble: 6 x 2
#   time   ind
#  <dbl> <dbl>
#1 11.0   2.00
#2 17.0   7.00
#3 24.0   8.00
#4  5.00  9.00
#5  5.00  1.00
#6 22.0  12.0
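
For comparison, the same lookup can be sketched as a range join. This is a different technique from the nest-and-join pipeline above, and it assumes dplyr 1.1.0 or later, where join_by() accepts a between() condition. The sample data is copied from the data.table answer in this thread; the name df3 for the result is taken from the question.

```r
library(dplyr)

# Sample data copied from the data.table answer in this thread
df1 <- tibble(
  group = rep(c("A", "B", "C"), each = 4),
  index = as.numeric(1:12),
  start = rep(c(5, 10, 15, 20), times = 3),
  end   = rep(c(10, 15, 20, 25), times = 3)
)
df2 <- tibble(
  group = c("A", "B", "B", "C", "A", "C"),
  time  = c(11, 17, 24, 5, 5, 22)
)

# Match on group, and require start <= time <= end within that group
df3 <- df2 %>%
  left_join(df1, by = join_by(group, between(time, start, end))) %>%
  select(time, ind = index)
```

Because the ranges share their end points, a time that falls exactly on a shared boundary (10, for example) would match two ranges and return two rows. None of the sample times do, but the bounds argument of between() (for example bounds = "[)") may be worth setting for real data.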

Answer 2

This is the benefit of data.table:

library(data.table)

df1 <- data.table(
  group = c("A","A","A","A","B","B","B","B","C","C","C","C"),
  index = c(1,2,3,4,5,6,7,8,9,10,11,12),
  start = c(5,10,15,20,5,10,15,20,5,10,15,20),
  end   = c(10,15,20,25,10,15,20,25,10,15,20,25)
)
df2 <- data.table(group = c("A","B","B","C","A","C"), time = c(11,17,24,5,5,22))

# non-equi join: same group, and start <= time <= end
df1[df2, on = .(group, start <= time, end >= time)][, c("start", "index")]


   start index
1:    11     2
2:    17     7
3:    24     8
4:     5     9
5:     5     1
6:    22    12

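In the join result the start column holds the time values from df2, so you can then rename start to time, and I think you have your answer.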
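
The rename step could look like the sketch below. It repeats the data so that it runs on its own, and the name res for the joined table is only illustrative.

```r
library(data.table)

df1 <- data.table(
  group = c("A","A","A","A","B","B","B","B","C","C","C","C"),
  index = c(1,2,3,4,5,6,7,8,9,10,11,12),
  start = c(5,10,15,20,5,10,15,20,5,10,15,20),
  end   = c(10,15,20,25,10,15,20,25,10,15,20,25)
)
df2 <- data.table(group = c("A","B","B","C","A","C"), time = c(11,17,24,5,5,22))

# 'res' is a hypothetical name for the joined result
res <- df1[df2, on = .(group, start <= time, end >= time)][, c("start", "index")]

# after the non-equi join, 'start' carries df2's time values; rename in place
setnames(res, "start", "time")
```

setnames() modifies res by reference, so no reassignment is needed.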

Extract grouped values from a data frame based on a range of grouped values from another data frame using tidyverse