How to subset a list of data frames using a vector

Date: 2020-10-04 22:19:10

Tags: r dataframe tidyverse purrr

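I've just started working with lists in R, and I have a problem I can't seem to solve because I don't know how to subset or filter a list. I have a list containing 9 data frames: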

List of 9
 $ :'data.frame':   41999 obs. of  2 variables:
  ..$ XY_ID : chr [1:41999] "193.722:175.733" "192.895:176.727" "187.065:178.285" "190.754:178.186" ...
  ..$ CellID: int [1:41999] 0 0 7 0 0 7 8 8 7 8 ...
 $ :'data.frame':   42069 obs. of  2 variables:
  ..$ XY_ID : chr [1:42069] "192.895:176.727" "187.065:178.285" "190.754:178.186" "192.296:178.648" ...
  ..$ CellID: int [1:42069] 0 7 7 0 8 7 8 8 7 8 ...
 $ :'data.frame':   42116 obs. of  2 variables:
  ..$ XY_ID : chr [1:42116] "192.296:178.648" "178.899:180.92" "182.416:181.265" "186.806:181.434" ...
  ..$ CellID: int [1:42116] 0 8 8 7 7 8 7 8 7 7 ...
 $ :'data.frame':   41976 obs. of  2 variables:
  ..$ XY_ID : chr [1:41976] "193.722:175.733" "190.654:176.113" "188.362:176.407" "192.895:176.727" ...
  ..$ CellID: int [1:41976] 0 7 7 0 7 7 7 7 8 7 ...
 $ :'data.frame':   41949 obs. of  2 variables:
  ..$ XY_ID : chr [1:41949] "190.654:176.113" "188.362:176.407" "192.895:176.727" "186.064:177.413" ...
  ..$ CellID: int [1:41949] 0 0 0 7 7 0 0 7 7 8 ...
 $ :'data.frame':   42020 obs. of  2 variables:
  ..$ XY_ID : chr [1:42020] "190.754:178.186" "192.296:178.648" "189.421:179.012" "186.453:179.2" ...
  ..$ CellID: int [1:42020] 0 0 0 7 7 7 7 7 8 8 ...
 $ :'data.frame':   41902 obs. of  2 variables:
  ..$ XY_ID : chr [1:41902] "191.802:173.732" "193.722:175.733" "183.882:176.123" "190.654:176.113" ...
  ..$ CellID: int [1:41902] 0 0 7 0 0 0 8 7 7 0 ...
 $ :'data.frame':   42072 obs. of  2 variables:
  ..$ XY_ID : chr [1:42072] "190.754:178.186" "192.296:178.648" "189.421:179.012" "178.899:180.92" ...
  ..$ CellID: int [1:42072] 0 0 0 8 7 8 7 7 8 7 ...
 $ :'data.frame':   41956 obs. of  2 variables:
  ..$ XY_ID : chr [1:41956] "193.722:175.733" "190.654:176.113" "188.362:176.407" "192.895:176.727" ...
  ..$ CellID: int [1:41956] 0 0 7 0 8 7 7 7 7 8 ...

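The first column is XY_ID and the second is CellID. I also have a vector of the XY_IDs that appear in all 9 data frames (column 1). I extracted these common XY_IDs like this: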

csv4 <- Reduce(intersect, lapply(csv3, function(x){
    x[['XY_ID']]
}))

str(csv4)
 chr [1:35368] "192.296:178.648" "182.416:181.265" "186.806:181.434" "188.737:181.429" ...
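
To make the intersection step concrete, here is a minimal sketch on a toy list. The object names `toy` and `common` and all values are made up for illustration; they stand in for csv3 and csv4.

```r
# Toy stand-in for csv3: three small data frames (values are invented)
toy <- list(
  data.frame(XY_ID = c("a", "b", "c"), CellID = c(0L, 7L, 8L),
             stringsAsFactors = FALSE),
  data.frame(XY_ID = c("b", "c", "d"), CellID = c(7L, 7L, 0L),
             stringsAsFactors = FALSE),
  data.frame(XY_ID = c("c", "b", "e"), CellID = c(8L, 0L, 7L),
             stringsAsFactors = FALSE)
)

# Pull the XY_ID column out of each data frame, then intersect the
# resulting character vectors pairwise from left to right
common <- Reduce(intersect, lapply(toy, function(x) x[["XY_ID"]]))
print(common)
```

Only "b" and "c" occur in all three toy data frames, so those are the only IDs left in `common`.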

Now I want to look up each XY_ID in csv4 (the vector), find the matching rows in column 1 (XY_ID) of every data frame in csv3, and print the corresponding CellID.

The output should look like this:

[Image: expected output, not available]

1 answer:

Answer 0 (score: 1)

We can loop over the list with lapply, subset each data frame to the rows whose XY_ID is in csv4, and rbind the results with do.call:

out1 <- do.call(rbind, lapply(csv3, function(x) subset(x, XY_ID %in% csv4)))
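
As a rough check of what this produces, the same pattern can be run on a toy list. The names `toy` and `ids` and the values are assumptions for illustration, standing in for csv3 and csv4.

```r
# Toy stand-ins for csv3 and csv4 (values are invented)
toy <- list(
  data.frame(XY_ID = c("a", "b", "c"), CellID = c(0L, 7L, 8L),
             stringsAsFactors = FALSE),
  data.frame(XY_ID = c("b", "c", "d"), CellID = c(7L, 7L, 0L),
             stringsAsFactors = FALSE),
  data.frame(XY_ID = c("c", "b", "e"), CellID = c(8L, 0L, 7L),
             stringsAsFactors = FALSE)
)
ids <- c("b", "c")

# Keep only rows whose XY_ID is in ids, then stack the pieces row-wise
out1 <- do.call(rbind, lapply(toy, function(x) subset(x, XY_ID %in% ids)))
print(out1)
```

The result is a single long data frame: each common XY_ID appears once per source data frame, but nothing in the result records which data frame a row came from.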

Another option is map_dfr from purrr, which row-binds the filtered data frames:

library(dplyr)
library(purrr)
out2 <- map_dfr(csv3, ~ .x  %>%
                          filter(XY_ID %in% csv4))
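
Both approaches above stack the matches into one long data frame. If the goal is instead one row per XY_ID with a separate CellID column for each data frame, a base R sketch like the following may help. This is only one possible reading of the desired output, which is not shown in the text. It assumes each XY_ID occurs at most once per data frame, and the names `toy`, `renamed`, `wide` and the `CellID_<i>` columns are made up for illustration.

```r
# Toy stand-in for csv3 (values are invented)
toy <- list(
  data.frame(XY_ID = c("a", "b", "c"), CellID = c(0L, 7L, 8L),
             stringsAsFactors = FALSE),
  data.frame(XY_ID = c("b", "c", "d"), CellID = c(7L, 7L, 0L),
             stringsAsFactors = FALSE),
  data.frame(XY_ID = c("c", "b", "e"), CellID = c(8L, 0L, 7L),
             stringsAsFactors = FALSE)
)

# Give each CellID column a unique name so the merges do not collide
renamed <- Map(function(df, i) {
  names(df)[names(df) == "CellID"] <- paste0("CellID_", i)
  df
}, toy, seq_along(toy))

# merge() is an inner join by default, so successive merges keep only
# the XY_IDs present in every data frame
wide <- Reduce(function(a, b) merge(a, b, by = "XY_ID"), renamed)
print(wide)
```

Because the inner joins already drop any XY_ID missing from one of the data frames, this sketch does not need the precomputed vector of common IDs.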