Filtering rows of a data frame by several values

Time: 2019-08-01 15:40:11

Tags: r dataframe

I created a data frame with the following columns:

 my_dataframe
                                  elements          feature start_position_genes end_position_genes feature_strand  insideFeature distancetoFeature
1             RTE-8_NV_RTE_Nematostella6_1 OCBIM_22028242mg                70819              71389              +       upstream             -1696
2                              L1-6_BF1_58 OCBIM_22008718mg               509741             511293              +         inside               492
3          Dong-1_NVe_R4_Nematostella64_72 OCBIM_22009482mg                 9836              10156              +     downstream              1198
4                           RTE-8_BF15_186 OCBIM_22009511mg               682894             685144              -       upstream              -387
5                             RTE-8_BF8_11 OCBIM_22010371mg               328356             328506              -     downstream              1527
6                             RTE-8_BF6_88 OCBIM_22010371mg               328356             328506              -     downstream              1429
7           RTE-8_NV_RTE_Nematostella6_216 OCBIM_22010375mg               460848             461012              +       upstream             -1503
8           RTE-8_NV_RTE_Nematostella6_286 OCBIM_22018216mg              1919560            1925331              +       upstream             -1238
9  Penelope-6_NV_Penelope_Nematostella7_23 OCBIM_22021684mg               648631             648663              +       upstream              -694
10             RTE-1_AC_1_RTE_Anolis10_359 OCBIM_22028126mg               294912             295182              +         inside             -1241
11                            L1-6_BF1_243 OCBIM_22028914mg              1916979            1920114              +       upstream             -1697
12         RTE-8_NV_RTE_Nematostella11_361 OCBIM_22028921mg              2054223            2054408              +         inside               756
13          RTE-8_NV_RTE_Nematostella6_542 OCBIM_22036628mg              1542179            1542316              +         inside             -1512
14                           RTE-8_BF6_240 OCBIM_22036636mg              1660855            1660907              -         inside              -620

I want to create a new data frame that keeps only the rows whose insideFeature column is "inside". To select a single value, I usually use

grep("downstream", my_dataframe$insideFeature)

But this cannot select two values at once; the following does not work:

grep(c("downstream", "upstream"), my_dataframe$insideFeature)

Is there a way to get a data frame like this?

my_dataframe_filtered
                                  elements          feature start_position_genes end_position_genes feature_strand  insideFeature distancetoFeature
2                              L1-6_BF1_58 OCBIM_22008718mg               509741             511293              +         inside               492
10             RTE-1_AC_1_RTE_Anolis10_359 OCBIM_22028126mg               294912             295182              +         inside             -1241
12         RTE-8_NV_RTE_Nematostella11_361 OCBIM_22028921mg              2054223            2054408              +         inside               756
13          RTE-8_NV_RTE_Nematostella6_542 OCBIM_22036628mg              1542179            1542316              +         inside             -1512
14                           RTE-8_BF6_240 OCBIM_22036636mg              1660855            1660907              -         inside              -620

1 Answer:

Answer 0: (score: 0)

If the match is exact (fixed strings), we can use %in% to match several values in the 'insideFeature' column at once:

subset(my_dataframe, insideFeature %in% c("downstream", "upstream"))
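
Since the question asks for the rows marked "inside", the same idea can be used directly or negated. The following is only a minimal sketch: `df` is a hypothetical stand-in for my_dataframe, with three of the original columns and a handful of rows.

```r
# Hypothetical stand-in for my_dataframe (subset of rows and columns)
df <- data.frame(
  elements = c("RTE-8_NV_RTE_Nematostella6_1", "L1-6_BF1_58",
               "Dong-1_NVe_R4_Nematostella64_72", "RTE-8_BF6_240"),
  insideFeature = c("upstream", "inside", "downstream", "inside"),
  distancetoFeature = c(-1696, 492, 1198, -620),
  stringsAsFactors = FALSE
)

# Keep rows whose insideFeature is exactly "inside"
inside_direct <- subset(df, insideFeature == "inside")

# Alternative: drop rows that match either unwanted value
inside_negated <- subset(df, !insideFeature %in% c("downstream", "upstream"))

print(inside_direct)
```

The two results agree here only because the column holds just these three values. If it contained NA or other labels, the negated form would keep those rows while the `==` form would drop them.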

If it is a partial match, use grepl with | (regular-expression alternation):

subset(my_dataframe, grepl("downstream|upstream", insideFeature))
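
As a rough illustration of the partial-match case, the sketch below builds the alternation pattern from a character vector instead of typing the | by hand. The vector `feature` and the labels "overlapStart" and "includeFeature" are made up for this example and do not come from the question's data.

```r
# Hypothetical labels, not taken from the original data frame
feature <- c("upstream", "inside", "downstream", "overlapStart", "includeFeature")

# One regular expression with alternation covers both values
hits <- grepl("downstream|upstream", feature)

# Build the same pattern from a vector of wanted values
wanted <- c("downstream", "upstream")
pattern <- paste(wanted, collapse = "|")
hits_from_vector <- grepl(pattern, feature)

# Values selected by the pattern
feature[hits]
```

This also shows why passing `c("downstream", "upstream")` straight to grep does not do what the question hoped: grep and grepl expect a single pattern string, so the values have to be joined into one regular expression first. Because this is a regex match, a pattern such as "upstream" would also match longer strings that contain it.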