Filtering a data frame in R

Time: 2016-08-26 11:37:06

Tags: r dataframe filtering

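I would like to filter the data frame shown in Table1 so that it looks like Table2, by removing every row whose Class is exactly "Pathogenic" and whose Validated value is 0. However, I am not sure which tool I should use to achieve this.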

Table1

Class               Validated
Pathogenic             1
Pathogenic             1
Pathogenic             0
Pathogenic             0
Likely Pathogenic      1
Likely Pathogenic      0
Likely Pathogenic      1
Uncertain              0
Uncertain              1


Table2

Class               Validated
Pathogenic             1
Pathogenic             1
Likely Pathogenic      1
Likely Pathogenic      0
Likely Pathogenic      1
Uncertain              0
Uncertain              1

2 Answers:

Answer 0 (score: 3)

Assuming the Validated column is of type numeric:

table2 <- table1[!(table1$Class == "Pathogenic" & table1$Validated == 0),]
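If the Validated column turns out not to be numeric (for example, it was read in as a factor or as character), the comparison above may not behave as intended. The following is a minimal sketch of one way to handle that case; the small data frame here is a hypothetical input made up for illustration, not the OP's actual data.

```r
# Hypothetical input in which Validated was read in as a factor
table1 <- data.frame(
  Class = c("Pathogenic", "Pathogenic", "Likely Pathogenic", "Uncertain"),
  Validated = factor(c("1", "0", "0", "1")),
  stringsAsFactors = FALSE
)

# Convert via character first: as.numeric() applied directly to a factor
# returns the internal level codes rather than the labels
if (!is.numeric(table1$Validated)) {
  table1$Validated <- as.numeric(as.character(table1$Validated))
}

# Flag the rows to drop, then keep everything else
drop_row <- table1$Class == "Pathogenic" & table1$Validated == 0
table2 <- table1[!drop_row, ]
table2
```

Storing the condition in a named logical vector (drop_row, a name introduced here) is only a readability choice; the filter itself is the same as the one-liner above.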

Answer 1 (score: 0)

Based on the OP's clarification in the comments, one option is to use data.table:

library(data.table)
setDT(Table1)[!(Class == "Pathogenic" & Validated == 0) ]
#               Class Validated
#1:        Pathogenic         1
#2:        Pathogenic         1
#3: Likely Pathogenic         1
#4: Likely Pathogenic         0
#5: Likely Pathogenic         1
#6:         Uncertain         0
#7:         Uncertain         1

Or, after setting the key (the result is then ordered by the key columns):

setDT(Table1, key = c("Class", "Validated"))[!.("Pathogenic", 0)]
#                  Class Validated
#1: Likely Pathogenic         0
#2: Likely Pathogenic         1
#3: Likely Pathogenic         1
#4:        Pathogenic         1
#5:        Pathogenic         1
#6:         Uncertain         0
#7:         Uncertain         1

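Edit: Previously I followed a different logic, because the OP's initial post read: "I would like to filter the data frame shown in Table1 so it looks like Table2. Although, I am not sure which tool I should use to achieve this."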

Data

df1 <- structure(list(Class = c("Pathogenic", "Pathogenic", "Pathogenic", 
 "Pathogenic", "Likely Pathogenic", "Likely Pathogenic", "Likely Pathogenic", 
"Uncertain", "Uncertain"), Validated = c(1, 1, 0, 0, 1, 0, 1, 
0, 1)), .Names = c("Class", "Validated"), row.names = c(NA, -9L
), class = "data.frame")
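Note that the data above is stored as df1, while the answers refer to it as table1 or Table1. The following is a rough sketch, assuming these all denote the same data frame, that rebuilds the data in base R and compares the filtered result with Table2 from the question. The name expected is introduced here only for illustration.

```r
# Same contents as df1 above, written with data.frame() for readability
df1 <- data.frame(
  Class = rep(c("Pathogenic", "Likely Pathogenic", "Uncertain"),
              times = c(4, 3, 2)),
  Validated = c(1, 1, 0, 0, 1, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Assumption: Table1 in the answers is the same object as df1
Table1 <- df1
Table2 <- Table1[!(Table1$Class == "Pathogenic" & Table1$Validated == 0), ]
rownames(Table2) <- NULL  # reset row names so they run 1..n again

# Table2 as shown in the question, typed out by hand
expected <- data.frame(
  Class = rep(c("Pathogenic", "Likely Pathogenic", "Uncertain"),
              times = c(2, 3, 2)),
  Validated = c(1, 1, 1, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Compare the filtered result with the expected table
all.equal(Table2, expected)
```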