Delete multiple rows based on certain constraints

Time: 2015-11-27 16:28:49

Tags: r dataframe filtering delete-row

I am using R and I am trying to delete certain rows from a data frame based on some constraints. So, if I have

dat <- data.frame(Cs=c("c1","c2","c3","c4","c5","c6"),  
  R1=sample(c("Y","N"),6,replace=TRUE), R2=sample(c("Y","N"),6,replace=TRUE),
  R3=sample(c("Y","N"),6,replace=TRUE), R4=sample(c("Y","N"),6,replace=TRUE),
  R5=sample(c("Y","N"),6,replace=TRUE), R6=sample(c("Y","N"),6,replace=TRUE))

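I want to delete every row that contains "N" in any of a given set of columns, such as R1, R3 and R4. For a single column, I found this solution in the question "delete row for certain constrains":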

d <- dat[dat[,"R1"]!="N",]

which works fine. But if I pass several columns, as in

d <- dat[dat[,c("R1","R3","R4")]!="N",]

I get a lot of extra rows full of NA. So where am I going wrong?

2 answers:

Answer 0: (score: 1)

You can use

dat[rowSums(dat[, c("R1","R3","R4")] == "N") == 0, , drop=FALSE]
#  Cs R1 R2 R3 R4 R5 R6
#5 c5  Y  Y  Y  Y  Y  Y

Or, if you prefer less typing:

dat[!rowSums(dat[c('R1','R3','R4')]=='N'),]

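This first tests whether each "cell" of the columns "R1", "R3" and "R4" equals "N", and then counts the TRUE values in each row. If a row contains no "N", its sum is 0 and the row is kept. In the first version I added drop=FALSE so that the result stays a data.frame.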
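To make the steps above concrete, and to show where the extra NA rows in the question come from, here is a minimal sketch. The data frame `toy` and its values are made up for illustration and are not the OP's data. The key point is that comparing several columns to "N" yields a logical matrix; used directly as a row index, that matrix is treated as one long logical vector, so positions beyond the number of rows produce NA rows.

```r
# Small deterministic data frame (hypothetical values, for illustration only)
toy <- data.frame(Cs = c("c1", "c2", "c3"),
                  R1 = c("Y", "N", "Y"),
                  R3 = c("Y", "Y", "Y"),
                  R4 = c("N", "Y", "Y"))

# Step 1: cell-wise comparison gives a 3 x 3 logical matrix
is_n <- toy[, c("R1", "R3", "R4")] == "N"

# Step 2: count the "N" cells in each row
n_count <- rowSums(is_n)

# Step 3: keep only the rows whose count is 0
kept <- toy[n_count == 0, , drop = FALSE]

# The attempt from the question: the logical matrix has 9 elements, but the
# data frame has only 3 rows, so TRUE values at positions 4..9 select
# non-existent rows, which show up as rows full of NA
bad <- toy[toy[, c("R1", "R3", "R4")] != "N", ]

print(kept)
print(bad)
```

Reducing the matrix to one logical value per row (here via `rowSums`) is what makes the index the same length as the number of rows.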

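Note added after the OP's comment:

If you subset a single column of a data.frame with dat[, 'R1'] and do not specify drop=FALSE, the default behaviour of [.data.frame is to coerce the resulting one-column data frame to an atomic vector. rowSums then fails on that vector. To avoid this, change the code to: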

dat[!rowSums(dat[,'R1', drop=FALSE]=='N'), ] 
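The difference can be checked directly. The following is a small sketch on a made-up data frame (`toy` is hypothetical, not the OP's data) comparing the three ways of selecting one column:

```r
# Hypothetical two-column data frame, for illustration only
toy <- data.frame(Cs = c("c1", "c2", "c3"),
                  R1 = c("Y", "N", "Y"))

one_col_vec <- toy[, "R1"]                # dimensions dropped: vector (or factor)
one_col_df  <- toy[, "R1", drop = FALSE]  # still a one-column data.frame
one_col_lst <- toy["R1"]                  # list-style indexing also keeps a data.frame

is.data.frame(one_col_vec)   # FALSE
is.data.frame(one_col_df)    # TRUE
is.data.frame(one_col_lst)   # TRUE

# rowSums needs at least two dimensions, so it fails on the dropped vector
failed <- inherits(try(rowSums(one_col_vec == "N"), silent = TRUE), "try-error")

# With drop = FALSE the same filter works for a single column
kept <- toy[!rowSums(one_col_df == "N"), ]
print(kept)
```

This is also why the shorter form `dat[c('R1','R3','R4')]` keeps working when the column vector shrinks to a single name.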

Sample data:

set.seed(5) 
dat <- data.frame(Cs=c("c1","c2","c3","c4","c5","c6"),  
                  R1=sample(c("Y","N"),6,replace=TRUE), R2=sample(c("Y","N"),6,replace=TRUE),
                  R3=sample(c("Y","N"),6,replace=TRUE), R4=sample(c("Y","N"),6,replace=TRUE),
                  R5=sample(c("Y","N"),6,replace=TRUE), R6=sample(c("Y","N"),6,replace=TRUE))

Answer 1: (score: 0)

You can build a logical "keep" variable with one value per row:

keep <- apply(dat[,c("R1","R3","R4")],
                  MARGIN=1,
                  FUN=function(x){all(x!='N')})
res <- dat[keep,]

> res
  Cs R1 R2 R3 R4 R5 R6
1 c1  Y  Y  Y  Y  Y  Y
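As a quick sanity check, the `apply`-based mask should select the same rows as a `rowSums`-based mask. The sketch below uses a made-up data frame (`toy` and `cols` are hypothetical names, not part of the answer above):

```r
# Hypothetical data, for illustration only
toy <- data.frame(Cs = c("c1", "c2", "c3", "c4"),
                  R1 = c("Y", "N", "Y", "Y"),
                  R3 = c("Y", "Y", "Y", "N"),
                  R4 = c("N", "Y", "Y", "Y"))
cols <- c("R1", "R3", "R4")

# Row-wise test: TRUE when no selected column holds "N"
keep_apply <- apply(toy[, cols], MARGIN = 1, FUN = function(x) all(x != "N"))

# Same test expressed as a count of "N" cells per row
keep_sums <- rowSums(toy[, cols] == "N") == 0

# Names can differ between the two results, so compare the bare values
same <- identical(unname(keep_apply), unname(keep_sums))

res <- toy[keep_apply, ]
print(res)
```

Either mask has exactly one logical value per row, which is what row indexing expects.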

Data (seed used: 1234):

dat <- structure(list(Cs = structure(1:6, .Label = c("c1", "c2", "c3", 
"c4", "c5", "c6"), class = "factor"), R1 = structure(c(2L, 1L, 
1L, 1L, 1L, 1L), .Label = c("N", "Y"), class = "factor"), R2 = structure(c(2L, 
2L, 1L, 1L, 1L, 1L), .Label = c("N", "Y"), class = "factor"), 
    R3 = structure(c(2L, 1L, 2L, 1L, 2L, 2L), .Label = c("N", 
    "Y"), class = "factor"), R4 = structure(c(1L, 1L, 1L, 1L, 
    1L, 1L), .Label = "Y", class = "factor"), R5 = structure(c(2L, 
    1L, 1L, 1L, 1L, 2L), .Label = c("N", "Y"), class = "factor"), 
    R6 = structure(c(2L, 2L, 2L, 1L, 2L, 1L), .Label = c("N", 
    "Y"), class = "factor")), .Names = c("Cs", "R1", "R2", "R3", 
"R4", "R5", "R6"), row.names = c(NA, -6L), class = "data.frame")