I am using R and trying to delete certain rows from a data frame based on some constraints. So, if I have
dat <- data.frame(Cs=c("c1","c2","c3","c4","c5","c6"),
R1=sample(c("Y","N"),6,replace=TRUE), R2=sample(c("Y","N"),6,replace=TRUE),
R3=sample(c("Y","N"),6,replace=TRUE), R4=sample(c("Y","N"),6,replace=TRUE),
R5=sample(c("Y","N"),6,replace=TRUE), R6=sample(c("Y","N"),6,replace=TRUE))
I want to delete all rows that contain "N" in any of some given columns, such as R1, R3, R4. For a single column, I found this solution: delete row for certain constrains
d <- dat[dat[,"R1"]!="N",]
which works fine. But if I pass multiple columns, as in
d <- dat[dat[,c("R1","R3","R4")]!="N",]
I get a lot of extra rows filled with NA. So where am I going wrong?
Answer 0: (score: 1)
You can use
dat[rowSums(dat[, c("R1","R3","R4")] == "N") == 0, , drop=FALSE]
# Cs R1 R2 R3 R4 R5 R6
#5 c5 Y Y Y Y Y Y
Or, if you don't like typing too much:
dat[!rowSums(dat[c('R1','R3','R4')]=='N'),]
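This first tests whether each "cell" of the columns "R1", "R3" and "R4" equals "N", then counts the TRUE values in each row. If a row contains no "N", the sum is 0 and the row is kept. I added drop=FALSE to preserve the data.frame structure.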
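To make these steps concrete, and to show where the NA rows in the question come from, here is a minimal sketch. The data is hand-written for illustration (it is not the question's random sample), and the names `cols`, `is_n`, `n_count`, `res` and `bad` are assumptions introduced only for this example.

```r
# Hypothetical, hand-written data so the result does not depend on sample()
dat <- data.frame(Cs = c("c1", "c2", "c3", "c4"),
                  R1 = c("Y", "N", "Y", "Y"),
                  R2 = c("N", "N", "N", "Y"),
                  R3 = c("Y", "Y", "N", "Y"),
                  R4 = c("Y", "Y", "Y", "Y"))

cols <- c("R1", "R3", "R4")

# Step 1: logical matrix (4 rows x 3 columns), TRUE where a cell equals "N"
is_n <- dat[, cols] == "N"

# Step 2: number of "N" cells in each row
n_count <- rowSums(is_n)

# Step 3: keep only the rows with zero "N" cells (here c1 and c4)
res <- dat[n_count == 0, , drop = FALSE]

# The attempt from the question uses the whole logical matrix as the row index.
# R treats it as one long logical vector (length 12 here), so TRUE values at
# positions beyond nrow(dat) select rows that do not exist and appear as NA rows.
bad <- dat[dat[, cols] != "N", ]
```

The row index has to be a logical vector with one value per row, which is what the `rowSums(...) == 0` step produces.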
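Note following the OP's comment:
If you subset only 1 column of a data.frame without specifying the drop=FALSE option, the default behavior of [.data.frame is to coerce the resulting 1-column data frame to an atomic vector. rowSums will then not work on that vector. To avoid this, change the code to: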
dat[!rowSums(dat[,'R1', drop=FALSE]=='N'), ]
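The difference can be checked directly. The following is a small sketch on made-up data; the variable names `one_col_vec`, `one_col_df`, `fails` and `res` are assumptions for illustration only.

```r
# Hypothetical three-row data frame
dat <- data.frame(Cs = c("c1", "c2", "c3"),
                  R1 = c("Y", "N", "Y"),
                  R2 = c("N", "Y", "Y"))

one_col_vec <- dat[, "R1"]                # default: dropped to a plain vector
one_col_df  <- dat[, "R1", drop = FALSE]  # stays a 3 x 1 data.frame

is.null(dim(one_col_vec))      # TRUE: a vector has no dimensions
is.data.frame(one_col_df)      # TRUE

# rowSums() requires at least two dimensions, so the dropped form raises an error
fails <- inherits(try(rowSums(dat[, "R1"] == "N"), silent = TRUE), "try-error")

# With drop = FALSE the comparison is a 3 x 1 logical matrix and rowSums() works
res <- dat[!rowSums(dat[, "R1", drop = FALSE] == "N"), ]
```

List-style indexing such as `dat["R1"]` also returns a one-column data.frame, which is why the shorter `dat[c('R1','R3','R4')]` form above needs no `drop` argument.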
Sample data:
set.seed(5)
dat <- data.frame(Cs=c("c1","c2","c3","c4","c5","c6"),
R1=sample(c("Y","N"),6,replace=TRUE), R2=sample(c("Y","N"),6,replace=TRUE),
R3=sample(c("Y","N"),6,replace=TRUE), R4=sample(c("Y","N"),6,replace=TRUE),
R5=sample(c("Y","N"),6,replace=TRUE), R6=sample(c("Y","N"),6,replace=TRUE))
Answer 1: (score: 0)
You can build a logical "keep" vector with one value per row:
keep <- apply(dat[,c("R1","R3","R4")],
MARGIN=1,
FUN=function(x){all(x!='N')})
res <- dat[keep,]
> res
Cs R1 R2 R3 R4 R5 R6
1 c1 Y Y Y Y Y Y
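As a quick sanity check, the row-wise `apply` filter and a `rowSums` based filter should select the same rows. The sketch below compares them on hand-written data; `keep_apply`, `keep_sums` and `cols` are names assumed for this example.

```r
# Hypothetical data, fixed by hand so the comparison is reproducible
dat <- data.frame(Cs = c("c1", "c2", "c3", "c4"),
                  R1 = c("Y", "N", "Y", "Y"),
                  R3 = c("Y", "Y", "N", "Y"),
                  R4 = c("Y", "Y", "Y", "Y"))
cols <- c("R1", "R3", "R4")

# Row-wise: apply() converts the selected columns to a character matrix
# and calls the function once per row
keep_apply <- apply(dat[, cols], MARGIN = 1, FUN = function(x) all(x != "N"))

# Vectorised: count the "N" cells per row and keep rows with none
keep_sums <- rowSums(dat[, cols] == "N") == 0

# Both give one logical value per row; names are dropped before comparing
same <- identical(unname(keep_apply), unname(keep_sums))
res <- dat[keep_apply, ]
```

The `apply` version tends to read more naturally when the per-row condition is more complex than a simple count.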
Data (seed used: 1234):
dat <- structure(list(Cs = structure(1:6, .Label = c("c1", "c2", "c3",
"c4", "c5", "c6"), class = "factor"), R1 = structure(c(2L, 1L,
1L, 1L, 1L, 1L), .Label = c("N", "Y"), class = "factor"), R2 = structure(c(2L,
2L, 1L, 1L, 1L, 1L), .Label = c("N", "Y"), class = "factor"),
R3 = structure(c(2L, 1L, 2L, 1L, 2L, 2L), .Label = c("N",
"Y"), class = "factor"), R4 = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = "Y", class = "factor"), R5 = structure(c(2L,
1L, 1L, 1L, 1L, 2L), .Label = c("N", "Y"), class = "factor"),
R6 = structure(c(2L, 2L, 2L, 1L, 2L, 1L), .Label = c("N",
"Y"), class = "factor")), .Names = c("Cs", "R1", "R2", "R3",
"R4", "R5", "R6"), row.names = c(NA, -6L), class = "data.frame")