I have the following data frame containing characters, numbers, and NAs:
df <- data.frame(a=c("notfound","NOT FOUND","NOT FOUND"), b=c(NA,"NOT FOUND","NOT FOUND"), c=c("not found",2,3), d=c("not found","NOT FOUND","NOT FOUND"), e=c("234","NOT FOUND",NA))
          a         b         c         d         e
1  notfound      <NA> not found not found       234
2 NOT FOUND NOT FOUND         2 NOT FOUND NOT FOUND
3 NOT FOUND NOT FOUND         3 NOT FOUND      <NA>
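I want to remove every column in which all entries are some case or spacing variant of "not found" (for example "NOT FOUND", "not found", "notfound", "NOTFOUND"). Basically tolower(gsub(" ", "", df)) == "notfound". This operation does not seem to work on data frames. Is there an alternative?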
The desired output would be:
          c         e
1 not found       234
2         2 NOT FOUND
3         3      <NA>
Answer 0 (score: 2)
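You can use grepl with a regular expression to find the strings that match, and keep only the columns in which at least one element does not match (grepl returns FALSE for it), that is, the columns whose number of matches is less than nrow(df). The pattern matches strings that start with "not" and end with "found", and grepl is set to be case-insensitive.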
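# TRUE where a cell starts with "not" and ends with "found", ignoring case
is_nf <-
  sapply(df, grepl, pattern = '(?=^not).*found$',
         perl = TRUE, ignore.case = TRUE)
# Keep only the columns that have at least one non-matching cell
df[colSums(is_nf) < nrow(df)]
#           b         c         e
# 1      <NA> not found       234
# 2 NOT FOUND         2 NOT FOUND
# 3 NOT FOUND         3      <NA>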
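Column b survives here because grepl returns FALSE, not NA, for a missing value, so its NA cell counts as a non-match. A quick check of that behaviour, using a small vector x that is made up for illustration and is not part of the question's data:

```r
# Illustrative input (an assumption, not from the question):
# one "not found" variant, one missing value, one ordinary value
x <- c("NOT FOUND", NA, "234")

# Same pattern and options as above
res <- grepl("(?=^not).*found$", x, perl = TRUE, ignore.case = TRUE)
print(res)
# [1]  TRUE FALSE FALSE
```

The NA is reported as FALSE, so a column made up only of NA and "not found" entries still has fewer matches than rows.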
I guess you also want to remove the columns whose only entries other than "not found" are NA.
is_na <- is.na(df)
df[colSums(is_nf | is_na) < nrow(df)]
#           c         e
# 1 not found       234
# 2         2 NOT FOUND
# 3         3      <NA>
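The normalisation idea from the question (strip spaces, lowercase, compare with "notfound") can also be made to work if it is applied to each column separately rather than to the whole data frame. The following is only a minimal sketch under that assumption; drop_not_found and its drop_na argument are hypothetical names introduced here, not existing functions.

```r
# Rebuild the example data from the question
df <- data.frame(a = c("notfound", "NOT FOUND", "NOT FOUND"),
                 b = c(NA, "NOT FOUND", "NOT FOUND"),
                 c = c("not found", 2, 3),
                 d = c("not found", "NOT FOUND", "NOT FOUND"),
                 e = c("234", "NOT FOUND", NA))

# drop_not_found is a hypothetical helper name used for illustration
drop_not_found <- function(data, drop_na = TRUE) {
  keep <- vapply(data, function(col) {
    # Normalise one column at a time: remove spaces, then lowercase
    norm <- tolower(gsub(" ", "", as.character(col), fixed = TRUE))
    is_nf <- !is.na(norm) & norm == "notfound"
    # Optionally treat NA like "not found"
    if (drop_na) is_nf <- is_nf | is.na(norm)
    # Keep the column if at least one cell holds a real value
    !all(is_nf)
  }, logical(1))
  data[keep]
}

res_strict <- drop_not_found(df)                  # NA counts as "not found"
res_loose  <- drop_not_found(df, drop_na = FALSE) # NA counts as a value
print(names(res_strict))
# [1] "c" "e"
print(names(res_loose))
# [1] "b" "c" "e"
```

Unlike the regular expression, this exact comparison only matches entries that reduce to "notfound"; it would not match a string such as "not yet found", which the pattern above does.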