Question

I have a file like this:

"1" "4" "10" "ttts" 3
"2" "10" "22" "ttt" 2
"3" "10" "295" "00000" 13
"4" "10" "584" "0t000000" 5
"5" "10" "403" "000s" 15
"6" "10" "281" "000" 19
"7" "10" "123" "000q" 16
"8" "10" "127" "000" 20
........................

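I want to remove every row whose fourth column consists entirely of 0s; for example, rows 3, 6, and 8 should all be dropped. How can I do this in R? Thanks!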

Answer 1

Using grepl is probably the most efficient approach:

data = read.table(header = TRUE, text = "  X2  X3       X4 X5
1  4  10     ttts  3
2 10  22      ttt  2
3 10 295    00000 13
4 10 584 0t000000  5
5 10 403     000s 15
6 10 281      000 19
7 10 123     000q 16
8 10 127      000 20")

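# The header has one field fewer than the rows, so the leading index column
# becomes the row names and X4 (the question's fourth column) is column 3 here.
data[!grepl("^0+$", data[,3]),]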
#  X2  X3       X4 X5
#1  4  10     ttts  3
#2 10  22      ttt  2
#4 10 584 0t000000  5
#5 10 403     000s 15
#7 10 123     000q 16

Edit: changed grep to grepl, as a commenter suggested.
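The edit note does not say why grepl is the better fit. One likely reason, shown here as a minimal sketch, is that grep returns match positions, and negative indexing with an empty set of positions drops every row. The data frame `df` and its column names `id` and `code` are made up for illustration and are not part of the original answer.

```r
# Hypothetical toy data in which no value is made up entirely of zeros.
df <- data.frame(id = 1:3, code = c("abc", "0t0", "x00"),
                 stringsAsFactors = FALSE)

# grep() returns the positions of matches; here there are none: integer(0).
hits <- grep("^0+$", df$code)

# Negating an empty index is still empty, so this selects zero rows.
dropped_with_grep <- df[-hits, ]

# grepl() returns one TRUE/FALSE per row, so negating it keeps all rows.
kept_with_grepl <- df[!grepl("^0+$", df$code), ]

nrow(dropped_with_grep)  # 0
nrow(kept_with_grepl)    # 3
```

With the logical form, the filter behaves the same whether or not any row matches.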

Answer 2

I think row 8 should be removed as well.

I suggest trying the "stringi" package and doing something like this:

library(stringi)
stri_count_fixed(mydf[, 4], "0") == nchar(mydf[, 4])
# [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE

You can use this logical vector to subset the original dataset.

In base R, you could also try:

vapply(strsplit(mydf[, 4], ""), function(x) all(x == "0"), logical(1L))
# [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE
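As a sketch of the subsetting step this answer mentions, the logical vector can be negated and used as a row index. The data frame below rebuilds the eight sample rows from the question; the column names V1 to V5 are an assumption (R's defaults for unnamed columns), and `all_zero` and `result` are names chosen here for illustration.

```r
# Sample rows from the question, with assumed default column names.
mydf <- data.frame(
  V1 = 1:8,
  V2 = c(4, 10, 10, 10, 10, 10, 10, 10),
  V3 = c(10, 22, 295, 584, 403, 281, 123, 127),
  V4 = c("ttts", "ttt", "00000", "0t000000", "000s", "000", "000q", "000"),
  V5 = c(3, 2, 13, 5, 15, 19, 16, 20),
  stringsAsFactors = FALSE
)

# TRUE where every character of the fourth column is "0".
all_zero <- vapply(strsplit(mydf[, 4], ""),
                   function(x) all(x == "0"), logical(1L))

# Keep only the rows that are NOT all zeros.
result <- mydf[!all_zero, ]
result$V1  # 1 2 4 5 7
```

Setting `stringsAsFactors = FALSE` keeps the fourth column as character, which `strsplit` requires.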

Answer 3

Another way is:

indx <- as.numeric(as.character(data[,4])) #all the non-numeric elements coerced to NA

data[!(!is.na(indx) & !indx),]
#   V1 V2  V3       V4 V5
# 1  1  4  10     ttts  3
# 2  2 10  22      ttt  2
# 4  4 10 584 0t000000  5
# 5  5 10 403     000s 15
# 7  7 10 123     000q 16

Explanation

Using a more general example that also includes a number containing 0s alongside another digit ("001"):

v1 <- c("ttts", "ttt", "00000", "0t000000", "000s", "000", "000q", 
"000", "001")
indx <-suppressWarnings(as.numeric(v1)) #coerce non-numeric elements to NA
indx
#[1] NA NA  0 NA NA  0 NA  0  1

Identify the all-0 elements so they can be excluded from the rest:

indx1 <- !is.na(indx) & !indx #elements that are all 0's are TRUE
indx1
#[1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE

Negate:

!(indx1)
#[1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE

v1[!(indx1)]
#[1] "ttts"     "ttt"      "0t000000" "000s"     "000q"     "001"
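One point worth checking before relying on numeric coercion: it treats any string that parses to zero as a match, not only strings made up purely of the character 0. The sketch below compares it with the anchored regular expression from the first answer on a made-up vector `v2`; whether the difference matters depends on what the real column can contain.

```r
# Hypothetical values: several parse to zero without being all "0" characters.
v2 <- c("000", "0.0", "0e3", "-0", "001", "0t0")

# Coercion approach: TRUE wherever the parsed number equals zero.
num <- suppressWarnings(as.numeric(v2))
by_coercion <- !is.na(num) & num == 0

# Regex approach: TRUE only for strings consisting solely of "0".
by_regex <- grepl("^0+$", v2)

by_coercion  # TRUE  TRUE  TRUE  TRUE FALSE FALSE
by_regex     # TRUE FALSE FALSE FALSE FALSE FALSE
```

For the sample data in the question the two approaches agree, since none of its values look like "0.0" or "0e3".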
