Question

I'm trying to tidy up the output of stringr::str_extract_all so that all empty-string elements are removed.

For example, to extract the numbers from the following strings:

library(stringr)

strings <- c("100 is 10 greater than 90", "1 in 10 people have 3 - 4 cats", "earth has 1 moon")

str_extract_all(strings, "\\d*")

This returns the numbers, but with lots of empty-string elements:

# [[1]]
# [1] "100" ""    ""    ""    ""    "10"  ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    "90"  ""   
# 
# [[2]]
# [1] "1"  ""   ""   ""   ""   "10" ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   "3"  ""   ""   ""   "4"  ""   ""   ""   ""   ""   ""  
# 
# [[3]]
# [1] ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "1" ""  ""  ""  ""  ""  ""

How can I remove the "" elements from this output while keeping the list structure intact? That is, I want:

# [[1]]
# [1] "100" "10" "90"   
# 
# [[2]]
# [1] "1"  "10"   "3"   "4"   
# 
# [[3]]
# [1] "1"

I tried a few things along the lines of str_extract_all(strings, "\\d*") %>% sapply(., "[!. == ""]"), but couldn't get it to work.

Answer 1

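You're not using the right regular expression. \\d* matches zero or more digits, so it also produces an empty match at every position where no digit starts. Use \\d+ (one or more digits) instead: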

str_extract_all(strings, "\\d+")
#[[1]]
#[1] "100" "10"  "90" 
#
#[[2]]
#[1] "1"  "10" "3"  "4" 
#
#[[3]]
#[1] "1"
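A minimal check of why the quantifier matters, assuming the stringr package is installed. The input "a1b22" is a made-up example: with * an empty match is reported wherever no digit starts, while + only returns actual runs of digits.

```r
library(stringr)

x <- "a1b22"  # hypothetical input mixing letters and digits

# "\\d*" may match zero digits, so empty strings appear
# at positions where no digit starts
star <- str_extract_all(x, "\\d*")[[1]]

# "\\d+" needs at least one digit, so only the numbers are returned
plus <- str_extract_all(x, "\\d+")[[1]]

print(star)
print(plus)  # "1" "22"
```

Dropping the empty strings from star leaves exactly the elements of plus, which is why changing the pattern is simpler than cleaning up afterwards.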

Another approach using only base R:

numbers <- gregexpr("\\d+", strings)
regmatches(strings, numbers)

This can of course be written as a one-liner:

regmatches(strings, gregexpr("\\d+", strings))
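To check that the list structure is preserved even when a string contains no numbers, here is a small base R sketch that appends a hypothetical fourth string without digits. Each input string still gets its own list element, and the digit-free one becomes character(0).

```r
strings <- c("100 is 10 greater than 90", "1 in 10 people have 3 - 4 cats",
             "earth has 1 moon", "no digits here")  # last string is hypothetical

# gregexpr() records every run of one or more digits in each string;
# regmatches() turns those match positions back into substrings
res <- regmatches(strings, gregexpr("\\d+", strings))

length(res)  # one element per input string
res[[4]]     # character(0) for the string without digits
```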

Answer 2

You could try:

lapply(str_extract_all(strings, "\\d*"), function(x) x[!x %in% ""])

[[1]]
[1] "100" "10"  "90" 

[[2]]
[1] "1"  "10" "3"  "4" 

[[3]]
[1] "1"
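A variation on the same filtering idea, assuming stringr is installed: base R's nzchar() is TRUE for strings with at least one character, so Filter(nzchar, x) keeps only the non-empty matches. This is an alternative spelling, not part of the original answer.

```r
library(stringr)

strings <- c("100 is 10 greater than 90", "1 in 10 people have 3 - 4 cats", "earth has 1 moon")

raw <- str_extract_all(strings, "\\d*")

# Filter() keeps the elements for which nzchar() returns TRUE,
# i.e. it drops the "" matches but leaves the list shape unchanged
cleaned <- lapply(raw, function(x) Filter(nzchar, x))

str(cleaned)
```

Because \\d* is greedy, every non-empty match already covers a whole run of digits, so the cleaned list should match what \\d+ returns directly.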

Or:

lapply(str_extract_all(strings, "\\d*"), function(x) x[nchar(x) >= 1])

Or:

lapply(str_extract_all(strings, "\\d*"), function(x) x[x != ""])
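A quick sanity check, assuming stringr is installed, that the three filters above agree on this input. They differ only in how they test for emptiness (%in%, nchar(), or !=); the variable names are illustrative.

```r
library(stringr)

strings <- c("100 is 10 greater than 90", "1 in 10 people have 3 - 4 cats", "earth has 1 moon")
raw <- str_extract_all(strings, "\\d*")

# Three ways of dropping the empty strings from each list element
via_in    <- lapply(raw, function(x) x[!x %in% ""])
via_nchar <- lapply(raw, function(x) x[nchar(x) >= 1])
via_neq   <- lapply(raw, function(x) x[x != ""])

# All three should produce the same list
identical(via_in, via_nchar) && identical(via_nchar, via_neq)
```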

Or, if you want to extract the numbers directly (a slight modification of @markus's code):

regmatches(strings, gregexpr("[0-9]+",  strings))
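The only change from the base R version in the first answer is the pattern: [0-9]+ spells out the ASCII digits explicitly instead of using the \\d shorthand. A small base R check that both patterns pick out the same numbers for these ASCII inputs:

```r
strings <- c("100 is 10 greater than 90", "1 in 10 people have 3 - 4 cats", "earth has 1 moon")

# Bracket-class pattern from this answer
bracket <- regmatches(strings, gregexpr("[0-9]+", strings))

# Shorthand pattern from the first answer
shorthand <- regmatches(strings, gregexpr("\\d+", strings))

identical(bracket, shorthand)
```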

Remove empty strings from the output of str_extract_all
