Remove empty strings from str_extract_all output

Time: 2019-02-05 20:16:41

Tags: r stringr

I am trying to tidy the output of stringr::str_extract_all so that all empty character elements are removed.

For example, to extract the numbers from the following strings:

library(stringr)

strings <- c("100 is 10 greater than 90", "1 in 10 people have 3 - 4 cats", "earth has 1 moon")

str_extract_all(strings, "\\d*")

This returns the numbers, but also a lot of empty character elements:

# [[1]]
# [1] "100" ""    ""    ""    ""    "10"  ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    "90"  ""   
# 
# [[2]]
# [1] "1"  ""   ""   ""   ""   "10" ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   "3"  ""   ""   ""   "4"  ""   ""   ""   ""   ""   ""  
# 
# [[3]]
# [1] ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "1" ""  ""  ""  ""  ""  "" 

How can I remove the "" elements from this data while keeping its structure intact? That is:

# [[1]]
# [1] "100" "10" "90"   
# 
# [[2]]
# [1] "1"  "10"   "3"   "4"   
# 
# [[3]]
# [1] "1" 

I have made some attempts along the lines of str_extract_all(strings, "\\d*") %>% sapply(., "[!. == ""]"), but could not get them to work.

2 Answers:

Answer 0: (score: 2)

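You are not using the right regular expression. The pattern "\\d*" matches zero or more digits, so it also returns an empty match wherever there is no digit. The pattern "\\d+" requires at least one digit. Try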

str_extract_all(strings, "\\d+")
#[[1]]
#[1] "100" "10"  "90" 
#
#[[2]]
#[1] "1"  "10" "3"  "4" 
#
#[[3]]
#[1] "1"

Another approach, using only base R:

numbers <- gregexpr("\\d+", strings)
regmatches(strings, numbers)

This can of course be written in one line:

regmatches(strings, gregexpr("\\d+", strings))
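
As a quick check of the base R version, here is a minimal sketch. The helper name extract_digits is hypothetical (it is not part of base R or stringr) and only wraps the one-liner above. It also shows that an input without any digits yields an empty character vector rather than being dropped, so the result keeps one list element per input string.

```r
strings <- c("100 is 10 greater than 90",
             "1 in 10 people have 3 - 4 cats",
             "earth has 1 moon")

# Hypothetical helper (the name is an assumption): returns a list with
# one character vector of digit runs per input string.
extract_digits <- function(x) {
  regmatches(x, gregexpr("\\d+", x))
}

res <- extract_digits(strings)
print(res)

# A string with no digits gives character(0), so the output list
# always has the same length as the input vector.
no_match <- extract_digits("no digits here")
print(no_match)
```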

Answer 1: (score: 1)

You can try:

lapply(str_extract_all(strings, "\\d*"), function(x) x[!x %in% ""])

[[1]]
[1] "100" "10"  "90" 

[[2]]
[1] "1"  "10" "3"  "4" 

[[3]]
[1] "1"

Or:

lapply(str_extract_all(strings, "\\d*"), function(x) x[nchar(x) >= 1])

Or:

lapply(str_extract_all(strings, "\\d*"), function(x) x[x != ""])
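
The three filters above should behave the same on this kind of output. The sketch below illustrates that on a hand-written, shortened stand-in for the str_extract_all result, so it runs without stringr. The names extracted and drop_empty are assumptions made for illustration, and nzchar() is a base R alternative that does not appear in the answers above. Using lapply rather than sapply keeps the result a list even when every element ends up with the same length.

```r
# Hand-written, shortened stand-in for str_extract_all(strings, "\\d*"),
# so that the sketch does not depend on stringr being installed.
extracted <- list(
  c("100", "", "", "10", "", "90", ""),
  c("1", "", "10", "", "3", "", "4", ""),
  c("", "", "1", "")
)

# Hypothetical helper: keep only elements with at least one character.
drop_empty <- function(x) x[nzchar(x)]

# lapply always returns a list, so the per-string structure is preserved.
cleaned <- lapply(extracted, drop_empty)
print(cleaned)

# The three filters from the answer, applied to the same input.
by_in    <- lapply(extracted, function(x) x[!x %in% ""])
by_nchar <- lapply(extracted, function(x) x[nchar(x) >= 1])
by_neq   <- lapply(extracted, function(x) x[x != ""])

all_agree <- identical(by_in, cleaned) &&
  identical(by_nchar, cleaned) &&
  identical(by_neq, cleaned)
print(all_agree)
```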

Or, if you want to do it in one step (a small modification of the code from @markus):

regmatches(strings, gregexpr("[0-9]+",  strings))