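I am trying to tidy the output of stringr::str_extract_all
so that all empty-string elements are removed.
For example, to extract the numbers from the following strings: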
library(stringr)
strings <- c("100 is 10 greater than 90", "1 in 10 people have 3 - 4 cats", "earth has 1 moon")
str_extract_all(strings, "\\d*")
This returns the numbers, but with many empty-string elements mixed in:
# [[1]]
# [1] "100" "" "" "" "" "10" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "90" ""
#
# [[2]]
# [1] "1" "" "" "" "" "10" "" "" "" "" "" "" "" "" "" "" "" "" "" "3" "" "" "" "4" "" "" "" "" "" ""
#
# [[3]]
# [1] "" "" "" "" "" "" "" "" "" "" "1" "" "" "" "" "" ""
How can I remove the "" elements from this data while keeping its list structure intact? That is:
# [[1]]
# [1] "100" "10" "90"
#
# [[2]]
# [1] "1" "10" "3" "4"
#
# [[3]]
# [1] "1"
I made some attempts along the lines of str_extract_all(strings, "\\d*") %>% sapply(., "[!. == ""]"),
but could not get it to work.
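Answer 0: (score: 2)
You are not using the right regular expression. The pattern \\d* matches zero or more digits, so it also produces an empty match at every position that is not a digit. Use \\d+, which requires at least one digit: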
str_extract_all(strings, "\\d+")
#[[1]]
#[1] "100" "10" "90"
#
#[[2]]
#[1] "1" "10" "3" "4"
#
#[[3]]
#[1] "1"
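To see why the quantifier matters, here is a minimal base R sketch. It only checks whether each pattern can match the empty string; the variable names are illustrative and not part of the original answer.

```r
# "*" means "zero or more", so a digit pattern with "*" accepts the empty string.
star_matches_empty <- grepl("^\\d*$", "", perl = TRUE)

# "+" means "one or more", so the same pattern with "+" rejects it.
plus_matches_empty <- grepl("^\\d+$", "", perl = TRUE)

print(c(star = star_matches_empty, plus = plus_matches_empty))
```

Because \\d* is allowed to succeed with nothing matched, an extract-all call reports those zero-length matches as "" elements.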
Another approach, using base R only:
numbers <- gregexpr("\\d+", strings)
regmatches(strings, numbers)
This can of course be written as a single line:
regmatches(strings, gregexpr("\\d+", strings))
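As a possible follow-up, the base R result can be inspected or converted to numbers while keeping one list element per input string. This is only a sketch; the names digit_matches, digit_values and match_counts are made up for illustration.

```r
strings <- c("100 is 10 greater than 90",
             "1 in 10 people have 3 - 4 cats",
             "earth has 1 moon")

# Extract runs of one or more digits; the result is a list with one
# character vector per input string.
digit_matches <- regmatches(strings, gregexpr("\\d+", strings))

# Convert each element to numeric without flattening the list.
digit_values <- lapply(digit_matches, as.numeric)

# Number of matches found in each input string.
match_counts <- lengths(digit_matches)
print(match_counts)
```

Using lapply rather than sapply here is deliberate: sapply may simplify its result to a vector or matrix when the elements happen to have equal length, which would lose the list structure the question asks to preserve.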
Answer 1: (score: 1)
You can try:
lapply(str_extract_all(strings, "\\d*"), function(x) x[!x %in% ""])
[[1]]
[1] "100" "10" "90"
[[2]]
[1] "1" "10" "3" "4"
[[3]]
[1] "1"
Or:
lapply(str_extract_all(strings, "\\d*"), function(x) x[nchar(x) >= 1])
Or:
lapply(str_extract_all(strings, "\\d*"), function(x) x[x != ""])
Or, if you want to get the result directly (with a slight modification of the code from @markus):
regmatches(strings, gregexpr("[0-9]+", strings))
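If the empty strings are already present, the filtering alternatives above could also be wrapped in a small helper. The sketch below uses base R's nzchar; drop_empty is a hypothetical name, and raw_matches is a hand-written stand-in for the str_extract_all output rather than real output.

```r
# Hypothetical helper: drop zero-length strings from every element of a list,
# returning a list of the same length.
drop_empty <- function(x) lapply(x, function(el) el[nzchar(el)])

# Hand-written stand-in for a list that contains "" elements.
raw_matches <- list(c("100", "", "", "10", "", "90", ""),
                    c("", "", "1", ""))

cleaned <- drop_empty(raw_matches)
str(cleaned)
```

An element that consists only of empty strings becomes character(0) instead of being dropped, so the result still lines up one-to-one with the input.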