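I am trying to write a regular expression that extracts amounts from a string, keeping the currency symbol and an optional magnitude suffix (m or k):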
library(stringr)
text <- "$10000 and $10,000 and $5m and $50m and $50.2m and $50,2m"
str_extract(text, "\\$(\\d+)[a-z]+") # solution_1
str_extract(text, "\\$(\\d+)+") #solution_2
Desired output:
"$10000 $10,000 $5m $50m $50.2m $50,2m"
The problem is that solution_1 extracts only "$5m", while solution_2 extracts only "$10000".
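One likely reason both attempts return a single value is that str_extract() stops at the first match in each string, whereas str_extract_all() returns every match. The sketch below illustrates the difference; the character class and the [mk] suffix are assumptions based on the description above, not a tested general solution.

```r
library(stringr)

text <- "$10000 and $10,000 and $5m and $50m and $50.2m and $50,2m"

# str_extract() returns only the first match found in each input string.
first_only <- str_extract(text, "\\$\\d+")

# str_extract_all() returns a list with one character vector per input string.
# Assumed pattern: "$", then digits/commas/periods, then an optional m or k.
all_amounts <- str_extract_all(text, "\\$[0-9.,]+[mk]?")[[1]]

# Collapse the matches into one space-separated string.
joined <- paste(all_amounts, collapse = " ")
print(joined)
```

Whether a list of matches or a single collapsed string is more convenient depends on what happens to the amounts afterwards.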
Update: @Tim Biegeleisen provided a great solution. I am also trying to get rid of a trailing period, for example to get $50m from "$50m. and ...".
text <- "$5, $10,000, and $5m, and $50m. and $50.2m and $50,2m"
m <- gregexpr("\\$[0-9.,]+?[mbt]?(?=(?:, | |$))", text, perl=TRUE)
regmatches(text, m)
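The lookahead in the attempt above only accepts a comma plus space, a single space, or the end of the string after an amount, so an amount followed by a period (such as "$50m.") does not seem to match at all. One possible alternative, sketched here under the assumption that a comma or period belongs to the amount only when a digit follows it, avoids the lookahead entirely:

```r
text <- "$5, $10,000, and $5m, and $50m. and $50.2m and $50,2m"

# Assumed pattern: "$", digits, then zero or more groups of one separator
# followed by more digits, then an optional magnitude suffix (m, b or t).
# Trailing punctuation is never followed by a digit, so it is left out.
pattern <- "\\$[0-9]+(?:[.,][0-9]+)*[mbt]?"

m <- gregexpr(pattern, text, perl = TRUE)
amounts <- regmatches(text, m)[[1]]
print(amounts)
```

This keeps "$50.2m" and "$50,2m" intact while dropping the comma in "$5," and the period in "$50m.".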
Answer 0 (score: 3)
Try using gregexpr together with regmatches:
text <- "$10000 and $10,000 and $5m and $50m and $50.2m and $50,2m"
m <- gregexpr("\\$[0-9.,]+[mbt]?", text)
regmatches(text, m)
[[1]]
[1] "$10000" "$10,000" "$5m" "$50m" "$50.2m" "$50,2m"
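I am assuming that an amount string consists only of digits, commas, and decimal points. I am also assuming that the amount may end in m, b, or t (million, billion, trillion).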
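The question also mentions k as a possible suffix, which the character class [mbt] does not cover. A minimal sketch of widening that class follows; the input string here is hypothetical and only serves to exercise each suffix.

```r
# Hypothetical input that includes a "k" amount alongside m and b.
text <- "$10000 and $5m and $7k and $2b"

# Same approach as above, with "k" added to the optional suffix class.
m <- gregexpr("\\$[0-9.,]+[kmbt]?", text)
amounts <- regmatches(text, m)[[1]]
print(amounts)
```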
Answer 1 (score: 0)
This can also be done another way, for example:
txt = unlist(strsplit(text, split = " "))
txt[grep("\\$\\d+((,|\\.)?)(\\d*)?(m)?", txt)]
[1] "$10000" "$10,000" "$5m" "$50m" "$50.2m" "$50,2m"
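Because grep() selects whole tokens rather than the matched part, any punctuation attached to a token is kept. The sketch below applies the same split-and-filter idea to the punctuated string from the question's update and strips a trailing comma or period afterwards; the simplified pattern and the cleanup step are assumptions, not part of the answer above.

```r
text <- "$5, $10,000, and $5m, and $50m. and $50.2m and $50,2m"

# Split on spaces and keep the tokens that contain a dollar amount.
tokens <- unlist(strsplit(text, split = " "))
kept <- grep("\\$\\d+", tokens, value = TRUE)

# grep() returns whole tokens, so remove one trailing comma or period.
amounts <- sub("[.,]$", "", kept)
print(amounts)
```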
Answer 2 (score: 0)
Perhaps we can use gsub, since the OP's expected output is shown as a single string:
gsub("\\b[A-Za-z]+,?|[,.](\\s)", "\\1", text)
#[1] "$10000 $10,000 $5m $50m $50.2m $50,2m"
#[2] "$5 $10,000 $5m $50m $50.2m $50,2m"
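Removing a word while keeping the spaces on either side of it can leave runs of two spaces in the result. If single spacing matters, one option is a second pass that collapses whitespace, sketched here as an assumed follow-up step:

```r
text <- c("$10000 and $10,000 and $5m and $50m and $50.2m and $50,2m",
          "$5, $10,000, and $5m, and $50m. and $50.2m and $50,2m")

# Remove words, and punctuation that is directly followed by whitespace.
stripped <- gsub("\\b[A-Za-z]+,?|[,.](\\s)", "\\1", text)

# Collapse each run of whitespace into one space and trim both ends.
cleaned <- trimws(gsub("\\s+", " ", stripped))
print(cleaned)
```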
Data used for the gsub call above:
text <- c("$10000 and $10,000 and $5m and $50m and $50.2m and $50,2m",
          "$5, $10,000, and $5m, and $50m. and $50.2m and $50,2m")