Question

I am trying to extract: -1960.85

from:

">Return on Equity</span><!-- react-text: 141 --> <!-- /react-text --><!-- react-text: 142 -->(ttm)<!-- /react-text --><sup aria-label=\"KS_HELP_SUP_undefined\" data-reactid=\"143\"></sup></td><td class=\"Fz(s) Fw(500) Ta(end)\" data-reactid=\"144\">-1,960.85%</td></tr></tbody></table></div><div data-reactid=\"145\"><h3 class=\""

I am extracting it with:

stringr::str_extract(loc, "[:punct:]\\d+\\.\\d+\\D")

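Unfortunately, this treats the comma in 1,960.85 as the punctuation character I mean and cuts off the 1 entirely. Incidentally, I don't want the comma in the result. How can I get the desired output using str_extract() (or any other method)?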

loc <- ">Return on Equity</span><!-- react-text: 141 --> <!-- /react-text --><!-- react-text: 142 -->(ttm)<!-- /react-text --><sup aria-label=\"KS_HELP_SUP_undefined\" data-reactid=\"143\"></sup></td><td class=\"Fz(s) Fw(500) Ta(end)\" data-reactid=\"144\">-1,960.85%</td></tr></tbody></table></div><div data-reactid=\"145\"><h3 class=\""

Answer 1

In the example above, you can fix this by allowing , alongside the digits, i.e. by using [0-9,] in place of the first \\d.

stringr::str_extract(loc, "[:punct:][0-9,]+\\.\\d+\\D")
#[1] "-1,960.85%"

Another option, which also drops the comma:

library(stringr)

str_replace(str_extract(loc, "[:punct:][0-9,]+\\.\\d+\\D"),",","")
#[1] "-1960.85%"
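
Note that str_replace() removes only the first comma and that the result above is still a string ending in %. A minimal sketch of one way to get the bare number, assuming the value always has a decimal part and uses commas only as thousands separators (the shortened loc below is a stand-in for the full string in the question):

```r
library(stringr)

# Shortened stand-in for the loc string from the question.
loc <- "<td class=\"Fz(s) Fw(500) Ta(end)\" data-reactid=\"144\">-1,960.85%</td>"

# Optional minus sign, digits with thousands separators, then a decimal part.
# The trailing % is simply left out of the pattern.
raw <- str_extract(loc, "-?[0-9,]+\\.\\d+")   # "-1,960.85"

# Remove every comma (not only the first), then convert to a number.
value <- as.numeric(str_remove_all(raw, ","))
value
```

Anchoring on an optional leading - rather than [:punct:] keeps a comma from being picked up as the sign, and str_remove_all() would also handle values such as 1,234,567.89.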

However, if your content is HTML/XML, then as @TimBiegeleisen suggested, you should parse it with a proper parser before analyzing the text.
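
As a rough illustration of the parser route, here is a sketch using the rvest package (version 1.0.0 or later for html_element()). The html string is a hypothetical, well-formed stand-in for the real page, and the XPath assumes the label sits in a span inside the cell just before the value, as in the fragment from the question:

```r
library(rvest)

# Hypothetical, simplified stand-in for the page markup.
html <- '<table><tbody><tr>
  <td><span>Return on Equity</span> (ttm)</td>
  <td class="Fz(s) Fw(500) Ta(end)">-1,960.85%</td>
</tr></tbody></table>'

doc <- read_html(html)

# Locate the cell that follows the "Return on Equity" label.
cell <- html_element(
  doc,
  xpath = "//td[span[text()='Return on Equity']]/following-sibling::td[1]"
)

# Strip the thousands separators and the percent sign, then convert.
roe <- as.numeric(gsub("[,%]", "", html_text(cell)))
roe
```

Selecting by the label rather than by position or by the generated class names means the regular expression only has to clean a single cell's text, not search the whole page source.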
