Question

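I have a vector of variable names, and I want to extract the currency from each one, where the possible currencies are given by a second vector. However, I am having trouble extracting the values.

My first approach was to replace everything except the currency abbreviations with an empty string.

For example: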

x <- c("Total Assets in th USD", "Equity in mil EUR", "Number of Branches")
currencies <- c("USD", "EUR", "GBP")

regex <- paste0("([^",
                paste(currencies, collapse = "|"),
                "])")
# results in
# "([^USD|EUR|GBP])"

gsub(regex, "", x)
# [1] "USD"  "EEUR" "B"

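The expected result is c("USD", "EUR", "")

This is obviously wrong, because the pattern is a negated character class: it matches single characters (E, U, R) rather than the group of characters (EUR). So my question is: how can I extract only the given groups?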
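To make the failure concrete, here is a minimal sketch (not part of the original question) showing that everything inside square brackets, including the "|", is treated as an individual character rather than as alternatives. The variable name single_chars is only illustrative.

```r
x <- c("Total Assets in th USD", "Equity in mil EUR", "Number of Branches")

# Inside [...] each character is a separate set member, so this class is
# the set {U, S, D, |, E, R, G, B, P}, not the three words USD, EUR, GBP.
single_chars <- regmatches(x, gregexpr("[USD|EUR|GBP]", x))

single_chars[[2]]
# [1] "E" "E" "U" "R"

# Even the pipe is matched literally inside the brackets.
grepl("[USD|EUR|GBP]", "|")
# [1] TRUE
```

This is why the leading "E" of "Equity" and the "B" of "Branches" survive the gsub call above.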

Answer 1

You can use

x <- c("Total Assets in th USD", "Equity in mil EUR", "Number of Branches")
currencies <- c("USD", "EUR", "GBP")

regex <- paste0("\\b(",
                paste(currencies, collapse = "|"),
                ")\\b")
# results in
# "\b(USD|EUR|GBP)\b"

regmatches(x, gregexpr(regex, x))

See the R demo online.

Output:

[[1]]
[1] "USD"

[[2]]
[1] "EUR"

[[3]]
character(0)
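Note that regmatches returns a list, whereas the question asks for a character vector with "" where nothing matched. One possible way to bridge that gap, assuming only the first currency per element is of interest, is sketched below; first_match is an illustrative name, not something from the original answer.

```r
x <- c("Total Assets in th USD", "Equity in mil EUR", "Number of Branches")
currencies <- c("USD", "EUR", "GBP")

regex <- paste0("\\b(", paste(currencies, collapse = "|"), ")\\b")
matches <- regmatches(x, gregexpr(regex, x))

# Keep the first match of each element; fall back to "" when there is none.
first_match <- vapply(
  matches,
  function(m) if (length(m) > 0) m[1] else "",
  character(1)
)

first_match
# [1] "USD" "EUR" ""
```

If an element could contain several currencies and all of them matter, keeping the list form is probably the safer choice.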

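If a currency can appear "glued" to a number (as in 100USD), you need to remove the word boundaries (\b), because there is no word boundary between a digit and a letter.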
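The effect of the word boundaries can be checked with a small sketch. The input vector y below is made up for illustration and does not come from the question.

```r
currencies <- c("USD", "EUR", "GBP")

# Hypothetical inputs where the currency is attached to a number.
y <- c("Revenue 100USD", "Cost EUR250", "Number of Branches")

alternatives <- paste(currencies, collapse = "|")
with_boundaries <- paste0("\\b(", alternatives, ")\\b")
without_boundaries <- paste0("(", alternatives, ")")

# Digits and letters are both word characters, so \b fails next to a digit.
glued_strict <- regmatches(y, gregexpr(with_boundaries, y))
lengths(glued_strict)
# [1] 0 0 0

# Without the boundaries the glued currencies are found.
glued_loose <- regmatches(y, gregexpr(without_boundaries, y))
unlist(glued_loose)
# [1] "USD" "EUR"
```

The trade-off is that the pattern without boundaries would also match inside longer words such as "EUROPE", so it is only appropriate when that cannot occur in the data.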

Answer 2

We can use str_extract

library(stringr)
str_extract(x, paste(currencies, collapse="|"))
#[1] "USD" "EUR" NA

Or use sub from base R

v1 <- sub(paste0(".*\\b(", paste(currencies, collapse="|"), ")\\b.*"), "\\1", x)
replace(v1, !v1 %in% currencies, "")
#[1] "USD" "EUR" ""
