
Question

Suppose I have a text vector and want to extract only the exact matches. How can I do this efficiently?

test_text <- c("[]", "[1234]", "[1234a]", "[v1256a] ghjk kjh", 
               "[othername1256b] kjhgfd hgj",
               "[v1256] ghjk kjh", "[v1256] kjhgfd hgj",
               " text here [name1991] and here",
               "[name1990] this is an explanation",
               "[name1991] this is another explanation",
               "[mäölk1234]")
expected <- c("[v1256a]", "[othername1256b]", "[v1256]", "[v1256]", "[name1991]",
              "[name1990]", "[name1991]", "[mäölk1234]")

# This works, but it also returns "[1234]" and "[1234a]":
regmatches(test_text, regexpr("\\[.*[0-9]{4}.*\\]", test_text))

But I think something like "\\[.*[0-9]{4}(?[a-z])]\\]" would be better. However, it throws an error:

Error in regexpr("\\[.*[0-9]{4}(?[a-z])]\\]", text) : invalid regular expression '\[.*[0-9]{4}(?[a-z])]\]', reason 'Invalid regexp'

After the year there can be at most one letter, but there may also be none; see the examples. Sorry, I rarely use regexpr...
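The error comes from the `(?` sequence: in regex syntax it opens a special group construct, so R's default (TRE) engine refuses to compile `(?[a-z])`. "An optional single letter" is written as a character class followed by `?`. A minimal sketch, using two strings from the question and assuming ASCII letters only; the names `x`, `bad` and `good` are illustrative:

```r
x <- c("[v1256a] ghjk kjh", "[v1256] ghjk kjh")

# The attempted pattern: "(?" starts a special group, so TRE fails to compile it.
# suppressWarnings() hides the "TRE pattern compilation error" warning;
# tryCatch() captures the "invalid regular expression" error that follows.
bad <- tryCatch(
  suppressWarnings(regexpr("\\[.*[0-9]{4}(?[a-z])]\\]", x)),
  error = function(e) e
)
inherits(bad, "error")  # TRUE

# An optional letter is a character class followed by "?".
# [a-z]+ requires at least one letter before the four digits.
good <- regmatches(x, regexpr("\\[[a-z]+[0-9]{4}[a-z]?\\]", x))
good  # "[v1256a]" "[v1256]"
```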

Answer 1

Solution for the updated question

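It seems you want to extract, from each string, one or more letters followed by four digits and then an optional letter, all enclosed in square brackets.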

Use

test_text <- c("[]", "[1234]", "[1234a]", "[v1256a] ghjk kjh", 
           "[othername1256b] kjhgfd hgj",
           "[v1256] ghjk kjh", "[v1256] kjhgfd hgj",
           " text here [name1991] and here",
           "[name1990] this is an explanation",
           "[name1991] this is another explanation",
           "[mäölk1234]")

regmatches(test_text, regexpr("\\[\\p{L}+[0-9]{4}\\p{L}?]", test_text, perl=TRUE))
# => c("[v1256a]", "[othername1256b]", "[v1256]", "[v1256]", "[name1991]",
#      "[name1990]", "[name1991]", "[mäölk1234]")

See the R demo online. Note that this requires a PCRE regex (\p{L} is PCRE syntax), so perl=TRUE is crucial here.
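The demo can be reproduced locally with a check like the following, a sketch that compares the result with the question's `expected` vector. It assumes R's PCRE build has Unicode property support; `ä` and `ö` are written as `\u` escapes so that the strings are marked UTF-8 whatever the encoding of the script file, which lets `perl = TRUE` match them with `\p{L}`.

```r
# Question's test data; "[m\u00e4\u00f6lk1234]" is "[mäölk1234]" written with \u escapes
test_text <- c("[]", "[1234]", "[1234a]", "[v1256a] ghjk kjh",
               "[othername1256b] kjhgfd hgj",
               "[v1256] ghjk kjh", "[v1256] kjhgfd hgj",
               " text here [name1991] and here",
               "[name1990] this is an explanation",
               "[name1991] this is another explanation",
               "[m\u00e4\u00f6lk1234]")
expected <- c("[v1256a]", "[othername1256b]", "[v1256]", "[v1256]", "[name1991]",
              "[name1990]", "[name1991]", "[m\u00e4\u00f6lk1234]")

# perl = TRUE is required because \p{L} (any Unicode letter) is PCRE syntax
result <- regmatches(test_text,
                     regexpr("\\[\\p{L}+[0-9]{4}\\p{L}?]", test_text, perl = TRUE))
identical(result, expected)  # TRUE
```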

Details

\[ - a [ character
\p{L}+ - one or more Unicode letters
[0-9]{4} - four ASCII digits
\p{L}? - an optional Unicode letter
] - a ] character
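Keep in mind that regexpr() returns at most the first match in each string. If a single string could contain several bracketed tokens, gregexpr() with the same pattern returns all of them as a list. A sketch on made-up input (the vector `x` is hypothetical):

```r
x <- c("[name1990] and [name1991b]", "no tokens here")
pattern <- "\\[\\p{L}+[0-9]{4}\\p{L}?]"

# regexpr(): first match per string; strings without a match are dropped
first_only <- regmatches(x, regexpr(pattern, x, perl = TRUE))
first_only   # "[name1990]"

# gregexpr(): all matches per string, as a list (character(0) when none)
all_matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
all_matches  # list(c("[name1990]", "[name1991b]"), character(0))
```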

Original answer

Use

regmatches(test_text, regexpr("\\[[^][]*[0-9]{4}[[:alpha:]]?]", test_text))

or

regmatches(test_text, regexpr("\\[[^][]*[0-9]{4}[a-zA-Z]?]", test_text))

See the regex demo and the Regulex diagram.

Details

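\[ - a [ character
[^][]* - zero or more characters other than [ and ] (tip: if you only want letters here, replace it with [[:alpha:]]* or [a-zA-Z]*)
[0-9]{4} - four digits
[[:alpha:]]? - an optional letter ([a-zA-Z]? would match an optional ASCII letter instead)
] - a ] character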
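This pattern was written for the original question, so on the updated test data [^][]* can match zero characters and digit-only tokens slip through. A sketch comparing it with the tip's letters-only variant, on an ASCII-only subset of the question's data; using + instead of the tip's * (so at least one letter must precede the digits) is an assumption on top of the tip:

```r
test_text <- c("[]", "[1234]", "[1234a]", "[v1256a] ghjk kjh",
               "[othername1256b] kjhgfd hgj", " text here [name1991] and here")

# Original pattern: [^][]* may match nothing, so "[1234]" and "[1234a]" match too
loose <- regmatches(test_text,
                    regexpr("\\[[^][]*[0-9]{4}[[:alpha:]]?]", test_text))
loose   # "[1234]" "[1234a]" "[v1256a]" "[othername1256b]" "[name1991]"

# Tip applied with + (assumption): at least one letter before the four digits
strict <- regmatches(test_text,
                     regexpr("\\[[[:alpha:]]+[0-9]{4}[[:alpha:]]?]", test_text))
strict  # "[v1256a]" "[othername1256b]" "[name1991]"
```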

R test:

regmatches(test_text, regexpr("\\[[^][]*[0-9]{4}[[:alpha:]]?]", test_text))
## => [1] "[v1256a]"         "[othername1256b]" "[v1256]"          "[v1256]"          "[name1991]"       "[name1990]"       "[name1991]"

Extract exact matches from an array
