Question

I thought my RegEx skills were good enough, but now I'm sitting here with no idea how to solve my problem.

First, I have a text like this:

text <- "This DEV-1231 story is about a man. He DEV-1232 is from DEV-1233 the USA. He is a university professor. He goes DEV-1234 to Nepal. He DEV-1235 climbs a mountain. The mountain is covered in ice. There is a hole in the ice. It is 22 metres deep. The man falls in it. DEV-1236 He doesn’t DEV-1237 go all the way down. He stops somewhere in the hole. He cannot move. His arm and five ribs are broken."

It contains some special, unique developer IDs:

dev_id <- "DEV-123[0-9]"

Extracting them afterwards with str_extract_all and unlist is no problem.
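For reference, the plain ID extraction mentioned above might look like the following minimal sketch. It assumes the stringr package is installed, and it uses a shortened copy of the sample text for brevity.

```r
library(stringr)

# Shortened copy of the sample text (an assumption, for brevity).
text <- "This DEV-1231 story is about a man. He DEV-1232 is from DEV-1233 the USA."
dev_id <- "DEV-123[0-9]"

# str_extract_all() returns a list with one element per input string;
# unlist() flattens it into a character vector.
ids <- unlist(str_extract_all(text, dev_id))
print(ids)
# [1] "DEV-1231" "DEV-1232" "DEV-1233"
```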

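But I want to extract the following 30 characters or 5 words together with the ID. As you can see, sometimes there are fewer characters/words between two IDs, and that is my problem. In that case, only those 2/3/4 words should be returned.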

return
[1] DEV-1231 story is about a man.
[2] DEV-1232 is from
[3] DEV-1233 the USA. He is a
[4] DEV-1234 to Nepal. He
[5] DEV-1235 climbs a mountain. The mountain
[6] DEV-1236 He doesn't
[7] DEV-1237 go all the way down

In this example I thought of combining at most 5 words with the ID. These 5 words may include punctuation.

Thanks in advance!

Answer 1

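After DEV-123[0-9], try to match a series of up to five occurrences of "whitespace + non-whitespace" ((?:\s+\S+){0,5}), but use a negative lookahead to require that the "non-whitespace" part does not itself match the DEV-123[0-9] pattern: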

DEV-123[0-9](?:\s+(?!DEV-123[0-9])\S+){0,5}

Demo: https://regex101.com/r/AxtUkI/1
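In R, the pattern could be applied roughly as in the sketch below. It assumes the stringr package (whose ICU regex engine supports lookaheads) and a shortened copy of the sample text. Note that the backslashes have to be doubled inside an R string literal.

```r
library(stringr)

# Shortened copy of the sample text from the question (an assumption, for brevity).
text <- "This DEV-1231 story is about a man. He DEV-1232 is from DEV-1233 the USA. He is a university professor."

# ID followed by up to five whitespace-separated tokens; the negative
# lookahead (?!DEV-123[0-9]) stops the match before the next ID.
pattern <- "DEV-123[0-9](?:\\s+(?!DEV-123[0-9])\\S+){0,5}"

matches <- unlist(str_extract_all(text, pattern))
print(matches)
# [1] "DEV-1231 story is about a man." "DEV-1232 is from"
# [3] "DEV-1233 the USA. He is a"
```

With base R, `regmatches(text, gregexpr(pattern, text, perl = TRUE))` should behave similarly, since the lookahead requires the PCRE engine selected by `perl = TRUE`.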
