Question

I'm trying to do some pattern matching with the tidyr function extract. I tested my regex on a regex practice site and the pattern seemed to work there; I'm using a lookbehind assertion.

I have the following sample text:

=[\"{ Key = source, Values = web,videoTag,assist }\",\"{ Key = type, 
Values = attack }\",\"{ Key = team, Values = 2 }\",\"{ Key = 
originalStartTimeMs, Values = 56496 }\",\"{ Key = linkId, Values = 
1551292895649 }\",\"{ Key = playerJersey, Values = 8 }\",\"{ Key = 
attackLocationStartX, Values = 3.9375 }\",\"{ Key = 
attackLocationStartY, Values = 0.739376770538243 }\",\"{ Key = 
attackLocationStartDeflected, Values = false }\",\"{ Key = 
attackLocationEndX, Values = 1.7897727272727275 }\",\"{ Key = 
attackLocationEndY, Values = -1.3002832861189795 }\",\"{ Key = 
attackLocationEndDeflected, Values = false }\",\"{ Key = lastModified, 
Values = web,videoTag,assist

I want to get the numbers after attackLocationX (that is, the numbers that follow each of the attack location keys).

However, when I use the following code with the lookbehind assertion, I get no results:

df %>% 
  extract(message, "x_start", '((?<=attackLocationStartX,/sValues/s=/s)[0-9.]+)')

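This function returns NA when no match is found, and my target column is all NA values, even though I tested the pattern on www.regexr.com. According to the documentation, R's pattern matching supports lookbehind assertions, so I'm not sure what else to do here.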

Answer 1

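I'm not sure about the lookbehind part, but in R you need to escape backslashes. That isn't obvious if you test with a regex checker that isn't R-specific.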

More information here.

So you probably want your regex to look like:

"((?<=attackLocationStartX,\\sValues\\s=\\s)[0-9.]+)"

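To see what the escaping does, here is a small base-R check. It assumes a shortened one-line excerpt of the sample text in a variable called msg (a name chosen only for illustration) and uses regexpr() with perl = TRUE rather than extract itself. The R string literal "\\s" holds two characters, a backslash and an s, which the regex engine reads as the whitespace token \s; writing "\s" directly is a parse error in R ("'\s' is an unrecognized escape").

```r
# Lookbehind pattern from the question, with "/s" replaced by an escaped "\\s"
pattern <- "((?<=attackLocationStartX,\\sValues\\s=\\s)[0-9.]+)"

# writeLines() shows the string the regex engine actually receives:
# ((?<=attackLocationStartX,\sValues\s=\s)[0-9.]+)
writeLines(pattern)

# "\\s" in R source is two characters: a backslash followed by "s"
print(nchar("\\s"))  # 2

# Shortened, single-line excerpt of the sample text (illustrative variable name)
msg <- "{ Key = attackLocationStartX, Values = 3.9375 }"

# perl = TRUE selects PCRE, which accepts this fixed-length lookbehind
found <- regmatches(msg, regexpr(pattern, msg, perl = TRUE))
print(found)  # "3.9375"
```
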
Answer 2

First, to match whitespace you need \s, not /s.
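A quick base-R illustration of the difference, assuming a one-line excerpt of the sample text (the variable names are illustrative): /s is a literal slash followed by the letter s, so it can never match the space after the comma, while "\\s" (the token \s, escaped for an R string) can.

```r
msg <- "{ Key = attackLocationStartX, Values = 3.9375 }"

# "/s" means a literal "/" then "s", which does not occur in the text
with_slash <- grepl("attackLocationStartX,/sValues", msg, perl = TRUE)

# "\\s" is the whitespace token \s and matches the space after the comma
with_backslash <- grepl("attackLocationStartX,\\sValues", msg, perl = TRUE)

print(c(with_slash = with_slash, with_backslash = with_backslash))
```
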

You don't need a lookbehind here: when the pattern contains a capturing group, extract returns the captured substring.

Use

df %>% 
  extract(message, "x_start", "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d+\\.\\d+)")

Output: 3.9375.
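The capturing-group idea can also be sketched without tidyr. The helper extract_first_group() below is hypothetical (it is not part of tidyr or base R and is not how extract is implemented); it uses base R's regexec() and regmatches() to return the first capturing group of each string and NA when nothing matches, which is the behaviour the question describes for extract.

```r
# Hypothetical helper: first capturing group of each element of x, NA if no match
extract_first_group <- function(x, pattern) {
  matches <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # each element is c(full_match, group1, ...) or character(0) when nothing matched
  vapply(matches, function(m) if (length(m) >= 2) m[2] else NA_character_,
         character(1))
}

df <- data.frame(
  message = c("{ Key = attackLocationStartX, Values = 3.9375 }",
              "{ Key = team, Values = 2 }"),
  stringsAsFactors = FALSE
)

# Same pattern as above: only the text inside (...) is kept
df$x_start <- extract_first_group(
  df$message,
  "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d+\\.\\d+)"
)
print(df$x_start)  # "3.9375" NA
```
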

The regex could also be something like "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d[.0-9]*)".
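The two patterns differ on numbers without a decimal point. A small comparison, assuming a hypothetical helper first_group() built on base R's regexec() and three values taken from the sample text:

```r
# Hypothetical helper: first capturing group per string, NA when there is no match
first_group <- function(x, pattern) {
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))
  vapply(m, function(g) if (length(g) >= 2) g[2] else NA_character_, character(1))
}

values <- c("Values = 3.9375", "Values = -1.3002832861189795", "Values = 56496")

# Requires a dot, so the integer 56496 is not matched
strict <- first_group(values, "Values\\s*=\\s*(-?\\d+\\.\\d+)")
# A digit followed by any mix of dots and digits, so integers match too
loose  <- first_group(values, "Values\\s*=\\s*(-?\\d[.0-9]*)")

print(strict)  # "3.9375" "-1.3002832861189795" NA
print(loose)   # "3.9375" "-1.3002832861189795" "56496"
```

The looser pattern would also accept malformed strings such as 1.2.3, so converting the result with as.numeric() (which turns those into NA) is a reasonable follow-up.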

Because only the (-?\\d+\\.\\d+) part is captured, only the text matched by that group ends up in the output.

Pattern details

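(-?\d+\.\d+) - a capturing group that matches:
- -? - an optional hyphen (? means 1 or 0 occurrences)
- \d+ - 1 or more digits (+ means 1 or more)
- \. - a dot
- \d+ - 1 or more digits
\d[.0-9]* - a digit (\d) followed by 0 or more dots or digits ([.0-9]*)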
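The question's goal was the numbers after all of the attack location keys, not only attackLocationStartX. One way to sketch that in base R, assuming a shortened one-line excerpt of the sample text: a lookbehind such as (?<=attackLocation\\w+...) has unbounded length, which many regex engines (including the PCRE versions R has traditionally used) reject, so this sketch uses PCRE's \\K instead, which drops everything matched so far from the reported match. Keys whose value is not numeric (the Deflected entries) are skipped.

```r
# Shortened, single-line excerpt of the sample text (Deflected entry kept on purpose)
msg <- paste(
  "{ Key = attackLocationStartX, Values = 3.9375 }",
  "{ Key = attackLocationStartY, Values = 0.739376770538243 }",
  "{ Key = attackLocationStartDeflected, Values = false }",
  "{ Key = attackLocationEndX, Values = 1.7897727272727275 }",
  "{ Key = attackLocationEndY, Values = -1.3002832861189795 }"
)

# Key names with a numeric value: the lookahead checks the value without consuming it
keys <- regmatches(msg, gregexpr(
  "attackLocation\\w+(?=\\s*,\\s*Values\\s*=\\s*-?\\d)", msg, perl = TRUE))[[1]]

# Values only: \\K resets the match start, so just the number is returned
vals <- regmatches(msg, gregexpr(
  "attackLocation\\w+\\s*,\\s*Values\\s*=\\s*\\K-?\\d[.0-9]*", msg, perl = TRUE))[[1]]

locations <- setNames(as.numeric(vals), keys)
print(locations)
```
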
