Using lookbehind in str_extract in R

Posted: 2014-02-06 15:27:05

Tags: regex r perl

I have the following text file:

[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)
[01/29/14 16:42:57, 10.100.120.120, unknown]: spatial_monitor: Alan left Conference Room (Zone Role contains Person role)
[01/29/14 16:43:00, 10.100.120.120, unknown]: spatial_monitor: Kurt entered Conference Room (Computer desk contains Person role)
[01/29/14 16:43:02, 10.100.120.120, unknown]: spatial_monitor: Kurt left Conference Room (Computer desk contains Person role)
[01/29/14 16:43:03, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)
[01/29/14 16:43:08, 10.100.120.120, unknown]: spatial_monitor: Alan left Conference Room (Zone Role contains Person role)
[01/29/14 16:46:07, 10.100.120.120, unknown]: spatial_monitor: Fred entered Conference Room (Zone Role contains Person role)
[01/29/14 16:46:08, 10.100.120.120, unknown]: spatial_monitor: Fred left Conference Room (Zone Role contains Person role)

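I am trying to use str_extract (from the stringr package) in R to extract the name of the location ("Conference Room" in the example above). The logic is to pull out the part of the string that follows "entered" or "left". For that, I have the following regular expression: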

(?<=entered\s)[A-Z][a-z]+\s[A-Z][a-z]+

This works fine in Notepad++, but when I use it in R I get the following error:

> tt <- "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)"
> str_extract(tt, '(?<=entered\\s)[A-Z][a-z]+\\s[A-Z][a-z]+')
Error in regexpr("(?<=entered\\s)[A-Z][a-z]+\\s[A-Z][a-z]+", "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)",  : 
  invalid regular expression '(?<=entered\s)[A-Z][a-z]+\s[A-Z][a-z]+', reason 'Invalid regexp'
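The traceback shows that str_extract handed the pattern to base R's regexpr(), which by default compiles patterns with the TRE (extended regex) engine; TRE has no lookbehind, hence "Invalid regexp". The minimal sketch below (base R only, using the line from the question) contrasts the default engine with perl = TRUE, which switches regexpr() to PCRE. The variable names are illustrative.

```r
# The log line from the question
tt <- "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)"
pattern <- "(?<=entered\\s)[A-Z][a-z]+\\s[A-Z][a-z]+"

# Default engine (TRE): the lookbehind does not compile, so regexpr() signals an error
default_ok <- tryCatch({
  suppressWarnings(regexpr(pattern, tt))
  TRUE
}, error = function(e) FALSE)

# PCRE engine: the same pattern compiles; regmatches() extracts the matched text
pcre_match <- regmatches(tt, regexpr(pattern, tt, perl = TRUE))

print(default_ok)
print(pcre_match)
```

So the pattern itself is fine; only the engine it is compiled with matters.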

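Other answers tell me that lookahead and lookbehind only work with Perl. So the question is: how do I enable Perl-style regular expressions in str_extract? Or is there a better way? Thanks in advance.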

2 answers:

Answer 0 (score: 2)

Your regex does work: base R functions such as sub accept it if you specify perl = TRUE. You can also do the whole task with sub:

sub('.*(?<=entered\\s)([A-Z][a-z]+\\s[A-Z][a-z]+).*', '\\1', tt, perl = TRUE)
# [1] "Conference Room"

Or, without perl:

sub('.*entered\\s([A-Z][a-z]+\\s[A-Z][a-z]+).*', '\\1', tt)
# [1] "Conference Room"
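The question asks for the location after either "entered" or "left". A minimal sketch extending the non-perl sub approach above to both keywords and to several log lines at once, assuming every location is two capitalised words as in the sample: an alternation inside a capture group replaces the lookbehind, so no perl = TRUE is needed.

```r
# Three lines taken from the log in the question
log_lines <- c(
  "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)",
  "[01/29/14 16:42:57, 10.100.120.120, unknown]: spatial_monitor: Alan left Conference Room (Zone Role contains Person role)",
  "[01/29/14 16:43:00, 10.100.120.120, unknown]: spatial_monitor: Kurt entered Conference Room (Computer desk contains Person role)"
)

# Group 1 = the keyword (entered|left), group 2 = the location; keep only group 2
locations <- sub(".*(entered|left)\\s([A-Z][a-z]+\\s[A-Z][a-z]+).*", "\\2", log_lines)
print(locations)
```

Note that sub() returns a line unchanged when the pattern does not match, so for messier logs it may be worth filtering with grepl() first.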

Answer 1 (score: 1)

library(stringr)
tt <- "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)"
str_extract(tt, perl('(?<=entered\\s)[A-Z][a-z]+\\s[A-Z][a-z]+'))
# [1] "Conference Room"
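The perl() wrapper above belongs to the stringr versions of that time. Assuming stringr 1.0 or later, which matches through stringi's ICU engine (and, as far as I know, deprecated perl() in favour of regex()), lookbehind works without any wrapper. A minimal sketch, also covering both keywords on several lines:

```r
library(stringr)

# Three lines taken from the log in the question
log_lines <- c(
  "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)",
  "[01/29/14 16:42:57, 10.100.120.120, unknown]: spatial_monitor: Alan left Conference Room (Zone Role contains Person role)",
  "[01/29/14 16:43:00, 10.100.120.120, unknown]: spatial_monitor: Kurt entered Conference Room (Computer desk contains Person role)"
)

# ICU accepts lookbehind of bounded length; each alternative here has a fixed
# length ("entered " or "left "), so the pattern compiles as-is
locations <- str_extract(log_lines, "(?<=entered\\s|left\\s)[A-Z][a-z]+\\s[A-Z][a-z]+")
print(locations)
```

Unlike regmatches(), str_extract() returns NA for lines that do not match, so the result stays aligned with the input.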