grep with \s returns an empty result

Time: 2018-04-12 20:23:41

Tags: r regex regex-negation regex-lookarounds

Language: R, IDE: RStudio

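I am writing a script to extract and exclude specific information from a PDF file (a.k.a. one huge string). I used grep to split the string into the pages I want, and I would like to narrow those down further. The script I use to narrow them down is...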

variablename <- grep("Additional Information:(?! )", AnyAdditionalInfoPages,   
     perl = TRUE, value = TRUE)

This works exactly the way I want. I am new to R and to regular expressions, so for practice I tried the following...

variablename <- grep("Additional Information:(?!\s)", AnyAdditionalInfoPages, 
    perl = TRUE, value = TRUE)

The result is: Error: '\s' is an unrecognized escape in character string starting ""Additional Information:(?!\s"

variablename <- grep("Additional Information:(?!\\s)", AnyAdditionalInfoPages, 
    perl = TRUE, value = TRUE)

The result is an empty variable

> variablename
character(0)

What is going on? Why does the literal space " " work while the escaped whitespace class does not?

1 Answer:

Answer 0: (score: 0)

Ah, this is an interesting one.

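The pattern "Additional Information:(?! )" rejects only strings that have a literal space right after the ":", whereas (?!\\s) rejects strings that have *any* whitespace character in that position, such as a tab or a newline. One possible explanation is that the vector you are parsing contains "non-space" forms of whitespace after the colon.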

AnyAdditionalInfoPages <- c("Additional Information: page 20", # one space
                            "Additional Information:  page 7", # two spaces
                            "Additional Information:\tpage 50", # tab
                            "Additional Information:\npage 60") # newline

# Print vector to observe true formatting
cat(AnyAdditionalInfoPages, sep = "\n")

# Output:
Additional Information: page 20
Additional Information:  page 7
Additional Information:       page 50
Additional Information:
page 60


# Negative lookahead for spaces *only*
variablename <- grep("Additional Information:(?! )", AnyAdditionalInfoPages,   
                     perl = TRUE, value = TRUE)
# Output
[1] "Additional Information:\tpage 50"  "Additional Information:\npage 60"

# Negative lookahead for *any* whitespace
variablename <- grep("Additional Information:(?!\\s)", AnyAdditionalInfoPages,   
                     perl = TRUE, value = TRUE)
# Output
character(0)
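
To check whether this explanation applies to your real data, it can help to look at the exact character that follows the colon in each element. The sketch below reuses the toy vector from above; `char_after_colon` is a hypothetical helper written for this illustration, not something from the question or from base R. It also confirms that "\\s" in R source is the two-character string \s, which is what the regex engine receives, whereas a bare "\s" is rejected by the R parser before grep ever runs.

# Same toy vector as above; your real data will differ.
AnyAdditionalInfoPages <- c("Additional Information: page 20",   # one space
                            "Additional Information:  page 7",   # two spaces
                            "Additional Information:\tpage 50",  # tab
                            "Additional Information:\npage 60")  # newline

# "\\s" is two characters (backslash + "s") once R has parsed the string
nchar("\\s")

# Hypothetical helper: Unicode code point of the character immediately
# after the prefix, or NA if the prefix is absent or nothing follows it.
char_after_colon <- function(x, prefix = "Additional Information:") {
  pos <- regexpr(prefix, x, fixed = TRUE)
  # Index of the first character after the matched prefix
  nxt <- as.integer(pos) + attr(pos, "match.length")
  ch <- substr(x, nxt, nxt)
  vapply(ch,
         function(s) if (nzchar(s)) utf8ToInt(s) else NA_integer_,
         integer(1), USE.NAMES = FALSE)
}

code_points <- char_after_colon(AnyAdditionalInfoPages)
code_points  # 32 = space, 9 = tab, 10 = newline

If every code point you get back belongs to a whitespace character (32, 9, 10, 13 and so on), then each element has whitespace right after the colon, so (?!\\s) rejects all of them and grep returns character(0), while (?! ) still keeps the elements whose next character is a tab or a newline.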