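Language: R, IDE: RStudio
I am writing a script to extract and exclude specific information from a PDF file (in other words, one huge string). I used grep to split the string into the pages I want, and I would like to narrow the result down further. The script I use to narrow it down is: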
variablename <- grep("Additional Information:(?! )", AnyAdditionalInfoPages,
perl = TRUE, value = TRUE)
This works exactly the way I want. I am new to R and to regular expressions, so for practice I tried the following:
variablename <- grep("Additional Information:(?!\s)", AnyAdditionalInfoPages,
perl = TRUE, value = TRUE)
The result was: Error: '\s' is an unrecognized escape in character string starting ""Additional Information:(?!\s"
and
variablename <- grep("Additional Information:(?!\\s)", AnyAdditionalInfoPages,
perl = TRUE, value = TRUE)
The result is an empty variable:
> variablename
character(0)
What is going on? Why does " " work while the escape sequence for whitespace does not?
Answer 0 (score: 0)
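Ah, this is an interesting one.
The pattern "Additional Information:(?! )" rejects only strings that have a literal space directly after the ":", whereas (?!\\s) rejects strings that have any whitespace character in that position, such as a tab or a newline. One possible explanation is that the vector you are parsing contains "non-space" forms of whitespace after the colon, so the first pattern keeps those strings and the second drops them.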
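As for the error in the question: R processes backslash escapes in a string literal before the regex engine ever sees the pattern, and \s is not a valid R string escape, so the parser rejects "(?!\s)" outright. Doubling the backslash puts one literal backslash into the string, which PCRE then reads as \s. A minimal sketch of the two equivalent spellings (the names escaped and raw are only for illustration, and the raw-string form assumes R 4.0.0 or later):

```r
# Two ways to write the same regex; both strings hold the characters (?!\s)
escaped <- "Additional Information:(?!\\s)"    # doubled backslash
raw     <- r"(Additional Information:(?!\s))"  # raw string, assumes R >= 4.0.0

identical(escaped, raw)  # compares the two literals character by character
writeLines(escaped)      # prints the pattern as the regex engine receives it
```

Printing the pattern with writeLines() rather than print() is a quick way to check what the regex engine will actually be given, since print() shows the escaped form.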
# Example vector covering each whitespace case after the colon
AnyAdditionalInfoPages <- c("Additional Information: page 20",   # one space
                            "Additional Information:  page 7",   # two spaces
                            "Additional Information:\tpage 50",  # tab
                            "Additional Information:\npage 60")  # newline
# Print one element per line to observe the true formatting
cat(AnyAdditionalInfoPages, sep = "\n")
# Output (the tab is rendered as a wider gap):
Additional Information: page 20
Additional Information:  page 7
Additional Information:    page 50
Additional Information:
page 60
# Negative lookahead for spaces *only*
variablename <- grep("Additional Information:(?! )", AnyAdditionalInfoPages,
perl = TRUE, value = TRUE)
# Output
[1] "Additional Information:\tpage 50" "Additional Information:\npage 60"
# Negative lookahead for *any* whitespace
variablename <- grep("Additional Information:(?!\\s)", AnyAdditionalInfoPages,
perl = TRUE, value = TRUE)
# Output
character(0)
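To see which case applies to your own data, it can help to look at the character that actually follows the colon. The sketch below is one possible way to do that, under a few assumptions: next_char is a hypothetical helper name, and pages is a small stand-in for your real vector. It extracts the single character after "Additional Information:" and converts it to its code point.

```r
# Stand-in data (an assumption; substitute your own pages)
pages <- c("Additional Information: page 20",
           "Additional Information:\tpage 50",
           "Additional Information:\npage 60")

# next_char is a hypothetical helper: it returns the one character that
# follows "Additional Information:" in each element (NA if nothing matches).
# (?s) lets "." match a newline as well.
next_char <- function(x) {
  m <- regexpr("(?s)(?<=Additional Information:).", x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)
  out
}

chars <- next_char(pages)
# Code points: 32 = space, 9 = tab, 10 = newline, 13 = carriage return
codes <- vapply(chars, utf8ToInt, integer(1), USE.NAMES = FALSE)
codes
```

If every element of your real vector comes back as 9, 10 or 13 rather than 32, that would be consistent with (?! ) keeping the pages while (?!\\s) returns character(0).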