R - Regular expression to exclude a keyword

Asked: 2013-10-14 19:58:50

Tags: regex r

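My dataset has two variables with similar names: "JE.Description" and "Field.Description". How can I find the column index of the "JE.Description" column while excluding the word "Field" from the regex search? In other words, I want to modify the command below so that it returns only the column index of "JE.Description":

The dataset is updated frequently, and sometimes the "JE.Description" string appears as just "Description". That is why I am looking for a solution that explicitly excludes the keyword "Field".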

r1 <- c(1:5)
r2 <- c(1:5)
df <- data.frame(r1,r2)
names(df)[1] <- "JE.Description"
names(df)[2] <- "Field.Description"

y <- grep("!^Field^Description",perl = TRUE, colnames(df))
# Returns: integer(0)

Thanks,

2 Answers:

Answer 0: (score: 4)

To match every string that contains "Description", except those in which it is immediately preceded by "Field.", use a negative lookbehind assertion:

## The regex pattern
pat <- "(?<!Field\\.)Description"

## Try it out
x <- c("Description", "Field.Description", "FieldDescription", "xyz Description")
grep(pat, x, perl=TRUE)  # Note: lookahead & lookbehind assertions need perl=TRUE
# [1] 1 3 4
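Applied to the data frame from the question, the same pattern might be used roughly as follows. This is only a sketch: the variable names `idx` and `idx_renamed` are made up for illustration, and the rename step simply simulates the case where the column shows up as "Description".

```r
# Rebuild the example data frame from the question
df <- data.frame(r1 = 1:5, r2 = 1:5)
names(df) <- c("JE.Description", "Field.Description")

# Negative lookbehind: "Description" not immediately preceded by "Field."
pat <- "(?<!Field\\.)Description"

# Column index of the name that matches the pattern
idx <- grep(pat, colnames(df), perl = TRUE)
idx
# [1] 1

# Simulate an update in which the column is called just "Description"
names(df)[1] <- "Description"
idx_renamed <- grep(pat, colnames(df), perl = TRUE)
idx_renamed
# [1] 1
```

Because the lookbehind only rejects a literal "Field." directly in front of "Description", both spellings of the wanted column are found while "Field.Description" is skipped.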

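Alternatively, if the substring "field" might appear at some other position relative to "Description" (or in upper- or lowercase variants), it may be simpler to just call grepl() twice and combine the results with Boolean operators: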

x <- c("Description", "fieldDescription", "Field-of-Description", 
       "Description field")
which(grepl("Description", x) & !grepl("field", x, ignore.case=TRUE))
# [1] 1
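If this is needed in more than one place, the two grepl() calls could be wrapped in a small helper. The function `match_excluding` and its arguments `keep` and `exclude` are hypothetical names, not part of base R or of the answer above; the column vector is an example built from the question's names.

```r
# Hypothetical helper: indices of names that contain `keep`
# but do not contain `exclude` (the exclusion ignores case).
match_excluding <- function(nms, keep, exclude) {
  which(grepl(keep, nms) & !grepl(exclude, nms, ignore.case = TRUE))
}

cols <- c("JE.Description", "Field.Description", "r3")
res <- match_excluding(cols, "Description", "field")
res
# [1] 1
```

The trade-off compared with the lookbehind is that any name containing "field" anywhere is dropped, which is broader but easier to read.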

Answer 1: (score: 0)

mydata<-structure(list(Description = c(21, 21, 22.8, 21.4, 18.7, 18.1, 
14.3, 24.4, 22.8, 19.2), Field.Description = c(6, 6, 4, 6, 8, 
6, 8, 4, 4, 6)), .Names = c("Description", "Field.Description"
), row.names = c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710", 
"Hornet 4 Drive", "Hornet Sportabout", "Valiant", "Duster 360", 
"Merc 240D", "Merc 230", "Merc 280"), class = "data.frame")

mydata[grep("^Description",names(mydata))]
                  Description
Mazda RX4                21.0
Mazda RX4 Wag            21.0
Datsun 710               22.8
Hornet 4 Drive           21.4
Hornet Sportabout        18.7
Valiant                  18.1
Duster 360               14.3
Merc 240D                24.4
Merc 230                 22.8
Merc 280                 19.2
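Note that the `^` anchor only matches names that start with "Description", so a column called "JE.Description" would not be picked up by this pattern. A possible variant, assuming the only prefix ever expected is "JE." (an assumption based on the question, not something stated in this answer), makes that prefix optional and anchors both ends:

```r
nms <- c("JE.Description", "Description", "Field.Description")

# Anchored at the start: only names that begin with "Description"
start_only <- grep("^Description", nms)
start_only
# [1] 2

# Optional "JE." prefix, anchored at both ends (assumed variant)
with_prefix <- grep("^(JE\\.)?Description$", nms)
with_prefix
# [1] 1 2
```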