Combining two grepl searches with "OR" and adding a "NOT"

Time: 2018-07-27 15:08:17

Tags: r

I have this data frame:

ID      Description    
1     Tree fell on car 
2     Tree was uprooted
3     While cutting tree, it came down
4     Tree came down

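I am trying to search one column of the data frame for weather-related words. I do this with several grepl calls separated by "OR". However, I want to combine two grepl calls to say: "if the description contains this word AND this word, but NOT this word, then it is weather." Looking at the data frame above, "Tree came down" should be classified as weather, but "While cutting tree, it came down" has nothing to do with the weather.

The code I tried, based on other Stack Overflow answers, was: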

Data$Type<-ifelse(grepl(' Tree|^Tree|- Tree|:Tree',Data$DESCRIPTION,ignore.case=TRUE)&
grepl('^[^Cutting]*[Feel|Fell|Fall|Up Rooted|Uprooted|Came Down| Down|Knocked Onto|Caused Damage] [^Cutting]*$',Data$DESCRIPTION,ignore.case=TRUE)), "weather", "Not Classified")

But this does not work. I also tried:

Data$Type<-ifelse(grepl(' Tree|^Tree|- Tree|:Tree',Data$DESCRIPTION,ignore.case=TRUE)&
grepl('Feel|Fell|Fall|Up Rooted|Uprooted|Came Down| Down|Knocked Onto|Caused Damage',Data$DESCRIPTION,ignore.case=TRUE) &
!grepl('Cutting',Data$DESCRIPTION,ignore.case=TRUE)), "Weather", "Not Classified")

I expect this result:

ID      Description                      Type
1     Tree fell on car                   "Weather"
2     Tree was uprooted                  "Weather"
3     While cutting tree, it came down   "Non-Weather"
4     Tree came down                     "Weather"

But neither of these works. Thanks.

2 Answers:

Answer 0: (score: 0)

Since there are only two cases (Weather and Non-Weather), I think using grepl alone is enough:

df$Type <- sapply(df$Description, 
                  function(x) ifelse(grepl(pattern = 'Tree|fell|^cutting',x = x),'Weather','Non-Weather'))

[1] "Weather"     "Weather"     "Non-Weather" "Weather"   
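
The "contains this AND this, but NOT this" rule from the question can also be written directly, because grepl() is vectorized and returns one TRUE/FALSE per row. The following is only a minimal sketch under some assumptions: the column is named Description as in the sample data, the word list is a shortened version of the one in the question, and the helper names (has_tree, has_damage, has_cut) are made up for illustration.

```r
# Sample data from the question
df <- data.frame(
  ID = 1:4,
  Description = c("Tree fell on car",
                  "Tree was uprooted",
                  "While cutting tree, it came down",
                  "Tree came down"),
  stringsAsFactors = FALSE
)

# Each grepl() call returns a logical vector with one element per row.
# \\b marks a word boundary; perl = TRUE selects the PCRE engine.
has_tree   <- grepl("\\btree\\b", df$Description,
                    ignore.case = TRUE, perl = TRUE)

# Shortened word list (assumption); "up ?rooted" covers both spellings.
has_damage <- grepl("fell|fall|up ?rooted|came down|knocked onto|caused damage",
                    df$Description, ignore.case = TRUE)

# The exclusion word
has_cut    <- grepl("cutting", df$Description, ignore.case = TRUE)

# AND the two required conditions, then exclude rows that mention cutting
df$Type <- ifelse(has_tree & has_damage & !has_cut, "Weather", "Non-Weather")
print(df$Type)
```

Keeping each condition in its own logical vector makes it easy to inspect which test a given row passes or fails before the three are combined with & and !.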

Answer 1: (score: 0)

I ended up just doing something like this, to make sure "Ice" is treated as a weather word, but not when "Maker" also appears.

Data$Type <- ifelse(grepl('Ice$| Ice |,Ice |^Ice | Ice,',Data$DESCRIPTION,ignore.case=TRUE) &
                    !grepl('Maker',Data$DESCRIPTION,ignore.case=TRUE),
                    "Weather", "Not Classified")
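
Listing every surrounding character ('Ice$', ' Ice ', ',Ice ', and so on) can probably be replaced with a word-boundary pattern. The sketch below is one possible way to do that, not the code actually used above: the descriptions are hypothetical, and the names desc, is_ice, is_maker and type are made up for illustration.

```r
# Hypothetical descriptions, made up for illustration
desc <- c("Ice on road", "Slipped on ice", "Ice maker leaked", "Office flooded")

# \\bice\\b matches "ice" only as a whole word, so the "ice" inside
# "Office" is not counted. perl = TRUE selects the PCRE engine.
is_ice   <- grepl("\\bice\\b",   desc, ignore.case = TRUE, perl = TRUE)
is_maker <- grepl("\\bmaker\\b", desc, ignore.case = TRUE, perl = TRUE)

# Weather only when "ice" is present and "maker" is absent
type <- ifelse(is_ice & !is_maker, "Weather", "Not Classified")
print(type)
```

With these made-up inputs, the first two descriptions are classified as "Weather" and the last two as "Not Classified".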