Regex - starts with..., contains, and ends with

Time: 2015-12-19 02:49:27

Tags: regex r

I have a string that contains multiple "\n" characters. I want to go through each line and delete every line that contains the word "banana".

Sample DF:

farm_data <- data.frame(shop=c('fruit'),
                        sentence=c('the basket contains apples
                                  bananas are the best
                                  are we going to eat bananas
                                  why not just boil the fruits
                                  let us make some banana smoothie'), stringsAsFactors=FALSE)

What I tried:

farm_data$sentence <- gsub(".* bananas .* \n", "\n", farm_data$sentence)

What I want:

clean_data <- data.frame(shop=c('fruit'),
                        sentence=c('the basket contains apples
                                  why not just boil the fruits'), stringsAsFactors=FALSE)

The lines containing "banana" have been removed.

Thanks.

2 answers:

Answer 0: (score: 3)

x <- 'the basket contains apples
                                  bananas are the best
                                  are we going to eat bananas
                                  why not just boil the fruits
                                  let us make some banana smoothie'
cat(x)
# the basket contains apples
#                                   bananas are the best
#                                   are we going to eat bananas
#                                   why not just boil the fruits
#                                   let us make some banana smoothie

cat(gsub('.*banana.*\\n?', '', x, perl = TRUE))
# the basket contains apples
#                                   why not just boil the fruits
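
A minimal sketch of why this pattern removes whole lines, using a short hypothetical input instead of the question's data. With `perl = TRUE`, `.` does not match a newline, so `.*banana.*` can only span a single line, and the optional `\\n?` also consumes that line's trailing newline. The pattern matches "banana" as a substring, so "bananas" is caught as well.

```r
# Hypothetical three-line input, used only for illustration
x <- "apples here\nbananas are the best\nboil the fruits"

# With perl = TRUE, '.' stops at '\n', so '.*banana.*' covers exactly one line;
# the optional '\\n?' also removes that line's trailing newline.
result <- gsub(".*banana.*\\n?", "", x, perl = TRUE)

# Split the result to inspect the surviving lines
kept <- strsplit(result, "\n", fixed = TRUE)[[1]]
print(kept)
```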

Answer 1: (score: 1)

I may be solving this in a roundabout way. First, I split the string on the newline character \n.

sentence <- unlist(strsplit(as.character(farm_data$sentence), '\n'))

After that, I drop the elements of the resulting split that contain the word "banana".

# grepl() returns one TRUE/FALSE per line, so negating it keeps the lines without a match
cleanSentence <- sentence[!grepl('banana', sentence)]

Then I use the paste function to put it back together.

clean_data <- data.frame(shop=c('fruit'),
                        sentence= paste(cleanSentence, collapse=' \n'), stringsAsFactors=FALSE)
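
The three steps above (split, filter, paste) can be sketched end to end as follows. The input is a shortened, hypothetical version of the question's data with the indentation omitted, and the use of `grepl(..., fixed = TRUE)` and a plain `"\n"` separator are choices made for this sketch, not part of the original answer.

```r
# Hypothetical input mirroring the question's data (indentation omitted)
farm_data <- data.frame(
  shop = "fruit",
  sentence = "the basket contains apples\nbananas are the best\nwhy not just boil the fruits",
  stringsAsFactors = FALSE
)

# 1. Split the string into individual lines
lines <- strsplit(farm_data$sentence, "\n", fixed = TRUE)[[1]]

# 2. Keep only the lines that do not contain the literal text "banana"
kept <- lines[!grepl("banana", lines, fixed = TRUE)]

# 3. Join the surviving lines back into one string
clean_data <- data.frame(
  shop = farm_data$shop,
  sentence = paste(kept, collapse = "\n"),
  stringsAsFactors = FALSE
)
```

Filtering with a negated logical vector also behaves sensibly when no line matches: every line is kept, whereas indexing with `-which(...)` on an empty match set would return nothing.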

Hopefully this isn't too convoluted. :)

To address your concern about reusing this for other "fruits" or strings:

cleanFruit <- function(fruit = 'banana'){
    sentence <- unlist(strsplit(as.character(farm_data$sentence), '\n'))
    # Keep only the lines that do not match the given fruit (or word)
    cleanSentence <- sentence[!grepl(fruit, sentence)]
    clean_data <- data.frame(shop=c('fruit'),
                            sentence= paste(cleanSentence, collapse=' \n'), stringsAsFactors=FALSE)
    return(clean_data)
}

Wrap it in a function and pass in the fruit (or word) of interest. @rawr's answer looks a bit cleaner, though.
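
As a possible variation, the function could take the text as an argument instead of reading `farm_data` from the enclosing environment, which also lets it handle a data frame with several rows. The sketch below assumes a literal (non-regex) match; the helper name `remove_lines_with` and the two-row sample data are hypothetical.

```r
# Hypothetical helper: drop every line containing `word` from each element of `text`
remove_lines_with <- function(text, word) {
  vapply(strsplit(text, "\n", fixed = TRUE), function(lines) {
    # Keep the lines without a literal match, then rejoin them
    paste(lines[!grepl(word, lines, fixed = TRUE)], collapse = "\n")
  }, character(1))
}

# Hypothetical two-row data frame, used only for illustration
farm_data <- data.frame(
  shop = c("fruit", "veg"),
  sentence = c("apples\nbanana bread\npears", "carrots\nbanana peel"),
  stringsAsFactors = FALSE
)

farm_data$sentence <- remove_lines_with(farm_data$sentence, "banana")
print(farm_data$sentence)
```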