Question

I have a string containing several "\n". I want to go through each line and remove every line that contains the word "banana".

Sample DF:

farm_data <- data.frame(shop=c('fruit'),
                        sentence=c('the basket contains apples
                                  bananas are the best
                                  are we going to eat bananas
                                  why not just boil the fruits
                                  let us make some banana smoothie'), stringsAsFactors=FALSE)

What I tried:

farm_data$sentence <- gsub(".* bananas .* \n", "\n", farm_data$sentence)

What I want:

clean_data <- data.frame(shop=c('fruit'),
                        sentence=c('the basket contains apples
                                  why not just boil the fruits'), stringsAsFactors=FALSE)

The lines containing bananas have been removed.

Thanks.

Answer 1

x <- 'the basket contains apples
                                  bananas are the best
                                  are we going to eat bananas
                                  why not just boil the fruits
                                  let us make some banana smoothie'
cat(x)
# the basket contains apples
#                                   bananas are the best
#                                   are we going to eat bananas
#                                   why not just boil the fruits
#                                   let us make some banana smoothie

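# perl = TRUE: '.' does not match a newline, so each match is one whole line
# containing 'banana', plus its trailing newline if there is one
cat(gsub('.*banana.*\\n?', '', x, perl = TRUE))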
# the basket contains apples
#                                   why not just boil the fruits
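
For comparison, here is a small sketch of why the pattern from the question leaves the string unchanged. It assumes the lines carry no trailing spaces, so the text is built with paste() to make that explicit; the names pat_question, pat_answer and res are only for illustration. The question's pattern requires a space followed directly by the newline, and that sequence never occurs.

```r
# Build the text explicitly so that no line ends in whitespace.
x <- paste(c("the basket contains apples",
             "  bananas are the best",
             "  are we going to eat bananas",
             "  why not just boil the fruits",
             "  let us make some banana smoothie"),
           collapse = "\n")

pat_question <- ".* bananas .* \n"   # pattern tried in the question
pat_answer   <- ".*banana.*\\n?"     # pattern used in this answer

# " bananas " does occur, but " \n" (space, then newline) never does,
# so the question's pattern finds nothing to replace.
grepl(pat_question, x)
# [1] FALSE

# The answer's pattern removes each matching line together with its newline.
res <- gsub(pat_answer, "", x, perl = TRUE)
cat(res)
```

Note that the pattern matches "banana" anywhere in a line, so it also removes lines containing "bananas".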

Answer 2

I am probably solving this in a roundabout way. I first split the string on the newline character \n.

sentence <- unlist(strsplit(as.character(farm_data$sentence), '\n'))

After that, I remove the elements of the resulting split that contain the word "banana".

# grepl() is vectorized; the logical mask also behaves correctly when no line matches
cleanSentence <- sentence[!grepl('banana', sentence)]

Then I use the paste function to put it back together.

clean_data <- data.frame(shop=c('fruit'),
                        sentence= paste(cleanSentence, collapse=' \n'), stringsAsFactors=FALSE)

Hope this isn't too convoluted. :)
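
One edge case worth keeping in mind when dropping lines this way: negative indexing with which() returns an empty vector when nothing matches, whereas a logical mask keeps every line. A minimal sketch with a made-up vector lines_in (not taken from the question):

```r
lines_in <- c("the basket contains apples", "why not just boil the fruits")

# No element contains "banana", so which() returns integer(0) ...
hits <- which(grepl("banana", lines_in))

# ... and indexing with -integer(0) selects nothing instead of everything.
by_index <- lines_in[-hits]

# A logical mask has no such edge case.
by_mask <- lines_in[!grepl("banana", lines_in)]

length(by_index)  # 0
length(by_mask)   # 2
```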

To address your question about usability with other "fruits" or strings:

cleanFruit <- function(fruit = 'banana'){
    # split the stored text into individual lines
    sentence <- unlist(strsplit(as.character(farm_data$sentence), '\n'))
    # keep only the lines that do not match `fruit`
    cleanSentence <- sentence[!grepl(fruit, sentence)]
    clean_data <- data.frame(shop=c('fruit'),
                            sentence= paste(cleanSentence, collapse=' \n'), stringsAsFactors=FALSE)
    return(clean_data)
}

Wrap it in a function and pass in the fruit (or word) to remove. @rawr's answer seems a bit cleaner, though.
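
A variant that does not depend on the global farm_data could take the text and the pattern as arguments. This is only a sketch under assumptions: drop_lines and its parameters are hypothetical names, the two-row data frame is made up for illustration, and the remaining lines are joined with a plain "\n" rather than " \n".

```r
# Hypothetical helper: remove every line matching `pattern`
# from each element of the character vector `text`.
drop_lines <- function(text, pattern = "banana") {
  pieces <- strsplit(as.character(text), "\n", fixed = TRUE)
  vapply(pieces, function(lines) {
    # keep the lines that do not match, then glue them back together
    paste(lines[!grepl(pattern, lines)], collapse = "\n")
  }, character(1), USE.NAMES = FALSE)
}

# Made-up data with more than one row, to show the helper is vectorized.
farm_data <- data.frame(
  shop = c("fruit", "veg"),
  sentence = c("apples are in\nbananas are out\npears are in",
               "no fruit here"),
  stringsAsFactors = FALSE
)

farm_data$sentence <- drop_lines(farm_data$sentence)
cat(farm_data$sentence, sep = "\n---\n")
```

Because pattern is passed to grepl() unchanged, it is treated as a regular expression, so any regex metacharacters in the word would need escaping.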
