Question

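Hello, I am trying to find short words (abbreviations such as gr8) in a sentence and then perform some operation on them. This is easy in Java, but in R I am having problems: the if condition is never met. Here is my code: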

rm(list=ls())
library(tidytext)
library(dplyr)

shortText= c('grt','gr8','bcz','ur')


tweet=c('stats is gr8','this car is good','your movie is grt','i hate your book of hatred','food is awsome'
        )
tweet=data.frame(tweet, stringsAsFactors = FALSE)

for(row in 1:nrow(tweet)) {

  tweetWords=strsplit(tweet[row,]," ")
  print(tweetWords)
  for (word in 1:length(tweetWords)) {
    if(tweetWords[word] %in% shortText){
      print('we have a match')
    }

  }
}

Answer 1

Here is a simple base R option using grepl:

shortText <- c('grt','gr8','bcz','ur')
tweet <- c('stats is gr8','this car is good','your movie is grt','i hate your book of hatred','food is awsome')

res <- sapply(shortText, function(x) grepl(paste0("\\b", x, "\\b"), tweet))
tweet[rowSums(res) > 0]

[1] "stats is gr8"      "your movie is grt"


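The basic idea is to build a matrix whose rows are the tweets and whose columns are the keywords. If a row contains one or more TRUE (1) values, that tweet contains one or more of the keywords.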
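The same logical matrix can also report which keyword(s) each tweet matched. This is a minimal sketch extending the answer's idea, not part of the original answer; the names hits and matched are illustrative only.

```r
shortText <- c('grt', 'gr8', 'bcz', 'ur')
tweet <- c('stats is gr8', 'this car is good', 'your movie is grt',
           'i hate your book of hatred', 'food is awsome')

# Rows = tweets, columns = keywords; TRUE where the keyword occurs as a whole word
res <- sapply(shortText, function(x) grepl(paste0("\\b", x, "\\b"), tweet))

# A tweet contains at least one keyword if any value in its row is TRUE
hits <- rowSums(res) > 0
tweet[hits]
# [1] "stats is gr8"      "your movie is grt"

# For each tweet, keep the keywords whose column is TRUE in that tweet's row
matched <- lapply(seq_len(nrow(res)), function(i) shortText[res[i, ]])
names(matched) <- tweet
matched
```

rowSums(res) > 0 could equally be written as apply(res, 1, any); both turn each row into a single TRUE/FALSE.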

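Note that I surround each search term with word boundaries (\b), so a term cannot falsely match as a substring of a longer word; for example, ur does not match inside your.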
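A quick way to see the effect of \b is to test the keyword ur against one of the question's tweets with and without the boundaries. The second test string, "is ur car good", is a made-up example, and the variable names are illustrative.

```r
s <- "i hate your book of hatred"

# Without word boundaries, "ur" is found inside "your"
plain <- grepl("ur", s)
plain     # TRUE

# With \b on both sides, "ur" must appear as a whole word
bounded <- grepl("\\bur\\b", s)
bounded   # FALSE

# Hypothetical input where "ur" really is a separate word
standalone <- grepl("\\bur\\b", "is ur car good")
standalone  # TRUE
```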

Answer 2

There are many ways to improve this, but here is a quick fix that makes minimal changes to your code:

shortText= c('grt','gr8','bcz','ur')


tweet=c('stats is gr8','this car is good','your movie is grt','i hate your book of hatred','food is awsome'
)
tweet=data.frame(tweet, stringsAsFactors = FALSE)

for(row in 1:nrow(tweet)) {

  tweetWords=strsplit(tweet[row,]," ")
  print(tweetWords)
  for (word in 1:length(tweetWords)) {
    if(any(tweetWords[word][[1]] %in% shortText)){
      print('we have a match')
    }

  }
}

Output:

[[1]]
[1] "stats" "is"    "gr8"  

[1] "we have a match"
[[1]]
[1] "this" "car"  "is"   "good"

[[1]]
[1] "your"  "movie" "is"    "grt"  

[1] "we have a match"
[[1]]
[1] "i"      "hate"   "your"   "book"   "of"     "hatred"

[[1]]
[1] "food"   "is"     "awsome"

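strsplit() returns a list, so tweetWords[word][[1]] extracts the character vector of words. Comparing it with %in% gives one TRUE/FALSE per word, and wrapping the result in any() makes the if statement run when at least one of them is TRUE. Without any(), if would only use the first element (and since R 4.2.0, a condition of length greater than one is an error).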
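To see why the original condition never matched, it helps to inspect what strsplit() returns: single-bracket indexing keeps the list wrapper, so %in% never looks at the individual words. The last lines sketch a loop-free alternative that is not part of the original answer; the names words and hasShort are illustrative only.

```r
shortText <- c('grt', 'gr8', 'bcz', 'ur')
tweet <- data.frame(tweet = c('stats is gr8', 'this car is good', 'your movie is grt',
                              'i hate your book of hatred', 'food is awsome'),
                    stringsAsFactors = FALSE)

tweetWords <- strsplit(tweet[1, ], " ")
class(tweetWords)    # "list": strsplit() always returns a list
length(tweetWords)   # 1: so the question's inner loop runs only once

# [ keeps the list wrapper, so the whole word vector is compared as one item
tweetWords[1] %in% shortText     # FALSE
# [[ extracts the character vector itself: one TRUE/FALSE per word
tweetWords[[1]] %in% shortText   # FALSE FALSE TRUE

# Loop-free alternative: split every tweet at once, then test each word vector
words <- strsplit(tweet$tweet, " ")
hasShort <- sapply(words, function(w) any(w %in% shortText))
tweet$tweet[hasShort]
```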

Answer 3

Perhaps something like this:

cbind(tweet, ifelse(sapply(shortText, grepl, x = tweet), "Match is found", "No match"))

     tweet                        grt              gr8              bcz        ur
[1,] "stats is gr8"               "No match"       "Match is found" "No match" "No match"
[2,] "this car is good"           "No match"       "No match"       "No match" "No match"
[3,] "your movie is grt"          "Match is found" "No match"       "No match" "Match is found"
[4,] "i hate your book of hatred" "No match"       "No match"       "No match" "Match is found"
[5,] "food is awsome"             "No match"       "No match"       "No match" "No match"
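This version does not use word boundaries, so the ur column reports a match in rows 3 and 4 only because "your" contains "ur". A sketch combining this table layout with the \b approach from the first answer; the helper name wholeWord is illustrative only.

```r
shortText <- c('grt', 'gr8', 'bcz', 'ur')
tweet <- c('stats is gr8', 'this car is good', 'your movie is grt',
           'i hate your book of hatred', 'food is awsome')

# Wrap each keyword in \b so it only matches as a whole word
wholeWord <- function(key) grepl(paste0("\\b", key, "\\b"), tweet)

# Logical matrix: rows = tweets, columns = keywords
found <- sapply(shortText, wholeWord)

# ifelse() keeps the matrix shape and column names of found
out <- cbind(tweet, ifelse(found, "Match is found", "No match"))
out
```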

Find a string in a sentence in R
