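Hello, I'm trying to find a short text (an abbreviation) inside a sentence and then do some operations on it. This is easy in Java, but in R I'm running into problems: the condition is never reached. Here is my code: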
rm(list=ls())
library(tidytext)
library(dplyr)
shortText= c('grt','gr8','bcz','ur')
tweet=c('stats is gr8','this car is good','your movie is grt','i hate your book of hatred','food is awsome'
)
tweet=data.frame(tweet, stringsAsFactors = FALSE)
for(row in 1:nrow(tweet)) {
tweetWords=strsplit(tweet[row,]," ")
print(tweetWords)
for (word in 1:length(tweetWords)) {
if(tweetWords[word] %in% shortText){
print('we have a match')
}
}
}
Answer 0 (score: 1)
Here is a simple base R option using grepl:
shortText <- c('grt','gr8','bcz','ur')
tweet <- c('stats is gr8','this car is good','your movie is grt','i hate your book of hatred','food is awsome')
res <- sapply(shortText, function(x) grepl(paste0("\\b", x, "\\b"), tweet))
tweet[rowSums(res) > 0]
[1] "stats is gr8"      "your movie is grt"
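The basic idea is to build a logical matrix whose rows are the tweets and whose columns are the keywords. If a given row contains one or more TRUE values, that tweet contains one or more of the keywords, so we keep the rows whose sum is greater than zero.
Note that I surround each search term with word boundaries (\b), so that a search term is not falsely matched as a substring of a larger word.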
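To make the effect of the word boundaries concrete, here is a small sketch that checks the abbreviation `ur` with and without `\\b`. The variable names `plain` and `bounded` are made up for this illustration.

```r
tweet <- c('stats is gr8', 'this car is good', 'your movie is grt',
           'i hate your book of hatred', 'food is awsome')

# Without word boundaries, "ur" is also found inside "your"
plain <- grepl("ur", tweet)

# With word boundaries, "ur" must stand alone as a word
bounded <- grepl("\\bur\\b", tweet)

print(plain)    # FALSE FALSE  TRUE  TRUE FALSE
print(bounded)  # FALSE FALSE FALSE FALSE FALSE
```

None of the sample tweets uses `ur` as a standalone word, so only the unbounded pattern reports matches.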
Answer 1 (score: 0)
There are many ways to improve this, but here is a quick fix that makes minimal changes to your code:
shortText= c('grt','gr8','bcz','ur')
tweet=c('stats is gr8','this car is good','your movie is grt','i hate your book of hatred','food is awsome'
)
tweet=data.frame(tweet, stringsAsFactors = FALSE)
for(row in 1:nrow(tweet)) {
tweetWords=strsplit(tweet[row,]," ")
print(tweetWords)
for (word in 1:length(tweetWords)) {
if(any(tweetWords[word][[1]] %in% shortText)){
print('we have a match')
}
}
}
This returns:
[[1]]
[1] "stats" "is" "gr8"
[1] "we have a match"
[[1]]
[1] "this" "car" "is" "good"
[[1]]
[1] "your" "movie" "is" "grt"
[1] "we have a match"
[[1]]
[1] "i" "hate" "your" "book" "of" "hatred"
[[1]]
[1] "food" "is" "awsome"
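The [[1]] extracts the character vector of words from the list that strsplit returns, so %in% yields one logical value per word. Adding any makes the if statement run when at least one of those values is TRUE. Without it, if would look only at the first element (and since R 4.2.0 a condition of length greater than one is an error).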
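As a rough sketch of why the original condition never fires, and of a loop-free variant of the same check: the code below assumes `tweet` is a plain character vector rather than a data frame, and `wrapped`, `unwrapped` and `hasShort` are names invented here.

```r
shortText <- c('grt', 'gr8', 'bcz', 'ur')
tweet <- c('stats is gr8', 'this car is good', 'your movie is grt',
           'i hate your book of hatred', 'food is awsome')

# strsplit() returns a list with one character vector per tweet
tweetWords <- strsplit(tweet, " ")

# Single brackets keep the list wrapper, so %in% compares the whole
# list element (not its individual words) and the result is FALSE
wrapped <- tweetWords[1] %in% shortText

# Double brackets extract the words, giving one logical per word
unwrapped <- tweetWords[[1]] %in% shortText

# One logical per tweet: TRUE if any word is a known abbreviation
hasShort <- vapply(tweetWords,
                   function(words) any(words %in% shortText),
                   logical(1))

print(wrapped)
print(unwrapped)
print(tweet[hasShort])
```

Splitting on spaces compares whole words, so `ur` does not match `your` here either.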
Answer 2 (score: 0)
Maybe something like this:
cbind(tweet, ifelse(sapply(shortText, grepl, x = tweet), "Match is found", "No match"))
     tweet                        grt              gr8              bcz
[1,] "stats is gr8"               "No match"       "Match is found" "No match"
[2,] "this car is good"           "No match"       "No match"       "No match"
[3,] "your movie is grt"          "Match is found" "No match"       "No match"
[4,] "i hate your book of hatred" "No match"       "No match"       "No match"
[5,] "food is awsome"             "No match"       "No match"       "No match"
     ur
[1,] "No match"
[2,] "No match"
[3,] "Match is found"
[4,] "Match is found"
[5,] "No match"
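The wide matrix can also be condensed into a single column listing the abbreviations found in each tweet. The following is only a sketch of that idea: it uses word-bounded patterns so that `ur` is not picked up inside `your` (which is what happens in rows 3 and 4 of the output above), and the names `hits`, `matched` and `result` are hypothetical.

```r
shortText <- c('grt', 'gr8', 'bcz', 'ur')
tweet <- c('stats is gr8', 'this car is good', 'your movie is grt',
           'i hate your book of hatred', 'food is awsome')

# Logical matrix: one row per tweet, one column per abbreviation
hits <- sapply(shortText, function(x) grepl(paste0("\\b", x, "\\b"), tweet))

# Collapse each row into a comma-separated list of matched abbreviations;
# tweets without any match get an empty string
matched <- apply(hits, 1, function(row) paste(shortText[row], collapse = ", "))

result <- data.frame(tweet, matched, stringsAsFactors = FALSE)
print(result)
```

An empty `matched` entry plays the role of "No match" in the original table.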