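How can I use R to check whether the text of a field in a dataset contains a specific word?
In SQL, we can use the LIKE comparison operator. For example:
SELECT * FROM schools WHERE name LIKE '%Public School%'
If I had to do the same thing in R, how would I do it?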
Answer 0 (score: 7)
Given
schools <- data.frame(rank = 1:20,
name = rep(c("X Public School", "Y Private School"), 10))
try this:
subset(schools, grepl("Public School", name))
or this:
schools[ grep("Public School", schools$name), ]
or this:
library(sqldf)
sqldf("SELECT * FROM schools WHERE name LIKE '%Public School%'")
or this:
library(data.table)
data.table(schools)[ grep("Public School", name) ]
or this:
library(dplyr)
schools %>% filter(grepl("Public School", name))
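One difference from LIKE is worth keeping in mind: grep() and grepl() treat the pattern as a regular expression by default, while LIKE compares literal text. A minimal sketch of literal and case-insensitive matching follows; the sample names are made up for illustration and are not from the question.

```r
# Hypothetical sample data, not from the question
name <- c("X Public School", "Y Private School", "Z public school", "A.B School")

# fixed = TRUE treats the pattern as a literal string, not a regex
literal <- grepl("Public School", name, fixed = TRUE)

# ignore.case = TRUE matches regardless of letter case
# (it is ignored, with a warning, when combined with fixed = TRUE)
nocase <- grepl("public school", name, ignore.case = TRUE)

# Without fixed = TRUE, "." is a regex wildcard and matches any character
as_regex <- grepl("A.B", c("A.B School", "AxB School"))
as_fixed <- grepl("A.B", c("A.B School", "AxB School"), fixed = TRUE)
```

If the search text may contain characters such as ".", "(" or "+", fixed = TRUE is probably the safer default.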
Answer 1 (score: 0)
In base R, you can use %in% to subset data, e.g. dataframe[dataframe$variable %in% dataframe2$variable2, ]. Note that %in% matches whole values exactly, not substrings.
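To illustrate how %in% differs from a LIKE-style search, here is a small sketch under assumed data (the four school names are hypothetical). %in% keeps only rows whose value equals the search text, while grepl() keeps rows that contain it.

```r
# Hypothetical data for illustration
schools <- data.frame(
  rank = 1:4,
  name = c("X Public School", "Y Private School", "Public School", "Z Public School")
)

# %in% compares whole values: only the row that is exactly "Public School"
exact <- schools[schools$name %in% "Public School", ]

# grepl() tests for a substring: every row containing "Public School"
partial <- schools[grepl("Public School", schools$name, fixed = TRUE), ]
```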
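Answer 2 (score: 0)
The qdap package has a convenient wrapper for agrep that lets you search across all fields of a data frame or within a specific field: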
schools <- data.frame(
rank = 1:20,
schools = rep(c("X Public School", "Y Private School"), 10)
)
library(qdap)
Search(schools, "Public School", "schools")
## rank schools
## 1 1 X Public School
## 3 3 X Public School
## 5 5 X Public School
## 7 7 X Public School
## 9 9 X Public School
## 11 11 X Public School
## 13 13 X Public School
## 15 15 X Public School
## 17 17 X Public School
## 19 19 X Public School
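If installing qdap is not an option, searching every column can be approximated in base R. The sketch below is an assumption-laden stand-in, not qdap's implementation: search_all is a hypothetical helper name, and it does exact substring matching rather than the approximate matching that agrep provides.

```r
# Hypothetical helper: keep rows where ANY column contains the literal text
search_all <- function(df, text) {
  # One logical vector per column; as.character() also handles factors and numbers
  hits <- lapply(df, function(col) grepl(text, as.character(col), fixed = TRUE))
  # Combine the per-column results with element-wise OR
  df[Reduce(`|`, hits), , drop = FALSE]
}

schools <- data.frame(
  rank = 1:20,
  schools = rep(c("X Public School", "Y Private School"), 10)
)
result <- search_all(schools, "Public School")
```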
Answer 3 (score: 0)
I think the following may answer the question in a simple way.
It combines the %in% and %like% functions:
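# TRUE where an element of namevec1 equals, or is the start of, an element of namevec2
`%inlike%` <- function(namevec1, namevec2) {
  # Splitting on " " drops trailing spaces from single-word elements
  temp1 <- strsplit(namevec1, " ")
  temp2 <- strsplit(namevec2, " ")
  !is.na(charmatch(temp1, temp2))
}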
namevec1<-c("ffd","ff","hello_world")
namevec2<-c("ffde","ff ","hello_wor")
namevec1%inlike%namevec2
[1] TRUE TRUE FALSE
namevec2%inlike%namevec1
[1] FALSE TRUE TRUE
(Note the difference in whitespace.)
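The operator is not symmetric because charmatch() does prefix matching: a value matches a table entry only if the entry starts with that value. A short sketch of that behavior, using the strings from the example above:

```r
# "hello_wor" is the start of "hello_world", so it matches table entry 1
short_in_long <- charmatch("hello_wor", "hello_world")

# "hello_world" is longer than "hello_wor", so there is no match (NA)
long_in_short <- charmatch("hello_world", "hello_wor")
```

For a true substring test in either position, grepl(pattern, x, fixed = TRUE) is likely the more direct tool.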