Using %in% to find phrases that are not an exact match

Time: 2018-05-30 08:05:26

Tags: r

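I am using the code below to look for certain states within a column named Area in a data frame. The Area column usually contains a city and a state, as well as other phrases like the ones shown here. So, for example, Ohio does not match Cleveland, Ohio.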

Any idea how to change the code so that it finds partial matches on the state? Could I add %like% somewhere in the code?

The data frame is named Data:

**Area**   
Cleveland, Ohio
Manhattan, New York
Lower Nevada

Code:

StateFunding <- c("California","North Carolina","Texas","Florida","Maryland","Pennsylvania","New York")
Data$Classification = "0"

for (i in 1:length(Data$Area))
{
  if(Data$Area[i] %in% StateFunding) {
   Data$Classification[i] = "InList"
  } else {
   Data$Classification[i] = "NotinList"
  }
}

2 answers:

Answer 0: (score: 1)

A possible solution:

i <- rowSums(sapply(StateFunding, function(p) agrepl(p, mydf$Area))) > 0

mydf$Classification <- c('NotInList','InList')[1 + i]

which gives:

> mydf
                 Area Classification
1     Cleveland, Ohio      NotInList
2 Manhattan, New York         InList
3        Lower Nevada      NotInList

You could also do this with ifelse:

ifelse(rowSums(sapply(StateFunding, function(p) agrepl(p, mydf$Area))) > 0, 'InList', 'NotInList')

Used data:

mydf <- structure(list(Area = c("Cleveland, Ohio", "Manhattan, New York", "Lower Nevada")), 
                  .Names = "Area", class = "data.frame", row.names = c(NA, -3L))
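
Note that agrepl performs approximate (fuzzy) matching, so it can also accept near-misses such as small misspellings. If only exact substrings should count, one option is to swap in grepl with fixed = TRUE. The following is a minimal sketch under that assumption; the intermediate name `hits` is made up for illustration and the data are the three sample rows from the question:

```r
# Sample data from the question
mydf <- data.frame(Area = c("Cleveland, Ohio", "Manhattan, New York", "Lower Nevada"),
                   stringsAsFactors = FALSE)
StateFunding <- c("California", "North Carolina", "Texas", "Florida",
                  "Maryland", "Pennsylvania", "New York")

# One column per state, one row per Area; TRUE where the state name
# occurs literally (fixed = TRUE disables regex interpretation)
hits <- sapply(StateFunding, function(p) grepl(p, mydf$Area, fixed = TRUE))

# A row is "InList" if at least one state name was found in it
mydf$Classification <- c("NotInList", "InList")[1 + (rowSums(hits) > 0)]
print(mydf)
```

Whether fuzzy or exact matching is preferable depends on how clean the Area column is; agrepl tolerates typos, while the fixed variant avoids accidental matches.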

Answer 1: (score: 0)

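You can use the function grepl to check whether one string is a substring of another (by default the pattern is treated as a regular expression). For example: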

for (i in 1:length(Data$Area))
{
  if(sum(sapply(StateFunding,grepl,x=Data$Area[i]))>0){
    Data$Classification[i] = "InList"
  } else {
    Data$Classification[i] = "NotinList"
  }
}
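
Since grepl is vectorized over the strings it searches, the loop can probably be avoided. The sketch below is one way to do it, assuming the state names contain no regular-expression metacharacters (true for the list in the question); the variable `pattern` is introduced here for illustration only:

```r
# Sample data from the question
Data <- data.frame(Area = c("Cleveland, Ohio", "Manhattan, New York", "Lower Nevada"),
                   stringsAsFactors = FALSE)
StateFunding <- c("California", "North Carolina", "Texas", "Florida",
                  "Maryland", "Pennsylvania", "New York")

# Combine all state names into a single alternation: "California|North Carolina|..."
pattern <- paste(StateFunding, collapse = "|")

# grepl tests every element of Data$Area against the pattern in one call
Data$Classification <- ifelse(grepl(pattern, Data$Area), "InList", "NotinList")
print(Data)
```

If a search term could contain characters such as `.` or `(`, it would need escaping first, or the per-state approach with fixed = TRUE would be the safer choice.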