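I'm using the code below to look for certain states in a column named Area of a data frame. The Area column usually contains a city and a state, along with other phrases like those shown here. Because the code only matches whole strings, Ohio, for example, is not found in Cleveland, Ohio.
Any idea how to change the code so it finds partial matches on the state? Can I add %like% somewhere in the code?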
The data frame is named Data:
**Area**
Cleveland, Ohio
Manhattan, New York
Lower Nevada
Code:
StateFunding <- c("California", "North Carolina", "Texas", "Florida",
                  "Maryland", "Pennsylvania", "New York")
Data$Classification = "0"
for (i in 1:length(Data$Area))
{
  # %in% only matches when the whole string equals a list entry
  if (Data$Area[i] %in% StateFunding) {
    Data$Classification[i] = "InList"
  } else {
    Data$Classification[i] = "NotinList"
  }
}
Answer 0 (score: 1)
A possible solution:
i <- rowSums(sapply(StateFunding, function(p) agrepl(p, mydf$Area))) > 0
mydf$Classification <- c('NotInList','InList')[1 + i]
which gives:
> mydf
                 Area Classification
1     Cleveland, Ohio      NotInList
2 Manhattan, New York         InList
3        Lower Nevada      NotInList
You can also use ifelse:
ifelse(rowSums(sapply(StateFunding, function(p) agrepl(p, mydf$Area))) > 0, 'InList', 'NotInList')
Data used:
mydf <- structure(list(Area = c("Cleveland, Ohio", "Manhattan, New York", "Lower Nevada")),
.Names = "Area", class = "data.frame", row.names = c(NA, -3L))
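One thing to keep in mind: agrepl does approximate (fuzzy) matching, so near-misses of a state name can count as hits. If only literal substrings should match, a variant along the following lines might work. This is just a sketch under that assumption; the names hits and in_list are made up for illustration, and mydf is rebuilt with data.frame() so the snippet runs on its own.

```r
# Sample data from this answer, rebuilt so the snippet is self-contained
mydf <- data.frame(
  Area = c("Cleveland, Ohio", "Manhattan, New York", "Lower Nevada"),
  stringsAsFactors = FALSE
)
StateFunding <- c("California", "North Carolina", "Texas", "Florida",
                  "Maryland", "Pennsylvania", "New York")

# One logical vector per state: does the state name occur literally in Area?
# fixed = TRUE treats each state name as a plain string, not a regex.
hits <- lapply(StateFunding, grepl, x = mydf$Area, fixed = TRUE)

# Combine the per-state vectors with element-wise OR
in_list <- Reduce(`|`, hits)

mydf$Classification <- ifelse(in_list, "InList", "NotInList")
```

Using Reduce over a list instead of rowSums over the sapply result also avoids depending on sapply returning a matrix, which it does not do when the data frame has a single row.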
Answer 1 (score: 0)
You can use the function grepl to check whether one string is a substring of another. For example:
for (i in seq_along(Data$Area))
{
  # grepl is applied once per state; any TRUE means the row mentions a listed state
  if (sum(sapply(StateFunding, grepl, x = Data$Area[i])) > 0) {
    Data$Classification[i] = "InList"
  } else {
    Data$Classification[i] = "NotinList"
  }
}
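The loop is not strictly needed, since grepl is vectorized over the strings it searches. A possible loop-free sketch joins the state names into one regular expression with "|" (alternation). It assumes the state names contain no regex metacharacters, which holds for this list; the variable name pattern is only illustrative, and Data is rebuilt here from the sample rows in the question.

```r
# Sample rows from the question, rebuilt so the snippet runs on its own
Data <- data.frame(
  Area = c("Cleveland, Ohio", "Manhattan, New York", "Lower Nevada"),
  stringsAsFactors = FALSE
)
StateFunding <- c("California", "North Carolina", "Texas", "Florida",
                  "Maryland", "Pennsylvania", "New York")

# Build a single pattern such as "California|North Carolina|...|New York"
pattern <- paste(StateFunding, collapse = "|")

# grepl tests every Area value against the combined pattern in one call
Data$Classification <- ifelse(grepl(pattern, Data$Area), "InList", "NotinList")
```

With the sample rows, only "Manhattan, New York" contains a listed state, so it is the only row labelled InList.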