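My dataset has a column GeogPreferences that contains several comma-separated strings in each row. I have a character vector region that I want to search this column with. I am creating a new column geog: if GeogPreferences contains any of the strings in region, geog should keep the same text as GeogPreferences; otherwise it should simply be "All".
My sample code is: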
myDf <- structure(list(GeogPreferences = structure(1:4, .Label = c("Asia, Central and East Europe, Europe, North America, West Europe, Australia, Belgium, Czech Republic, France, Germany, India, Italy, Luxembourg, Netherlands, Poland, Romania, Spain, UK, US",
"Europe, North America, West Europe, US", "Global, North America",
"Northeast, Southeast, West, US"), class = "factor")), .Names = "GeogPreferences", class = "data.frame", row.names = c(NA,
-4L))
region <- c("Northeast","Southeast","West","Midwest","Southwest")
myDf$geog <- ifelse((grepl(paste(region, collapse = "|"), myDf$GeogPreferences)),myDf$GeogPreferences, c("All"))
The problem is that grepl treats a string such as "West Europe" as being in the region list because it contains "West", so I get the following output:
geog
1
2
All
4
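A minimal sketch that reproduces this output shows two separate effects. It rebuilds the data with factor() instead of structure(), which yields the same levels in the same order. The unanchored pattern matches "West" inside "West Europe", and ifelse() copies the factor's integer codes rather than its labels, which is why rows 1, 2 and 4 print as 1, 2 and 4.

```r
# Example data from the question; GeogPreferences is a factor
myDf <- data.frame(
  GeogPreferences = factor(c(
    "Asia, Central and East Europe, Europe, North America, West Europe, Australia, Belgium, Czech Republic, France, Germany, India, Italy, Luxembourg, Netherlands, Poland, Romania, Spain, UK, US",
    "Europe, North America, West Europe, US",
    "Global, North America",
    "Northeast, Southeast, West, US"
  ))
)
region <- c("Northeast", "Southeast", "West", "Midwest", "Southwest")

pattern <- paste(region, collapse = "|")
pattern
# "Northeast|Southeast|West|Midwest|Southwest"

# The pattern is not anchored to item boundaries,
# so "West" also matches inside "West Europe"
hits <- grepl(pattern, myDf$GeogPreferences)
hits
# TRUE TRUE FALSE TRUE

# ifelse() on a factor stores its integer codes, then "All" coerces them to character
ifelse(hits, myDf$GeogPreferences, "All")
# "1" "2" "All" "4"
```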
I expect the following output:
geog
All
All
All
Northeast, Southeast, West, US
Is there a way to get this output with grep or some other function?
Answer 0 (score: 1)
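We can use if_else from dplyr, wrapping GeogPreferences in as.character() so that the factor's labels, not its integer codes, are kept: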
library(dplyr)
myDf %>%
  mutate(geog = if_else(grepl(paste(region, collapse = ",|"), GeogPreferences),
                        as.character(GeogPreferences), "All"))
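The pattern in this answer expands to "Northeast,|Southeast,|West,|Midwest,|Southwest": every region except the last must be followed by a comma, which is why "West Europe" no longer matches. It would still miss a region that is the last item of a row (for example "US, West"), and "Southwest" can still match as a substring. The base-R sketch below is not from the answer; it shows two stricter alternatives, assuming items are always separated by commas and that the region names contain no regex metacharacters. Option 1 keeps grepl but requires a comma or the start/end of the string on both sides of the region. Option 2 splits each row and tests exact membership with %in%.

```r
# Example data from the question; GeogPreferences is a factor
myDf <- data.frame(
  GeogPreferences = factor(c(
    "Asia, Central and East Europe, Europe, North America, West Europe, Australia, Belgium, Czech Republic, France, Germany, India, Italy, Luxembourg, Netherlands, Poland, Romania, Spain, UK, US",
    "Europe, North America, West Europe, US",
    "Global, North America",
    "Northeast, Southeast, West, US"
  ))
)
region <- c("Northeast", "Southeast", "West", "Midwest", "Southwest")
prefs <- as.character(myDf$GeogPreferences)  # labels, not factor codes

# Option 1: a region counts only as a whole item, i.e. preceded by the start
# of the string or a comma, and followed by a comma or the end of the string
item_pattern <- paste0("(^|,\\s*)(", paste(region, collapse = "|"), ")(\\s*,|$)")
hit_regex <- grepl(item_pattern, prefs, perl = TRUE)

# Option 2: split each row on commas, trim spaces, and test exact membership
items <- lapply(strsplit(prefs, ",", fixed = TRUE), trimws)
hit_exact <- vapply(items, function(x) any(x %in% region), logical(1))

myDf$geog <- ifelse(hit_exact, prefs, "All")
myDf$geog
# "All" "All" "All" "Northeast, Southeast, West, US"
```

Either logical vector can also replace the grepl() call inside the answer's mutate(if_else(...)). The exact-membership version avoids regex escaping entirely, which matters if region ever contains characters such as "." or "(".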