Create a new column in R using grep to find strings in an existing column

Posted: 2017-01-13 16:48:26

Tags: r grep strsplit

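My dataset has a column GeogPreferences that contains several strings in each row. I have a vector of strings, region, that I want to use to search this column. I am creating a new column geog: if GeogPreferences contains any of the strings in region, I want to keep the same text from GeogPreferences; otherwise I just want to substitute the text "All".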

My sample code is:

myDf <- structure(list(GeogPreferences = structure(1:4, .Label = c("Asia, Central and East Europe, Europe, North America, West Europe, Australia, Belgium, Czech Republic, France, Germany, India, Italy, Luxembourg, Netherlands, Poland, Romania, Spain, UK, US", 
"Europe, North America, West Europe, US", "Global, North America", 
"Northeast, Southeast, West, US"), class = "factor")), .Names = "GeogPreferences", class = "data.frame", row.names = c(NA, 
-4L))

region <- c("Northeast","Southeast","West","Midwest","Southwest")

myDf$geog <- ifelse((grepl(paste(region, collapse = "|"), myDf$GeogPreferences)),myDf$GeogPreferences, c("All"))

The problem is that grepl treats strings like "West Europe" as matching the region list because they contain "West", and I get the following output:

geog
1
2
All
4
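Two separate things produce that output, and a small reproduction (with made-up sample values, not the full dataset) makes both visible. First, grepl() matches substrings, so the pattern "West" also hits "West Europe". Second, ifelse() given a factor as its yes argument returns the factor's integer codes rather than its labels, which is why the matching rows print as numbers instead of text.

```r
# 1) grepl() matches substrings: "West" is found inside "West Europe".
grepl("West", "Europe, North America, West Europe, US")

# 2) ifelse() with a factor returns the integer codes, not the labels.
#    Levels are sorted alphabetically, so "Global, ..." is code 1 and "Northeast, ..." is code 2.
f <- factor(c("Global, North America", "Northeast, Southeast, West, US"))
ifelse(c(FALSE, TRUE), f, "All")                # character codes: "All", "2"

# Converting to character first keeps the original text.
ifelse(c(FALSE, TRUE), as.character(f), "All")  # "All", "Northeast, Southeast, West, US"
```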

I expect the output to be:

geog
All
All
All
Northeast, Southeast, West, US  

Is there a way to get this output using grep or any other function?

1 answer:

Answer 0 (score: 1)

We can use if_else from dplyr:

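library(dplyr)
# collapse = ",|" builds "Northeast,|Southeast,|West,|Midwest,|Southwest", so "West"
# only matches when directly followed by a comma ("West Europe" no longer matches)
myDf %>% 
  mutate(geog = if_else(grepl(paste(region, collapse = ",|"), GeogPreferences),
                        as.character(GeogPreferences),  # if_else() is type-strict: factor -> character
                        "All"))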
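The ",|" trick relies on each region being followed by a comma, so a region at the end of the string (for example "US, West") would be missed, and the last element "Southwest" gets no comma at all. As an alternative that is not part of the answer above, here is a base-R sketch that splits each row into its comma-separated entries with strsplit() and checks for exact matches with %in%. It assumes entries are always separated by a comma followed by optional spaces.

```r
# Same data as in the question, rebuilt with data.frame() for brevity
myDf <- data.frame(GeogPreferences = factor(c(
  "Asia, Central and East Europe, Europe, North America, West Europe, Australia, Belgium, Czech Republic, France, Germany, India, Italy, Luxembourg, Netherlands, Poland, Romania, Spain, UK, US",
  "Europe, North America, West Europe, US",
  "Global, North America",
  "Northeast, Southeast, West, US"
)))
region <- c("Northeast", "Southeast", "West", "Midwest", "Southwest")

prefs  <- as.character(myDf$GeogPreferences)  # work with the labels, not the factor codes
tokens <- strsplit(prefs, ",\\s*")            # "a, b, c" -> c("a", "b", "c")

# TRUE when at least one whole entry equals a region name ("West Europe" != "West")
has_region <- vapply(tokens, function(x) any(x %in% region), logical(1))

myDf$geog <- ifelse(has_region, prefs, "All")
myDf$geog
```

Because whole entries are compared, "US, West" still counts as a match and "Southwest Asia" does not count as "Southwest", which the regex version cannot guarantee.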