Mutate a new column if a value from one column matches a column holding an array of values

Asked: 2019-04-13 04:37:11

Tags: r

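I have two columns. I want to add a new column whose value is yes or no, depending on whether the value of COUNTRY is present in the array of values in Major.teams.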

That is, check whether COUNTRY occurs in Major.teams.

df <- tibble::tribble(
    ~COUNTRY,                                                                                    ~Major.teams,
  "Zimbabwe",             "['Zimbabwe,', 'Zimbabwe Under-13s,', 'Zimbabwe Under-18s,', 'Zimbabwe Under-19s']",
  "Zimbabwe",                                          "['Zimbabwe,', 'Mountaineers,', 'Zimbabwe Under-19s']",
  "Zimbabwe",                                                           "['Zimbabwe,', 'Zimbabwe Under-19s']",
  "Zimbabwe",                                    "['Zimbabwe,', 'Zimbabwe Under-17s,', 'Zimbabwe Under-19s']",
  "Zimbabwe",                                                                   "['Zimbabwe,', 'Shropshire']",
  "Zimbabwe",                        "['Zimbabwe,', 'Mountaineers,', 'Zimbabwe Cubs,', 'Zimbabwe Under-19s']",
  "Zimbabwe",                                                           "['Zimbabwe,', 'Zimbabwe Under-19s']",
  "Zimbabwe",                                    "['Zimbabwe Women,', 'Mountaineers,', 'Zimbabwe Under-19s']",
  "Zimbabwe",             "['Zimbabwe,', 'Zimbabwe Under-13s,', 'Zimbabwe Under-17s,', 'Zimbabwe Under-19s']",
  "Zimbabwe",                                                            "['Zimbabwe,', 'Natal,', 'Suffolk']",
  "Zimbabwe",                                                            "['Zimbabwe,', 'Western Transvaal']",
  "Zimbabwe",                                    "['Zimbabwe,', 'Zimbabwe Under-17s,', 'Zimbabwe Under-19s']",
  "Zimbabwe",                                                                    "['Zimbabwe,', 'Southerns']",
  "Zimbabwe",           "['Zimbabwe,', 'Mountaineers,', 'Zimbabwe A,', 'Zimbabwe Under-19s,', 'Zimbabwe XI']",
   "England",                                                     "['Zimbabwe-Rhodesia,', 'Kent,', 'Surrey']"
  )

1 answer:

Answer 0 (score: 1)

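Since each Major.teams entry is a single string rather than a list of values, we need to clean it up first.

We remove the square brackets and apostrophes (the pattern \\[|'|\\]) from the Major.teams column with gsub, split each string on commas (,) with strsplit, and return "Yes" only if any of the resulting pieces exactly matches the COUNTRY value in the same row.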

c("No", "Yes")[mapply(function(x, y) any(x == y), 
     df$COUNTRY, strsplit(gsub("\\[|'|\\]", "", df$Major.teams), ",")) + 1] 

# [1] "Yes" "Yes" "Yes" "Yes" "Yes" "Yes" "Yes" "No"  "Yes" "Yes" "Yes" "Yes" 
#     "Yes" "Yes" "No"
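To make the cleaning steps easier to follow, here is a minimal sketch in base R that traces them on one string copied from the question. The variable names (teams, cleaned, pieces, tidy) are illustrative only. Because the commas sit inside the quotes in this data, the split leaves empty pieces and leading spaces, so the exact comparison only matches a team in the first position. The trimws() step at the end is an optional addition, not part of the answer above.

```r
# One Major.teams value, copied from the question's data.
teams <- "['Zimbabwe,', 'Mountaineers,', 'Zimbabwe Under-19s']"

# Step 1: drop square brackets and apostrophes.
cleaned <- gsub("\\[|'|\\]", "", teams)
# cleaned is "Zimbabwe,, Mountaineers,, Zimbabwe Under-19s"

# Step 2: split on commas; strsplit() returns a list, so take element 1.
pieces <- strsplit(cleaned, ",")[[1]]
# pieces is "Zimbabwe" "" " Mountaineers" "" " Zimbabwe Under-19s"

# Step 3: exact comparison against a country name.
first_hit <- any(pieces == "Zimbabwe")      # TRUE
raw_hit   <- any(pieces == "Mountaineers")  # FALSE: the piece is " Mountaineers"

# Optional: trim whitespace and drop empty pieces so any position can match.
tidy <- trimws(pieces)
tidy <- tidy[nzchar(tidy)]
tidy_hit <- any(tidy == "Mountaineers")     # TRUE
```

For the data in the question the untrimmed version is enough, since the country always appears first when it appears at all.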

If we want the matched value itself instead of Yes/No, we can use ifelse:

as.character(ifelse(mapply(function(x, y) any(x == y), 
     df$COUNTRY, strsplit(gsub("\\[|'|\\]", "", df$Major.teams), ",")), 
     df$COUNTRY, ""))

#[1] "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" 
#    ""   "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ""
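The question asks for the result as a new column, so the same logic can be assigned back to the data frame. The sketch below uses base R and three rows copied from the question. has_country() is a hypothetical helper name and Match is an assumed column name; neither comes from the original post.

```r
# Three rows copied from the question, as a plain data.frame.
df <- data.frame(
  COUNTRY = c("Zimbabwe", "Zimbabwe", "England"),
  Major.teams = c(
    "['Zimbabwe,', 'Shropshire']",
    "['Zimbabwe Women,', 'Mountaineers,', 'Zimbabwe Under-19s']",
    "['Zimbabwe-Rhodesia,', 'Kent,', 'Surrey']"
  ),
  stringsAsFactors = FALSE
)

# has_country() is a hypothetical helper, not part of any package.
# It returns TRUE for each row where COUNTRY equals one of the team names.
has_country <- function(country, teams) {
  parts <- strsplit(gsub("\\[|'|\\]", "", teams), ",")
  # trimws() is an addition so that names after the first can also match.
  unname(mapply(function(x, y) any(x == trimws(y)), country, parts))
}

# Add the Yes/No result as a new column.
df$Match <- ifelse(has_country(df$COUNTRY, df$Major.teams), "Yes", "No")
df$Match
# [1] "Yes" "No"  "No"
```

If dplyr is loaded, the same expression should also work inside mutate(), for example mutate(df, Match = ifelse(has_country(COUNTRY, Major.teams), "Yes", "No")).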