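I have two columns. I want to add a new column with the value yes or no, depending on whether the value of Country is present in the array of values in Major teams.
Select Country from Major teams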
df <- tibble::tribble(
~COUNTRY, ~Major.teams,
"Zimbabwe", "['Zimbabwe,', 'Zimbabwe Under-13s,', 'Zimbabwe Under-18s,', 'Zimbabwe Under-19s']",
"Zimbabwe", "['Zimbabwe,', 'Mountaineers,', 'Zimbabwe Under-19s']",
"Zimbabwe", "['Zimbabwe,', 'Zimbabwe Under-19s']",
"Zimbabwe", "['Zimbabwe,', 'Zimbabwe Under-17s,', 'Zimbabwe Under-19s']",
"Zimbabwe", "['Zimbabwe,', 'Shropshire']",
"Zimbabwe", "['Zimbabwe,', 'Mountaineers,', 'Zimbabwe Cubs,', 'Zimbabwe Under-19s']",
"Zimbabwe", "['Zimbabwe,', 'Zimbabwe Under-19s']",
"Zimbabwe", "['Zimbabwe Women,', 'Mountaineers,', 'Zimbabwe Under-19s']",
"Zimbabwe", "['Zimbabwe,', 'Zimbabwe Under-13s,', 'Zimbabwe Under-17s,', 'Zimbabwe Under-19s']",
"Zimbabwe", "['Zimbabwe,', 'Natal,', 'Suffolk']",
"Zimbabwe", "['Zimbabwe,', 'Western Transvaal']",
"Zimbabwe", "['Zimbabwe,', 'Zimbabwe Under-17s,', 'Zimbabwe Under-19s']",
"Zimbabwe", "['Zimbabwe,', 'Southerns']",
"Zimbabwe", "['Zimbabwe,', 'Mountaineers,', 'Zimbabwe A,', 'Zimbabwe Under-19s,', 'Zimbabwe XI']",
"England", "['Zimbabwe-Rhodesia,', 'Kent,', 'Surrey']"
)
Answer 0 (score: 1)
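Because Major.teams is a single string rather than a list of values, it needs some cleaning first.
We remove the square brackets and apostrophes (pattern \\[|'|\\]) from the Major.teams column, split the string on commas (,), and return Yes only if any of the resulting values exactly matches the Country column.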
c("No", "Yes")[mapply(function(x, y) any(x == y),
df$COUNTRY, strsplit(gsub("\\[|'|\\]", "", df$Major.teams), ",")) + 1]
# [1] "Yes" "Yes" "Yes" "Yes" "Yes" "Yes" "Yes" "No" "Yes" "Yes" "Yes" "Yes"
# "Yes" "Yes" "No"
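To see what the one-liner does at each stage, it may help to run the same steps on a single value. The sketch below is an illustration only: the names x, cleaned, parts and found are hypothetical, and x is copied from the first row of the data.

```r
# One value copied from the first row of Major.teams (x is an illustrative name)
x <- "['Zimbabwe,', 'Zimbabwe Under-13s,', 'Zimbabwe Under-18s,', 'Zimbabwe Under-19s']"

# Step 1: drop the square brackets and apostrophes
cleaned <- gsub("\\[|'|\\]", "", x)

# Step 2: split on commas; strsplit() returns a list with one vector per input string
parts <- strsplit(cleaned, ",")[[1]]

# Step 3: exact comparison against the country name
found <- any(parts == "Zimbabwe")
```

Each name in the data carries its own trailing comma, so the split also produces empty strings, and every element after the first keeps a leading space. The exact match succeeds on this data because the country name is the first element; wrapping the pieces in trimws() would be one way to make the comparison independent of position.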
If we want the matched values instead, we can use ifelse
as.character(ifelse(mapply(function(x, y) any(x == y),
df$COUNTRY, strsplit(gsub("\\[|'|\\]", "", df$Major.teams), ",")), df$COUNTRY, ""))
#[1] "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe"
# "" "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" "Zimbabwe" ""
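Since the question asks for a new column, the result can also be assigned back to the data frame. The following is a minimal sketch rather than part of the original answer: it uses a three-row stand-in for the data, the column name in_major_teams is a hypothetical choice, and it adds trimws() so that a match does not depend on the country being the first team in the string.

```r
# Small stand-in for the question's data (three of its rows)
df <- data.frame(
  COUNTRY = c("Zimbabwe", "Zimbabwe", "England"),
  Major.teams = c(
    "['Zimbabwe,', 'Shropshire']",
    "['Zimbabwe Women,', 'Mountaineers,', 'Zimbabwe Under-19s']",
    "['Zimbabwe-Rhodesia,', 'Kent,', 'Surrey']"
  ),
  stringsAsFactors = FALSE
)

# Strip brackets and apostrophes, split on commas, then trim stray whitespace
teams <- lapply(
  strsplit(gsub("\\[|'|\\]", "", df$Major.teams), ","),
  trimws
)

# TRUE where the country name is exactly one of the cleaned team names;
# USE.NAMES = FALSE keeps the result an unnamed logical vector
hit <- mapply(function(country, team) any(team == country),
              df$COUNTRY, teams, USE.NAMES = FALSE)

# in_major_teams is a hypothetical column name
df$in_major_teams <- ifelse(hit, "Yes", "No")
df$in_major_teams
# "Yes" "No" "No"
```

The comparison is still exact, so "Zimbabwe Women" and "Zimbabwe-Rhodesia" do not count as matches for their countries.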