I have a df that looks like this:
ID Country
55 Poland
55 Romania
55 France
98 Spain
98 Portugal
98 UK
65 Germany
67 Luxembourg
84 Greece
22 Estonia
22 Lithuania
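Some IDs are repeated because they belong to the same group. What I want to do is paste together all the Country values that share the same ID, so that each group of countries is combined into a single string.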
So far I have tried
ifelse(df[duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE),], paste('Countries', df$Country), NA)
but this did not return the expected output.
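For reference, a minimal reproducible sketch of the setup. It assumes Country is stored as character (the answers suggest it may be a factor in the original data). It also shows why the attempt falls short: ifelse() expects a logical vector as its first argument, not a subset of the data frame, and paste('Countries', df$Country) works row by row rather than per ID.

```r
# Rebuild the example data from the question
# (assumption: Country is character, not factor)
df <- data.frame(
  ID = c(55, 55, 55, 98, 98, 98, 65, 67, 84, 22, 22),
  Country = c("Poland", "Romania", "France", "Spain", "Portugal", "UK",
              "Germany", "Luxembourg", "Greece", "Estonia", "Lithuania"),
  stringsAsFactors = FALSE
)

# Logical vector, one entry per row: TRUE when the ID occurs more than once
in_group <- duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE)

# ifelse() is element-wise, so it can only label rows one at a time;
# nothing here collapses the countries of a group into one string
per_row <- ifelse(in_group, paste("Countries", df$Country), NA)
print(per_row)
```

Collapsing per group needs a grouped operation, which is what the answers provide.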
Answer 0 (score: 6)
Using data.table
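The code for this answer is not available here. The following is only a sketch of how the task could be approached with data.table, not the answer's original code. It assumes the data.table package is installed; the column name new_name and the ", " separator are illustrative choices.

```r
library(data.table)

# Example data from the question, built directly as a data.table
dt <- data.table(
  ID = c(55, 55, 55, 98, 98, 98, 65, 67, 84, 22, 22),
  Country = c("Poland", "Romania", "France", "Spain", "Portugal", "UK",
              "Germany", "Luxembourg", "Greece", "Estonia", "Lithuania")
)

# Combine all countries that share an ID; the string is repeated on every row of the group
dt[, new_name := paste(Country, collapse = ", "), by = ID]

# Keep the combined string only on the first row of each ID
dt[duplicated(ID), new_name := NA_character_]

print(dt)
```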
Answer 1 (score: 5)
Using base R,
replace(v1 <- with(df, ave(as.character(Country), ID, FUN = toString)), duplicated(v1), NA)
#[1] "Poland, Romania, France" NA NA "Spain, Portugal, UK" NA NA "Germany" "Luxembourg" "Greece" "Estonia, Lithuania"
#[11] NA
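The one-liner does three things at once. A step-by-step sketch of the same idea, assuming Country is already character so as.character() is not needed (the intermediate names combined and repeats are illustrative):

```r
df <- data.frame(
  ID = c(55, 55, 55, 98, 98, 98, 65, 67, 84, 22, 22),
  Country = c("Poland", "Romania", "France", "Spain", "Portugal", "UK",
              "Germany", "Luxembourg", "Greece", "Estonia", "Lithuania"),
  stringsAsFactors = FALSE
)

# Step 1: ave() applies toString() within each ID and
# recycles the single result to every row of that group
combined <- ave(df$Country, df$ID, FUN = toString)

# Step 2: flag every repeat of a combined string after its first appearance
repeats <- duplicated(combined)

# Step 3: overwrite the flagged positions with NA
result <- replace(combined, repeats, NA)
print(result)
```

Note that duplicated(combined) compares the combined strings, so two different IDs with identical country lists would also be blanked; using duplicated(df$ID) instead avoids that.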
Answer 2 (score: 4)
Using dplyr, one way would be
library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(new_name = paste0(Country, collapse = " + "),
         new_name = replace(new_name, duplicated(new_name), NA))
#       ID Country    new_name
#    <int> <fct>      <chr>
#  1    55 Poland     Poland + Romania + France
#  2    55 Romania    NA
#  3    55 France     NA
#  4    98 Spain      Spain + Portugal + UK
#  5    98 Portugal   NA
#  6    98 UK         NA
#  7    65 Germany    Germany
#  8    67 Luxembourg Luxembourg
#  9    84 Greece     Greece
# 10    22 Estonia    Estonia + Lithuania
# 11    22 Lithuania  NA
However, to get the exact expected output, we may need
df %>%
  group_by(ID) %>%
  mutate(new_name = if (n() > 1)
           paste0("Countries ", paste0(Country, collapse = " + ")) else Country,
         new_name = replace(new_name, duplicated(new_name), NA))
#       ID Country    new_name
#    <int> <fct>      <chr>
#  1    55 Poland     Countries Poland + Romania + France
#  2    55 Romania    NA
#  3    55 France     NA
#  4    98 Spain      Countries Spain + Portugal + UK
#  5    98 Portugal   NA
#  6    98 UK         NA
#  7    65 Germany    Germany
#  8    67 Luxembourg Luxembourg
#  9    84 Greece     Greece
# 10    22 Estonia    Countries Estonia + Lithuania
# 11    22 Lithuania  NA
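The group-size condition (add the "Countries" prefix only when an ID has more than one row) can also be expressed without dplyr. A base R sketch, assuming Country is character; label_group is a hypothetical helper name:

```r
df <- data.frame(
  ID = c(55, 55, 55, 98, 98, 98, 65, 67, 84, 22, 22),
  Country = c("Poland", "Romania", "France", "Spain", "Portugal", "UK",
              "Germany", "Luxembourg", "Greece", "Estonia", "Lithuania"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: prefix and collapse only when the group has several members
label_group <- function(x) {
  if (length(x) > 1) paste("Countries", paste(x, collapse = " + ")) else x
}

# Apply the helper within each ID; a single string is recycled across the group
new_name <- ave(df$Country, df$ID, FUN = label_group)

# Keep the label on the first row of each ID only
new_name[duplicated(df$ID)] <- NA

print(data.frame(df, new_name))
```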
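Answer 3 (score: 3)
Just use aggregate and then match, which returns only the first matching row for each ID (the data frame is called dat here):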
flat <- function(x) paste("Countries:", paste(x,collapse=", "))
tmp <- aggregate(Country ~ ID, data=dat, FUN=flat)
dat$Country <- NA
dat$Country[match(tmp$ID, dat$ID)] <- tmp$Country
#   ID Country
#1  55 Countries: Poland, Romania, France
#2  55 <NA>
#3  55 <NA>
#4  98 Countries: Spain, Portugal, UK
#5  98 <NA>
#6  98 <NA>
#7  65 Countries: Germany
#8  67 Countries: Luxembourg
#9  84 Countries: Greece
#10 22 Countries: Estonia, Lithuania
#11 22 <NA>
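The approach relies on match() returning only the position of the first occurrence of each value. A small sketch of that step in isolation, using the IDs from the question (group_ids stands in for the sorted ID column that aggregate() would typically return):

```r
# Row-wise IDs as they appear in the data
ids <- c(55, 55, 55, 98, 98, 98, 65, 67, 84, 22, 22)

# One entry per group, in sorted order
group_ids <- sort(unique(ids))

# For each group, the index of its first row in the original data
first_rows <- match(group_ids, ids)
print(first_rows)
```

Assigning the collapsed strings to these positions, after setting the whole column to NA, leaves one label on the first row of each group.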
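Answer 4 (score: 1)
Using purrr and dplyr (nest() and unnest() come from tidyr):
library(dplyr)
library(purrr)
library(tidyr)  # provides nest() and unnest()
df %>%
  nest(-ID) %>%
  mutate(new_name = map_chr(data, ~ paste0(.x$Country, collapse = " + "))) %>%
  unnest()
# newer tidyr versions prefer nest(data = -ID) and unnest(data)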
Output:
ID  new_name                   Country
55  Poland + Romania + France  Poland
55  Poland + Romania + France  Romania
55  Poland + Romania + France  France
98  Spain + Portugal + UK      Spain
98  Spain + Portugal + UK      Portugal
98  Spain + Portugal + UK      UK
65  Germany                    Germany
67  Luxembourg                 Luxembourg
84  Greece                     Greece
22  Estonia + Lithuania        Estonia
22  Estonia + Lithuania        Lithuania
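The nest-then-map pattern has a close base R analogue: split the column by ID, collapse each piece, then look the result up again for every row. A sketch under the assumption that Country is character (by_id and combined are illustrative names):

```r
df <- data.frame(
  ID = c(55, 55, 55, 98, 98, 98, 65, 67, 84, 22, 22),
  Country = c("Poland", "Romania", "France", "Spain", "Portugal", "UK",
              "Germany", "Luxembourg", "Greece", "Estonia", "Lithuania"),
  stringsAsFactors = FALSE
)

# One character vector of countries per ID (names are the IDs as strings)
by_id <- split(df$Country, df$ID)

# Collapse each group into a single string
combined <- vapply(by_id, paste, character(1), collapse = " + ")

# Look up the combined string for every row by its ID
df$new_name <- unname(combined[as.character(df$ID)])
print(df)
```

As in the output above, the combined string appears on every row of its group; blanking the repeats would need an extra step such as duplicated(df$ID).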