Add a new column to a data frame and paste duplicated values together

Date: 2019-07-09 09:38:16

Tags: r dataframe

I have a df like this:

ID  Country
55  Poland
55  Romania
55  France
98  Spain
98  Portugal
98  UK
65  Germany
67  Luxembourg
84  Greece
22  Estonia
22  Lithuania

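The ID is repeated in some rows because those rows belong to the same group. What I want to do is paste together all the Country values that share the same ID, to get one combined string per group (the expected output was posted as an image).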

So far I have tried ifelse(df[duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE),], paste('Countries', df$Country), NA), but this did not return the expected output.
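To experiment with the problem, the sample data can be rebuilt roughly as below. This is only a sketch: storing Country as character is an assumption (the printed output in some answers shows a factor). It also shows the logical vector that ifelse() expects as its first argument, where the attempt above passes a subset of the data frame instead.

```r
# Sample data from the question, rebuilt as a data frame.
# Assumption: Country is stored as character; wrap it in factor()
# to match the <fct> column shown in some of the answers.
df <- data.frame(
  ID = c(55L, 55L, 55L, 98L, 98L, 98L, 65L, 67L, 84L, 22L, 22L),
  Country = c("Poland", "Romania", "France", "Spain", "Portugal", "UK",
              "Germany", "Luxembourg", "Greece", "Estonia", "Lithuania"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose ID occurs more than once.
# This is a logical vector with one element per row, which is what
# ifelse() needs as its test argument.
in_group <- duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE)

# The rows that belong to a multi-country group
df[in_group, ]
```

Even with a proper logical test, ifelse() works element by element, so it cannot collapse several rows into one string; the answers do that step with a grouped paste.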

5 answers:

Answer 0 (score: 6)

Using data.table (the code for this answer did not survive in this copy).
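One way this could be done with data.table is sketched below. It is not necessarily what the answerer wrote; the column name new_name and the " + " separator are borrowed from the other answers, and character Country is an assumption.

```r
library(data.table)

# Sample data from the question (character Country is an assumption)
dt <- data.table(
  ID = c(55L, 55L, 55L, 98L, 98L, 98L, 65L, 67L, 84L, 22L, 22L),
  Country = c("Poland", "Romania", "France", "Spain", "Portugal", "UK",
              "Germany", "Luxembourg", "Greece", "Estonia", "Lithuania")
)

# Paste all countries of an ID into one string; the single string is
# recycled onto every row of that group
dt[, new_name := paste(Country, collapse = " + "), by = ID]

# Keep the string only on the first row of each ID
dt[duplicated(ID), new_name := NA_character_]

dt
```

Both steps use := and therefore modify dt in place rather than returning a copy.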

Answer 1 (score: 5)

Using base R,

replace(v1 <- with(df, ave(as.character(Country), ID, FUN = toString)), duplicated(v1), NA)

#[1] "Poland, Romania, France" NA      NA    "Spain, Portugal, UK"     NA        NA    "Germany"      "Luxembourg"              "Greece"                  "Estonia, Lithuania"     
#[11] NA 
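The one-liner above returns a plain vector. A minimal sketch of storing it as a new column follows; the column name new_name is an assumption taken from the other answers. It tests duplicated(df$ID) rather than duplicated(v1), so two different IDs that happen to produce the same string would both keep their value.

```r
# Sample data from the question (character Country is an assumption)
df <- data.frame(
  ID = c(55L, 55L, 55L, 98L, 98L, 98L, 65L, 67L, 84L, 22L, 22L),
  Country = c("Poland", "Romania", "France", "Spain", "Portugal", "UK",
              "Germany", "Luxembourg", "Greece", "Estonia", "Lithuania"),
  stringsAsFactors = FALSE
)

# One comma-separated string per ID, repeated on every row of that ID
df$new_name <- ave(as.character(df$Country), df$ID, FUN = toString)

# Blank out every row except the first occurrence of each ID
df$new_name[duplicated(df$ID)] <- NA

df
```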

Answer 2 (score: 4)

Using dplyr, one way would be

library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(new_name = paste0(Country,collapse = " + "), 
         new_name = replace(new_name, duplicated(new_name), NA))

#     ID Country    new_name                 
#   <int> <fct>      <chr>                    
# 1    55 Poland     Poland + Romania + France
# 2    55 Romania    NA                       
# 3    55 France     NA                       
# 4    98 Spain      Spain + Portugal + UK    
# 5    98 Portugal   NA                       
# 6    98 UK         NA                       
# 7    65 Germany    Germany                  
# 8    67 Luxembourg Luxembourg               
# 9    84 Greece     Greece                   
#10    22 Estonia    Estonia + Lithuania      
#11    22 Lithuania  NA                  

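However, to get the exact expected output, we might need

df %>%
   group_by(ID) %>%
   # as.character() keeps new_name a character vector even if Country is a factor
   mutate(new_name = if (n() > 1) 
         paste0("Countries ", paste0(Country,collapse = " + ")) else as.character(Country),
         new_name = replace(new_name, duplicated(new_name), NA))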



#     ID Country    new_name                           
#    <int> <fct>      <chr>                              
# 1    55 Poland     Countries Poland + Romania + France
# 2    55 Romania    NA                                 
# 3    55 France     NA                                 
# 4    98 Spain      Countries Spain + Portugal + UK    
# 5    98 Portugal   NA                                 
# 6    98 UK         NA                                 
# 7    65 Germany    Germany                            
# 8    67 Luxembourg Luxembourg                         
# 9    84 Greece     Greece                             
#10    22 Estonia    Countries Estonia + Lithuania      
#11    22 Lithuania  NA                              

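Answer 3 (score: 3)

Just use aggregate and then match, which returns the first matching row for each ID (the data frame is called dat here):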

flat <- function(x) paste("Countries:", paste(x,collapse=", "))
tmp <- aggregate(Country ~ ID, data=dat, FUN=flat)
dat$Country <- NA
dat$Country[match(tmp$ID, dat$ID)] <- tmp$Country

#   ID                            Country
#1  55 Countries: Poland, Romania, France
#2  55                               <NA>
#3  55                               <NA>
#4  98     Countries: Spain, Portugal, UK
#5  98                               <NA>
#6  98                               <NA>
#7  65                 Countries: Germany
#8  67              Countries: Luxembourg
#9  84                  Countries: Greece
#10 22      Countries: Estonia, Lithuania
#11 22                               <NA>
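The code above overwrites Country with the combined string. If the original column should be kept, a small variation writes the result to a separate column instead. This is only a sketch: the column name new_name is an assumption borrowed from the other answers, as is character Country.

```r
# Sample data from the question (character Country is an assumption)
dat <- data.frame(
  ID = c(55L, 55L, 55L, 98L, 98L, 98L, 65L, 67L, 84L, 22L, 22L),
  Country = c("Poland", "Romania", "France", "Spain", "Portugal", "UK",
              "Germany", "Luxembourg", "Greece", "Estonia", "Lithuania"),
  stringsAsFactors = FALSE
)

# Collapse the countries of one group into a single labelled string
flat <- function(x) paste("Countries:", paste(x, collapse = ", "))

# One row per ID with the collapsed string
tmp <- aggregate(Country ~ ID, data = dat, FUN = flat)

# Start with an all-NA column, then fill only the first row of each ID;
# match() returns the position of the first occurrence of each ID in dat
dat$new_name <- NA_character_
dat$new_name[match(tmp$ID, dat$ID)] <- tmp$Country

dat
```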

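Answer 4 (score: 1)

Using purrr and dplyr (nest and unnest come from tidyr)

    library(dplyr)
    library(tidyr)
    library(purrr)
    # tidyr >= 1.0.0 prefers nest(data = -ID) and unnest(cols = data)
    df %>%
    nest(-ID) %>% 
    mutate(new_name = map_chr(data, ~ paste0(.x$Country, collapse = " + "))) %>% 
    unnest()

Result: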

  ID new_name                  Country     
  55 Poland + Romania + France Poland    
  55 Poland + Romania + France Romania   
  55 Poland + Romania + France France    
  98 Spain + Portugal + UK     Spain     
  98 Spain + Portugal + UK     Portugal  
  98 Spain + Portugal + UK     UK        
  65 Germany                   Germany   
  67 Luxembourg                Luxembourg
  84 Greece                    Greece    
  22 Estonia + Lithuania       Estonia   
  22 Estonia + Lithuania       Lithuania