Merging two columns with NAs and duplicate values

Time: 2018-03-02 15:27:43

Tags: r merge dplyr split-apply-combine

A subset of the data frame:

             country1                 country2
                Japan                    Japan
          Netherlands                     <NA>
                 <NA>                     <NA>
               Brazil                   Brazil
   Russian Federation                     <NA>
                 <NA>                     <NA>
                 <NA> United States of America
              Germany                  Germany
              Ukraine                     <NA>
                Japan                    Japan
                 <NA>       Russian Federation
                 <NA> United States of America
               France                   France
          New Zealand              New Zealand
                Japan                     <NA>

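I have two character vectors, country1 and country2, that I want to merge into a new column. No observation in my data set has two different countries. However, some pairs hold the same value twice, and I want it to appear only once. There is also the issue of NAs: I want to omit them in the merged column, so that each value in the new column is just the country string. Some observations have NA in both columns, and for those I simply want NA to remain in the new column. What is the best way to approach this?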

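I made a small modification to the function in the top-voted answer here, which addresses a similar question, changing the comma separator to nothing.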

However, this leaves the duplicates unresolved:

             country1                 country2                                            merge
                Japan                    Japan                                       JapanJapan
          Netherlands                     <NA>                                      Netherlands
                 <NA>                     <NA>                                             <NA>
               Brazil                   Brazil                                     BrazilBrazil
   Russian Federation                     <NA>                               Russian Federation
                 <NA>                     <NA>                                             <NA>
                 <NA> United States of America                         United States of America
              Germany                  Germany                                   GermanyGermany
              Ukraine                     <NA>                                          Ukraine
                Japan                    Japan                                       JapanJapan
                 <NA>       Russian Federation                               Russian Federation
                 <NA> United States of America                         United States of America
               France                   France                                     FranceFrance
          New Zealand              New Zealand                           New ZealandNew Zealand
                Japan                     <NA>                                            Japan

3 Answers:

Answer 0: (score: 1)

Since you specified dplyr, here is a one-liner:

df <- dplyr::mutate(df, merge = dplyr::if_else(is.na(country1), country2, country1))

Data

country1 <- c("Japan", "Netherlands", NA, "Brazil", "Russian Federation", NA, NA, "Germany", "Ukraine", "Japan", NA, NA, "France", "New Zealand", "Japan")
country2 <- c("Japan", NA, NA, "Brazil", NA, NA, "United States of America", "Germany", NA, "Japan", "Russian Federation", "United States of America", "France", "New Zealand", NA)
df <- data.frame(country1, country2, stringsAsFactors = FALSE)
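
To see how the one-liner treats each situation described in the question, here is a minimal sketch on a four-row data frame. The `cases` name and its rows are chosen here for illustration and are not part of the original data; they cover an equal pair, an NA on either side, and an NA in both columns.

```r
library(dplyr)

# One row per case from the question: equal pair, NA in one column
# (either side), and NA in both. Values chosen for illustration.
cases <- data.frame(
  country1 = c("Japan", "Netherlands", NA, NA),
  country2 = c("Japan", NA, "United States of America", NA),
  stringsAsFactors = FALSE
)

# Take country2 only where country1 is missing; otherwise keep country1.
cases <- mutate(cases, merge = if_else(is.na(country1), country2, country1))

cases$merge
```

Because the two columns never disagree when both are present, picking one of them avoids the doubled strings that pasting produces, and a row with NA in both columns stays NA.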

Answer 1: (score: 1)

Since you said you have character vectors:

library(tidyverse)
coalesce(country1,country2)
 [1] "Japan"                    "Netherlands"              NA                        
 [4] "Brazil"                   "Russian Federation"       NA                        
 [7] "United States of America" "Germany"                  "Ukraine"                 
[10] "Japan"                    "Russian Federation"       "United States of America"
[13] "France"                   "New Zealand"              "Japan"   

If it is a data frame, simply use coalesce(!!!df).
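
As a minimal sketch of the data-frame form, the snippet below rebuilds a shortened version of the example data (the five rows are chosen here for illustration) and splices its columns into `coalesce()`. It assumes `df` contains only the columns to be merged, since `!!!df` passes every column as a separate argument.

```r
library(dplyr)

# A shortened version of the question's data; rows chosen for illustration.
df <- data.frame(
  country1 = c("Japan", "Netherlands", NA, NA, "Japan"),
  country2 = c("Japan", NA, NA, "United States of America", NA),
  stringsAsFactors = FALSE
)

# Splice every column of df into coalesce(): for each row, the first
# non-missing value across the columns is kept.
df$merge <- coalesce(!!!df)

df$merge
```

If the data frame holds other columns as well, selecting the relevant ones first (for example `df[c("country1", "country2")]`) keeps unrelated columns out of the call.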

Answer 2: (score: 1)

You can also replace the NA values in the first column with the values from the second column:

df$country1[is.na(df$country1)] <- df$country2[is.na(df$country1)]
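
Note that the line above overwrites country1 in place. If the original column should be kept, a possible variant is to copy it into a new column first and fill only the missing positions. This is a sketch on a shortened version of the example data; the `missing` helper name is introduced here for illustration.

```r
# A shortened version of the question's data; rows chosen for illustration.
df <- data.frame(
  country1 = c("Japan", "Netherlands", NA, NA, "Japan"),
  country2 = c("Japan", NA, NA, "United States of America", NA),
  stringsAsFactors = FALSE
)

# Start from country1, then fill only its missing positions from country2.
df$merge <- df$country1
missing <- is.na(df$merge)
df$merge[missing] <- df$country2[missing]

df$merge
```

This uses base R only, and country1 and country2 are left unchanged.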