A subset of my data frame:
country1 country2
Japan Japan
Netherlands <NA>
<NA> <NA>
Brazil Brazil
Russian Federation <NA>
<NA> <NA>
<NA> United States of America
Germany Germany
Ukraine <NA>
Japan Japan
<NA> Russian Federation
<NA> United States of America
France France
New Zealand New Zealand
Japan <NA>
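I have two character vectors, country1 and country2, that I want to merge into a new column. No observation in my dataset has two different countries. However, some pairs hold the same value twice, and I want it to appear only once. There is also the issue of NAs: I want to omit them in the merged column, so that each value in the new column contains only the country string. Some observations have NA in both columns, and for those I want the new column to stay NA. What is the best way to approach this?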
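I made a small modification to the function from the top-voted answer here, which addresses a similar question, changing the comma separator to an empty string.
However, this leaves the duplicate problem unsolved: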
country1 country2 merge
Japan Japan JapanJapan
Netherlands <NA> Netherlands
<NA> <NA> <NA>
Brazil Brazil BrazilBrazil
Russian Federation <NA> Russian Federation
<NA> <NA> <NA>
<NA> United States of America United States of America
Germany Germany GermanyGermany
Ukraine <NA> Ukraine
Japan Japan JapanJapan
<NA> Russian Federation Russian Federation
<NA> United States of America United States of America
France France FranceFrance
New Zealand New Zealand New ZealandNew Zealand
Japan <NA> Japan
Answer 0 (score: 1)
Since you specified dplyr, here is a one-liner:
df <- dplyr::mutate(df, merge = dplyr::if_else(is.na(country1), country2, country1))
Data
country1 <- c("Japan", "Netherlands", NA, "Brazil", "Russian Federation", NA, NA, "Germany", "Ukraine", "Japan", NA, NA, "France", "New Zealand", "Japan")
country2 <- c("Japan", NA, NA, "Brazil", NA, NA, "United States of America", "Germany", NA, "Japan", "Russian Federation", "United States of America", "France", "New Zealand", NA)
df <- data.frame(country1, country2, stringsAsFactors = FALSE)
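As a quick check of the one-liner, here is a minimal sketch on a shortened copy of the sample data. It assumes dplyr is installed, and the name `small` is made up for illustration. Because no row holds two different countries, taking country1 and falling back to country2 also collapses the duplicated pairs to a single value.

```r
library(dplyr)

# Shortened copy of the sample data; `small` is a hypothetical name.
small <- data.frame(
  country1 = c("Japan", "Netherlands", NA, NA),
  country2 = c("Japan", NA, NA, "United States of America"),
  stringsAsFactors = FALSE
)

# Use country2 where country1 is missing, otherwise keep country1.
# Rows where both are NA stay NA.
small <- mutate(small, merge = if_else(is.na(country1), country2, country1))

small$merge
```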
Answer 1 (score: 1)
Since you say you have character vectors:
library(dplyr)  # coalesce() comes from dplyr; library(tidyverse) loads it as well
coalesce(country1, country2)
[1] "Japan" "Netherlands" NA
[4] "Brazil" "Russian Federation" NA
[7] "United States of America" "Germany" "Ukraine"
[10] "Japan" "Russian Federation" "United States of America"
[13] "France" "New Zealand" "Japan"
If it is a data frame, simply use coalesce(!!!df), which splices each column in as a separate argument.
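To store the result as a new column rather than just printing a vector, one option is to call coalesce() inside mutate(). This is only a sketch: it assumes dplyr is installed, and `demo` is an illustrative name with a few rows taken from the sample data.

```r
library(dplyr)

# A few rows from the sample data; `demo` is a hypothetical name.
demo <- data.frame(
  country1 = c("Japan", NA, NA, "Ukraine"),
  country2 = c("Japan", "Russian Federation", NA, NA),
  stringsAsFactors = FALSE
)

# coalesce() returns, position by position, the first non-NA value
# among its arguments; if all are NA the result is NA.
demo <- mutate(demo, merge = coalesce(country1, country2))

demo$merge
```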
Answer 2 (score: 1)
You can also replace the NA values in the first column with the values from the second column:
df$country1[is.na(df$country1)] <- df$country2[is.na(df$country1)]
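Note that this replacement overwrites country1 in place. If the original column should be kept, a base R sketch that writes to a new column instead could look like the following (`demo` and its contents are illustrative, not the full dataset):

```r
# A few rows from the sample data; `demo` is a hypothetical name.
demo <- data.frame(
  country1 = c("Japan", NA, NA, "Ukraine"),
  country2 = c("Japan", "Russian Federation", NA, NA),
  stringsAsFactors = FALSE
)

# Fall back to country2 only where country1 is NA; both source
# columns are left unchanged.
demo$merge <- ifelse(is.na(demo$country1), demo$country2, demo$country1)

demo$merge
```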