Fill in missing values for repeated names?

Date: 2019-11-22 21:43:04

Tags: r

I apologize if this is a duplicate. I have looked through a lot of answers but have not found one that really solves what I am trying to do.

I have a dataset that contains repeated names, but not every row has an account number assigned. For example:

df <- data.frame(Name = c("Hilton", "Comcast", "Comcast", "Comcast", "Froyos", "Froyos", "BigFive"), 
                 Account = c("123", "456", NA, NA, "789", NA, "111"))
df
     Name Account
1  Hilton     123
2 Comcast     456
3 Comcast    <NA>
4 Comcast    <NA>
5  Froyos     789
6  Froyos    <NA>
7 BigFive     111

I would like to match on the names to fill in the associated account numbers, so that the result looks like this:

     Name Account
1  Hilton     123
2 Comcast     456
3 Comcast     456
4 Comcast     456
5  Froyos     789
6  Froyos     789
7 BigFive     111

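After making sure all the classes were the same, I tried building a separate lookup data frame and using ifelse with %in%, but it did not assign the correct value to each name. My code is below: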

library(dplyr)

df$Name <- as.character(df$Name)
df$Account <- as.numeric(as.character(df$Account))

# Keep only the rows that already have an account number
df2 <- df %>% 
  filter(as.numeric(Account) > 0)

df3 <- within(df, {New = ifelse(df$Name %in% df2$Name,
                                          df2$Account, NA)})

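I feel like this should be simple, but I am having a hard time figuring out how to phrase the problem so that I can get it right. Any help or pointers would be greatly appreciated.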

1 Answer:

Answer 0 (score: 1)

Note the stringsAsFactors = FALSE, which keeps Name and Account as character vectors rather than factors.

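library(dplyr)

df <- data.frame(Name = c("Hilton", "Comcast", "Comcast", "Comcast", "Froyos", "Froyos", "BigFive"), 
                 Account = c("123", "456", NA, NA, "789", NA, "111"), stringsAsFactors = FALSE)

# Within each Name, replace every Account with the group's non-missing value
# (na.rm = TRUE makes max() ignore the NAs)
df %>% group_by(Name) %>% mutate(Account = max(Account, na.rm = TRUE)) %>% ungroup()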
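
As for why the attempt in the question goes wrong: %in% only returns TRUE or FALSE for each row, and ifelse() then takes values from df2$Account by position, recycling that shorter vector, so the accounts are not lined up with the names. If you would rather stay close to that lookup idea, here is a minimal base R sketch using match(), which returns the position of each name in the lookup table. The name `known` is just something I made up for illustration, and this assumes each name has at most one distinct account number.

```r
# Same example data as in the question
df <- data.frame(
  Name = c("Hilton", "Comcast", "Comcast", "Comcast", "Froyos", "Froyos", "BigFive"),
  Account = c("123", "456", NA, NA, "789", NA, "111"),
  stringsAsFactors = FALSE
)

# Lookup table: only the rows that already have an account number
# (`known` is a hypothetical name, not from the original post)
known <- df[!is.na(df$Account), ]

# For each Name in df, match() gives the row of its first occurrence in known$Name;
# indexing known$Account by those positions lines the accounts up with the names
df$Account <- known$Account[match(df$Name, known$Name)]
df
```

A name that never appears with an account number gets NA from match(), so its Account simply stays NA.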