Merge data frames and replace None with values in R

Time: 2017-11-16 22:07:18

Tags: r

I have two data.frames, data.frame1 and data.frame2.


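data.frame1 contains None in the CustID column. I need to replace these None values with the CustID from data.frame2, making sure that the columns FirstName, LastName, Address, and DOB all match across the two datasets. Some names appear in both datasets with a different address and DOB, and those are not the same people.

I have converted these columns from factor to character (not sure whether that matters) and applied the match() function, but I got 0 matches, which I know is wrong. Here are the two data frames: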

data.frame1:
CustID  FirstName  LastName  Address     DOB       City     Phone
132     Mary       K         999 Drive   1/1/2011  Chicago  888-0000
133     Mona       J         222 Road    1/4/2002  NY       999-8888
188     Jack       S         122 Street  9/2/2009  Washin   777-9999
None    Helen      L         111 Rd      1/4/2010
None    John       M         888 Lane    4/2/2002
None    Sally      K         222 Street  2/3/2002

data.frame2:
CustID  FirstName  LastName  Address     DOB       City
132     Mary       K         999 Drive   1/1/2011  Chicago
133     Mona       J         222 Road    1/4/2002  NY
188     Jack       S         122 Street  9/2/2009  Washington
3338    Helen      L         111 Rd      1/4/2010
882     John       M         888 Lane    4/2/2002
976     Sally      K         222 Street  2/3/2002

2 answers:

Answer 0 (score: 1)

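This code should illustrate how to proceed:

  • Merge the data.frames on "fname" and "lname" (considering only the rows with a missing id)
  • Select the "id" column that the merge takes from df2 (id.y) and copy it into df1

Example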

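df1 <- data.frame(id=c(NA, 12, NA, 13), 
    fname=c("A","B","Z","D"), 
    lname=c("1","2","3","4"))

df2 <- data.frame(id=c(1, 21, 33, 44), 
    fname=c("Z","A","A","Z")  , 
    lname=c("1","3","1","3"))

# Rows of df1 whose id is missing
missing <- is.na(df1$id)

# Merge only those rows with df2 on the name columns;
# df1's id becomes id.x and df2's id becomes id.y
merged <- merge(
    x=df1[missing,], 
    y=df2, 
    by=c("fname", "lname"))

# Caveat: merge() sorts its result by the key columns, so this assignment
# assumes the missing rows are already in key order with exactly one match each
df1$id[missing] <- merged[,"id.y"]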
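
The positional assignment in the example depends on the merged rows coming back in the same order as the missing rows of df1. A minimal sketch of an order-independent variant follows; it swaps merge() for match() on a pasted composite key, which is not what the answer itself uses. The separator "|" is an assumption (it must not occur in the names), and the rows of df1 are deliberately reordered here to show that order no longer matters.

```r
# Same toy data as in the example, but with the missing rows of df1 out of key order
df1 <- data.frame(id = c(NA, 12, NA, 13),
                  fname = c("Z", "B", "A", "D"),
                  lname = c("3", "2", "1", "4"),
                  stringsAsFactors = FALSE)

df2 <- data.frame(id = c(1, 21, 33, 44),
                  fname = c("Z", "A", "A", "Z"),
                  lname = c("1", "3", "1", "3"),
                  stringsAsFactors = FALSE)

# Rows of df1 whose id is missing
missing <- is.na(df1$id)

# Composite keys; "|" is an assumed separator that does not appear in the data
key1 <- paste(df1$fname, df1$lname, sep = "|")
key2 <- paste(df2$fname, df2$lname, sep = "|")

# match() returns, for each missing row, the position of its key in df2
# (NA when there is no match, so unmatched rows simply stay NA)
df1$id[missing] <- df2$id[match(key1[missing], key2)]

print(df1)
```

Because each id is looked up row by row, a row without a counterpart in df2 keeps its NA instead of causing a length mismatch.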

Answer 1 (score: 1)

Here is one way to do it using dplyr.

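  library(dplyr)

  # The spaces inside Address are dropped (e.g. "999Drive") so that read.table()
  # can split the columns on whitespace; '' marks an empty field
  df1 <- read.table(text = 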
       "CustID  FirstName   LastName    Address         DOB         City    Phone
  132    Mary         K               999Drive   1/1/2011    Chicago 888-0000
  133    Mona         J               222Road    1/4/2002    NY      999-8888
  188    Jack         S               122Street  9/2/2009    Washin  777-9999
  None    Helen       L               111Rd      1/4/2010     ''     ''
  None    John        M               888Lane    4/2/2002       ''   ''
  None    Sally       K               222Street  2/3/2002        ''  ''"
  , header = TRUE, stringsAsFactors = FALSE)


  df2 <- read.table(text=                    
  "CustID FirstName LastName Address   DOB         City
  132    Mary      K        999Drive 1/1/2011    Chicago 
  133    Mona      J         222Road 1/4/2002    NY  
  188    Jack      S      122Street  9/2/2009    Washington  
  3338    Helen   L         111Rd    1/4/2010     ''   
  882     John    M       888Lane    4/2/2002       '' 
  976    Sally    K     222Street    2/3/2002    ''"
  , header = TRUE, stringsAsFactors = FALSE)

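  # Join on all four identifying columns, then keep df1's CustID (CustID.x)
  # unless it is "None", in which case take the one from df2 (CustID.y)
  df1 %>% 
    left_join(df2 %>% select(-City), by = c('FirstName', 'LastName', 'DOB', 'Address')) %>% 
    mutate(CustID = ifelse(CustID.x == "None", as.character(CustID.y), CustID.x)) %>% 
    select(-CustID.x, -CustID.y)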



  FirstName LastName   Address      DOB    City    Phone CustID
1      Mary        K  999Drive 1/1/2011 Chicago 888-0000    132
2      Mona        J   222Road 1/4/2002      NY 999-8888    133
3      Jack        S 122Street 9/2/2009  Washin 777-9999    188
4     Helen        L     111Rd 1/4/2010                    3338
5      John        M   888Lane 4/2/2002                     882
6     Sally        K 222Street 2/3/2002                     976
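
For readers without dplyr, the same four-column match can be sketched in base R. This is only an illustration under stated assumptions: the data are a cut-down version of the question's tables built with data.frame() (so the spaces in Address can stay), the last row of df2 is a hypothetical namesake added to show why Address and DOB belong in the key, and the helper names row_id, CustID_df2, lookup and result are made up for this sketch.

```r
df1 <- data.frame(
  CustID    = c("132", "None", "None"),
  FirstName = c("Mary", "Helen", "John"),
  LastName  = c("K", "L", "M"),
  Address   = c("999 Drive", "111 Rd", "888 Lane"),
  DOB       = c("1/1/2011", "1/4/2010", "4/2/2002"),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  CustID    = c(132L, 3338L, 882L, 555L),
  FirstName = c("Mary", "Helen", "John", "John"),
  LastName  = c("K", "L", "M", "M"),
  # The last row is hypothetical: same name, different address and DOB
  Address   = c("999 Drive", "111 Rd", "888 Lane", "1 Other St"),
  DOB       = c("1/1/2011", "1/4/2010", "4/2/2002", "5/5/1990"),
  stringsAsFactors = FALSE
)

keys <- c("FirstName", "LastName", "Address", "DOB")

# Remember the original row order, because merge() sorts by the key columns
df1$row_id <- seq_len(nrow(df1))

# Lookup table from df2, with its CustID renamed to avoid a name clash
lookup <- df2[, c(keys, "CustID")]
names(lookup)[names(lookup) == "CustID"] <- "CustID_df2"

# Left join: keep every row of df1, attach df2's CustID where all four keys agree
merged <- merge(df1, lookup, by = keys, all.x = TRUE)
merged <- merged[order(merged$row_id), ]

# Fill in only the rows that are "None" and actually found a match
fill <- merged$CustID == "None" & !is.na(merged$CustID_df2)
merged$CustID[fill] <- as.character(merged$CustID_df2[fill])

result <- merged[, c("CustID", keys)]
print(result)
```

Rows of df1 that find no counterpart in df2 keep "None", which makes unmatched customers easy to spot afterwards.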