I have two data.frames:
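data.frame1 contains None in the CustID column. I need to replace these None values with the CustID from data.frame2, while making sure that the columns FirstName, LastName, Address, and DOB all match across the two datasets. Some names match across both datasets but have a different address and DOB, and those are not the same people. I have converted these columns from factor to character (not sure whether that matters) and applied the match() function, but I got 0 matches, which I know is wrong. Here are the two data frames: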
data.frame1:
CustID FirstName LastName Address DOB City Phone
132 Mary K 999 Drive 1/1/2011 Chicago 888-0000
133 Mona J 222 Road 1/4/2002 NY 999-8888
188 Jack S 122 Street 9/2/2009 Washin 777-9999
None Helen L 111 Rd 1/4/2010
None John M 888 Lane 4/2/2002
None Sally K 222 Street 2/3/2002
data.frame2:
CustID FirstName LastName Address DOB City
132 Mary K 999 Drive 1/1/2011 Chicago
133 Mona J 222 Road 1/4/2002 NY
188 Jack S 122 Street 9/2/2009 Washington
3338 Helen L 111 Rd 1/4/2010
882 John M 888 Lane 4/2/2002
976 Sally K 222 Street 2/3/2002
Answer 0 (score: 1)
This code should illustrate how to proceed:
Example
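# Toy data: df1 has missing ids, df2 holds the ids to look up.
df1 <- data.frame(id=c(NA, 12, NA, 13),
fname=c("A","B","Z","D"),
lname=c("1","2","3","4"))
df2 <- data.frame(id=c(1, 21, 33, 44),
fname=c("Z","A","A","Z") ,
lname=c("1","3","1","3"))
# Join the rows of df1 that lack an id to df2 on both name columns,
# then write the matched ids (id.y) back into those rows.
# Note: merge() sorts its result by the key columns and drops rows without a
# match, so this positional assignment is only safe when every NA row has
# exactly one match and the NA rows are already in key order, as they are here.
df1[is.na(df1$id), "id"] <- merge(
x=df1[is.na(df1$id),],
y=df2,
by=c("fname", "lname"))[,"id.y"]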
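Since the question mentions trying match(), here is a minimal base R sketch of the same fill done with match() on a composite key, so that the result does not depend on the row order returned by merge(). The names `key1`, `key2`, `idx`, and `fill` are illustrative, and the "|" separator assumes that character never occurs in the key columns.

```r
# Same toy data as in the example above, with strings kept as character.
df1 <- data.frame(id = c(NA, 12, NA, 13),
                  fname = c("A", "B", "Z", "D"),
                  lname = c("1", "2", "3", "4"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c(1, 21, 33, 44),
                  fname = c("Z", "A", "A", "Z"),
                  lname = c("1", "3", "1", "3"),
                  stringsAsFactors = FALSE)

# Build one key per row from all columns that must agree.
# Assumption: "|" does not appear in any of the key values.
key1 <- paste(df1$fname, df1$lname, sep = "|")
key2 <- paste(df2$fname, df2$lname, sep = "|")

# For each row of df1, the position of the first matching row in df2 (or NA).
idx <- match(key1, key2)

# Only fill rows whose id is missing and that actually have a match.
fill <- is.na(df1$id) & !is.na(idx)
df1$id[fill] <- df2$id[idx[fill]]

print(df1)
```

For the data in the question, the key would be built from FirstName, LastName, Address, and DOB instead. Rows without a match simply keep their missing id.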
Answer 1 (score: 1)
Here is one way using dplyr.
library(dplyr)
df1 <- read.table(text =
"CustID FirstName LastName Address DOB City Phone
132 Mary K 999Drive 1/1/2011 Chicago 888-0000
133 Mona J 222Road 1/4/2002 NY 999-8888
188 Jack S 122Street 9/2/2009 Washin 777-9999
None Helen L 111Rd 1/4/2010 '' ''
None John M 888Lane 4/2/2002 '' ''
None Sally K 222Street 2/3/2002 '' ''"
, header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text=
"CustID FirstName LastName Address DOB City
132 Mary K 999Drive 1/1/2011 Chicago
133 Mona J 222Road 1/4/2002 NY
188 Jack S 122Street 9/2/2009 Washington
3338 Helen L 111Rd 1/4/2010 ''
882 John M 888Lane 4/2/2002 ''
976 Sally K 222Street 2/3/2002 ''"
, header = TRUE, stringsAsFactors = FALSE)
df1 %>%
  left_join(df2 %>% select(-City), by = c('FirstName', 'LastName', 'DOB', 'Address')) %>%
  # CustID.x comes from df1 (may be "None"); CustID.y is the matched ID from df2
  mutate(CustID = ifelse(CustID.x == "None", as.character(CustID.y), CustID.x)) %>%
  select(-CustID.x, -CustID.y)
FirstName LastName Address DOB City Phone CustID
1 Mary K 999Drive 1/1/2011 Chicago 888-0000 132
2 Mona J 222Road 1/4/2002 NY 999-8888 133
3 Jack S 122Street 9/2/2009 Washin 777-9999 188
4 Helen L 111Rd 1/4/2010 3338
5 John M 888Lane 4/2/2002 882
6 Sally K 222Street 2/3/2002 976
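The question also worries about people who share a name but differ in address or DOB. The following is a small sketch of how the join behaves in that case, assuming "None" is first converted to NA and the IDs are then combined with coalesce(). The first two rows reuse values from the question; the third row of `df1` is a hypothetical "Sally K" with a different DOB, added only for illustration. The `.df2` suffix and the names `keys` and `result` are likewise illustrative.

```r
library(dplyr)

# Toy data: the third row of df1 is hypothetical (same name as in df2,
# different DOB), so it must NOT receive Sally's CustID from df2.
df1 <- data.frame(
  CustID    = c("132", "None", "None"),
  FirstName = c("Mary", "Helen", "Sally"),
  LastName  = c("K", "L", "K"),
  Address   = c("999Drive", "111Rd", "222Street"),
  DOB       = c("1/1/2011", "1/4/2010", "9/9/1999"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  CustID    = c(132L, 3338L, 976L),
  FirstName = c("Mary", "Helen", "Sally"),
  LastName  = c("K", "L", "K"),
  Address   = c("999Drive", "111Rd", "222Street"),
  DOB       = c("1/1/2011", "1/4/2010", "2/3/2002"),
  stringsAsFactors = FALSE
)

# All four columns must agree for two rows to count as the same person.
keys <- c("FirstName", "LastName", "Address", "DOB")

result <- df1 %>%
  # Turn the "None" placeholder into a real missing value.
  mutate(CustID = na_if(CustID, "None")) %>%
  # Bring in df2's ID under a separate column name (CustID.df2).
  left_join(df2 %>% mutate(CustID = as.character(CustID)),
            by = keys, suffix = c("", ".df2")) %>%
  # Keep df1's ID where present, otherwise fall back to df2's.
  mutate(CustID = coalesce(CustID, CustID.df2)) %>%
  select(-CustID.df2)

print(result)
```

Helen's ID is filled in from `df2`, while the hypothetical Sally stays NA because her DOB does not match, which is the behavior the question asks for.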