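I have a larger existing data frame. For this smaller example, I'd like to replace some values of state (in df1) with newstate (from df2), matching rows on the column "first". My problem is that values come back as NA, because only some of the names have a match in the new data frame (df2).
Existing data frame: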
state = c("CA","WA","OR","AZ")
first = c("Jim","Mick","Paul","Ron")
df1 <- data.frame(first, state)
first state
1 Jim CA
2 Mick WA
3 Paul OR
4 Ron AZ
New data frame to match against the existing data frame:
state = c("CA","WA")
newstate = c("TX", "LA")
first =c("Jim","Mick")
df2 <- data.frame(first, state, newstate)
first state newstate
1 Jim CA TX
2 Mick WA LA
I tried using match, but it returns NA for "state" wherever a value of "first" in the original data frame has no counterpart in df2.
df1$state <- df2$newstate[match(df1$first, df2$first)]
first state
1 Jim TX
2 Mick LA
3 Paul <NA>
4 Ron <NA>
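Is there a way to ignore the non-matches, or to have a non-match return the existing value as is? This is an example of the desired result: the states for Jim and Mick are updated, while the states for Paul and Ron do not change.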
first state
1 Jim TX
2 Mick LA
3 Paul OR
4 Ron AZ
Answer 0 (score: 9)
This does what you want. Unless you really want factors, use stringsAsFactors = FALSE in your data.frame calls. Note the use of nomatch = 0 in the match call.
> state = c("CA","WA","OR","AZ")
> first = c("Jim","Mick","Paul","Ron")
> df1 <- data.frame(first, state, stringsAsFactors = FALSE)
> state = c("CA","WA")
> newstate = c("TX", "LA")
> first =c("Jim","Mick")
> df2 <- data.frame(first, state, newstate, stringsAsFactors = FALSE)
> df1
first state
1 Jim CA
2 Mick WA
3 Paul OR
4 Ron AZ
> df2
first state newstate
1 Jim CA TX
2 Mick WA LA
>
> # create an index for the matches
> indx <- match(df1$first, df2$first, nomatch = 0)
> df1$state[indx != 0] <- df2$newstate[indx]
> df1
first state
1 Jim TX
2 Mick LA
3 Paul OR
4 Ron AZ
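Why nomatch = 0 works: indexing a vector with 0 selects nothing, so df2$newstate[indx] contains only the matched values, and indx != 0 selects exactly the rows of df1 that receive them. The sketch below rebuilds the example data (df2 is reduced to the two columns actually used) and also shows an equivalent variant that keeps the default NA and filters with is.na; the names idx and hit are illustrative only.

```r
df1 <- data.frame(first = c("Jim", "Mick", "Paul", "Ron"),
                  state = c("CA", "WA", "OR", "AZ"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(first = c("Jim", "Mick"),
                  newstate = c("TX", "LA"),
                  stringsAsFactors = FALSE)

# Position of each df1$first in df2$first; 0 marks "no match"
indx <- match(df1$first, df2$first, nomatch = 0)

# Zero indices are dropped, so only the matched replacement values remain
matched_values <- df2$newstate[indx]

# Equivalent variant using the default nomatch = NA
idx <- match(df1$first, df2$first)
hit <- !is.na(idx)                        # rows of df1 that have a match in df2
df1$state[hit] <- df2$newstate[idx[hit]]  # unmatched rows keep their old state
df1
```

Both variants leave the unmatched rows untouched; which one reads better is largely a matter of taste.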
Answer 1 (score: 3)
I think it is better to work with character vectors than with factors.
> df1 <- data.frame(first, state,stringsAsFactors=FALSE)
> state = c("CA","WA")
> newstate = c("TX", "LA")
> first =c("Jim","Mick")
> df2 <- data.frame(first, state, newstate, stringsAsFactors=FALSE)
> df1[ match(df2$first, df1$first ), "state"] <- df2$newstate
> df1
first state
1 Jim TX
2 Mick LA
3 Paul OR
4 Ron AZ
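One caveat with matching in this direction: it assumes every name in df2 also occurs in df1. If df2 held a name that df1 lacks, match would return NA for it, and as far as I know an NA row index is not accepted in a data frame assignment. A minimal sketch of a guard, assuming a hypothetical extra row "Zed" in df2 that is not part of the original question:

```r
df1 <- data.frame(first = c("Jim", "Mick", "Paul", "Ron"),
                  state = c("CA", "WA", "OR", "AZ"),
                  stringsAsFactors = FALSE)

# "Zed" is a made-up row used only to illustrate an unmatched name
df2 <- data.frame(first = c("Jim", "Mick", "Zed"),
                  newstate = c("TX", "LA", "NY"),
                  stringsAsFactors = FALSE)

pos  <- match(df2$first, df1$first)  # row of df1 for each df2 name, NA if absent
keep <- !is.na(pos)                  # df2 rows that actually exist in df1

# Assign only the matched rows; the unmatched df2 row is skipped
df1[pos[keep], "state"] <- df2$newstate[keep]
df1
```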
Answer 2 (score: 2)
library(data.table)
DT1 <- as.data.table(df1)
DT2 <- as.data.table(df2)
setkey(DT1, first, state)
setkey(DT2, first, state)
DT1[DT2]
# first state newstate
# 1: Jim CA TX
# 2: Mick WA LA
Note that [.data.table also has a nomatch argument, i.e.:
DT2[DT1, nomatch=0]
# first state newstate
# 1: Jim CA TX
# 2: Mick WA LA
DT2[DT1, nomatch=NA]
# first state newstate
# 1: Jim CA TX
# 2: Mick WA LA
# 3: Paul OR NA
# 4: Ron AZ NA
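The joins above return newstate as an extra column rather than overwriting state. To get the result the question asks for, one option is a data.table update join, sketched here under the assumption that joining on first alone is sufficient (the answer above keys on both first and state). The := assignment modifies DT1 by reference, and the i. prefix refers to a column of the table in the i position (DT2).

```r
library(data.table)

DT1 <- data.table(first = c("Jim", "Mick", "Paul", "Ron"),
                  state = c("CA", "WA", "OR", "AZ"))
DT2 <- data.table(first = c("Jim", "Mick"),
                  state = c("CA", "WA"),
                  newstate = c("TX", "LA"))

# Update join: for rows of DT1 with a match in DT2, overwrite state with
# DT2's newstate; rows without a match are left unchanged
DT1[DT2, on = "first", state := i.newstate]
DT1[]
```

Because the update happens in place, no reassignment such as DT1 <- ... is needed.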