I have 2 data frames like this:
TEAM <- c("PE","PE","MPI","TDT","HPT","ATD")
CODE <- c(NA,"F","A","H","G","D")
df1 <- data.frame(TEAM,CODE)
CODE <- c(NA,"F100","A234","D664","H435","G123","A666","D345","G324",NA)
TEAM <- c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA)
df2 <- data.frame(CODE,TEAM)
I'm trying to update TEAM in df2 by matching the first letter of the CODE column in df2 against the CODE column in df1.
My desired output for df2:
CODE TEAM
1 NA PE
2 F100 PE
3 A234 MPI
4 D664 ATD
5 H435 TDT
6 G123 HPT
7 A666 MPI
8 D345 ATD
9 G324 HPT
10 NA PE
I'm trying to use sqldf, but this isn't right:
library(sqldf)
df2 <- sqldf(c("update df2 set TEAM =
case
when CODE like '%F%' then 'PE'
when CODE like '%A%' then 'MPI'
when CODE like '%D%' then 'ATD'
when CODE like '%G%' then 'HPT'
when CODE like '%H%' then 'TDT'
else 'NA'
end"))
Can someone suggest a way to do this without sqldf?
Answer 0 (score: 2)
Use match and substr (both in base R):
df2$TEAM = df1$TEAM[match(substr(df2$CODE, 1, 1), df1$CODE)]
df2
# CODE TEAM
# 1 <NA> PE
# 2 F100 PE
# 3 A234 MPI
# 4 D664 ATD
# 5 H435 TDT
# 6 G123 HPT
# 7 A666 MPI
# 8 D345 ATD
# 9 G324 HPT
# 10 <NA> PE
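This is a stopgap for a one-off case. If you do this often, I'd encourage you to extract the first code letter into its own column, CODE_1, and then do a regular merge or join.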
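A minimal sketch of that merge approach in base R, assuming the data from the question. CODE_1 is the column name suggested above; ROW and lookup are hypothetical helper names added here only for illustration. It relies on merge() treating NA keys as matching each other under its default incomparables = NULL, which is what sends the NA codes to PE.

```r
# Data from the question (stringsAsFactors = FALSE keeps columns as character)
df1 <- data.frame(TEAM = c("PE", "PE", "MPI", "TDT", "HPT", "ATD"),
                  CODE = c(NA, "F", "A", "H", "G", "D"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(CODE = c(NA, "F100", "A234", "D664", "H435", "G123",
                           "A666", "D345", "G324", NA),
                  stringsAsFactors = FALSE)

# Store the first letter of each code in its own column
df2$CODE_1 <- substr(df2$CODE, 1, 1)

# Remember the original row order, because merge() sorts by the key
df2$ROW <- seq_len(nrow(df2))

# Lookup table keyed on the single-letter code (hypothetical name)
lookup <- data.frame(CODE_1 = df1$CODE, TEAM = df1$TEAM,
                     stringsAsFactors = FALSE)

# Left join: all.x = TRUE keeps every row of df2
merged <- merge(df2, lookup, by = "CODE_1", all.x = TRUE)

# Restore the original order and keep only the columns of interest
merged <- merged[order(merged$ROW), c("CODE", "TEAM")]
rownames(merged) <- NULL
merged
```

If the lookup table ever contains the same letter twice, merge() will duplicate the matching rows of df2, so it is worth checking that the key column in lookup is unique before joining.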
Answer 1 (score: 0)
Assuming you are looking for an sqldf solution, try the following:
sqldf("select CODE,
case
when CODE like 'F%' then 'PE'
when CODE like 'A%' then 'MPI'
when CODE like 'D%' then 'ATD'
when CODE like 'G%' then 'HPT'
when CODE like 'H%' then 'TDT'
else 'PE'
end TEAM from df2", method = "raw")
Or this:
sqldf("select df2.CODE, coalesce(df1.TEAM, 'PE') TEAM
from df2
left join df1 on substr(df2.CODE, 1, 1) = df1.CODE")