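I have two data frames of different lengths and widths. Both contain panel data on sites over several years, and each site has a unique ID code. However, the ID codes of some sites changed between the two data frames. For example: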
Year <- c(2006,2006,2006,2006)
Name <- as.character(c("A","B","C","D.B"))
Qtr.2 <- as.numeric(c(14,32,62,40))
Code <- as.character(c(123,456,789,101))
DF1 <- data.frame(Year,Name,Qtr.2,Code,stringsAsFactors = FALSE)
Year2 <- c(2007,2007,2007,2007,2007,2007)
Name2 <- as.character(c("A","B","C","E","D.B","D.A"))
Qtr.3 <- as.numeric(c(14,32,62,11,40,20))
Code2 <- as.character(c("W33","456","789","121","W133","W111"))
Type <- as.character(c("Blue","Red","Red","Green","Blue","Red"))
DF2 <- data.frame(Year2,Name2,Qtr.3,Code2,Type,stringsAsFactors = FALSE)
> DF1
  Year Name Qtr.2 Code
1 2006    A    14  123
2 2006    B    32  456
3 2006    C    62  789
4 2006  D.B    40  101
> DF2
  Year2 Name2 Qtr.3 Code2  Type
1  2007     A    14   W33  Blue
2  2007     B    32   456   Red
3  2007     C    62   789   Red
4  2007     E    11   121 Green
5  2007   D.B    40  W133  Blue
6  2007   D.A    20  W111   Red
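Here, the code for site "A" changed from "123" in DF1 to "W33" in DF2. I have not been able to find and convert the changed ID codes programmatically so that they match their earlier values. In other words, I want to match names between DF1 and DF2 and, wherever a name matches, replace "Code2" in DF2 with "Code" from DF1. So far, my approach involves a rather convoluted padding-and-looping process. However, I feel this must be a fairly common wrangling problem, and there must be a simpler way.
Ideally, my second DF would look like this: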
Year2_fixed <- c(2007,2007,2007,2007,2007,2007)
Name2_fixed <- as.character(c("A","B","C","E","D.B","D.A"))
Qtr.3_fixed <- as.numeric(c(14,32,62,11,40,20))
Code2_fixed <- as.character(c("123","456","789","121","101","W111"))
Type <- as.character(c("Blue","Red","Red","Green","Blue","Red"))
DF2_fixed <-data.frame(Year2_fixed,Name2_fixed,Qtr.3_fixed,Code2_fixed,Type,stringsAsFactors = FALSE)
> DF2_fixed
  Year2_fixed Name2_fixed Qtr.3_fixed Code2_fixed  Type
1        2007           A          14         123  Blue
2        2007           B          32         456   Red
3        2007           C          62         789   Red
4        2007           E          11         121 Green
5        2007         D.B          40         101  Blue
6        2007         D.A          20        W111   Red
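I have done some searching, but I have not found a clear answer on SO that addresses this problem. It is possible that I did not phrase the question well in my searches. If an answer is already out there, please point me to it, and let me know if I can clarify my question.
A few final notes: I would like to be able to run an inner_join by code afterwards, keeping the observations that appear in both sets. I have provided a toy example, but the real data is typically too large to check the names by hand.
Edit: As others have pointed out, stringsAsFactors = FALSE has been added to prevent errors.
Answer 0 (score: 1)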
Try using the match command.
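The code that accompanied this answer did not survive, so the following is only a minimal sketch of how base R's match could be applied to the toy data from the question. The names idx, found, and DF2_fixed are illustrative choices here, not taken from the original answer.

```r
# Toy data copied from the question
DF1 <- data.frame(
  Year  = c(2006, 2006, 2006, 2006),
  Name  = c("A", "B", "C", "D.B"),
  Qtr.2 = c(14, 32, 62, 40),
  Code  = c("123", "456", "789", "101"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  Year2 = rep(2007, 6),
  Name2 = c("A", "B", "C", "E", "D.B", "D.A"),
  Qtr.3 = c(14, 32, 62, 11, 40, 20),
  Code2 = c("W33", "456", "789", "121", "W133", "W111"),
  Type  = c("Blue", "Red", "Red", "Green", "Blue", "Red"),
  stringsAsFactors = FALSE
)

# For each row of DF2, the row position of its name in DF1 (NA when absent)
idx <- match(DF2$Name2, DF1$Name)

# Overwrite Code2 only where a matching name was found (illustrative names)
found <- !is.na(idx)
DF2_fixed <- DF2
DF2_fixed$Code2[found] <- DF1$Code[idx[found]]

DF2_fixed$Code2
# [1] "123"  "456"  "789"  "121"  "101"  "W111"
```

Because match returns only the first matching position, this keeps DF2 at its original number of rows even if a name were to appear more than once in DF1.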
Answer 1 (score: 1)
One solution is to use dplyr::coalesce together with left_join to get the desired result.
library(dplyr)
DF2 %>%
  left_join(select(DF1, Name, Code), by = c("Name2" = "Name")) %>%
  mutate(Code2 = coalesce(Code, Code2)) %>%
  select(-Code)
#   Year2 Name2 Qtr.3 Code2  Type
# 1  2007     A    14   123  Blue
# 2  2007     B    32   456   Red
# 3  2007     C    62   789   Red
# 4  2007     E    11   121 Green
# 5  2007   D.B    40   101  Blue
# 6  2007   D.A    20  W111   Red
Note: stringsAsFactors = FALSE was added to the OP's code that creates the data.frames; otherwise unnecessary warnings are generated.
Data:
Year <- c(2006,2006,2006,2006)
Name <- as.character(c("A","B","C","D.B"))
Qtr.2 <- as.numeric(c(14,32,62,40))
Code <- as.character(c(123,456,789,101))
DF1 <- data.frame(Year,Name,Qtr.2,Code, stringsAsFactors = FALSE)
Year2 <- c(2007,2007,2007,2007,2007,2007)
Name2 <- as.character(c("A","B","C","E","D.B","D.A"))
Qtr.3 <- as.numeric(c(14,32,62,11,40,20))
Code2 <- as.character(c("W33","456","789","121","W133","W111"))
Type <- as.character(c("Blue","Red","Red","Green","Blue","Red"))
DF2 <- data.frame(Year2,Name2,Qtr.3,Code2,Type, stringsAsFactors = FALSE)
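The question also mentions wanting to run an inner_join by code once the IDs agree. A minimal sketch of that follow-up step is below, assuming DF2's codes have already been corrected to the values shown in the question's desired output. The names DF2_fixed and panel are illustrative.

```r
library(dplyr)

# 2006 data as in the question
DF1 <- data.frame(
  Year  = c(2006, 2006, 2006, 2006),
  Name  = c("A", "B", "C", "D.B"),
  Qtr.2 = c(14, 32, 62, 40),
  Code  = c("123", "456", "789", "101"),
  stringsAsFactors = FALSE
)

# 2007 data with Code2 already corrected (values from the desired output)
DF2_fixed <- data.frame(
  Year2 = rep(2007, 6),
  Name2 = c("A", "B", "C", "E", "D.B", "D.A"),
  Qtr.3 = c(14, 32, 62, 11, 40, 20),
  Code2 = c("123", "456", "789", "121", "101", "W111"),
  Type  = c("Blue", "Red", "Red", "Green", "Blue", "Red"),
  stringsAsFactors = FALSE
)

# Keep only the sites whose code appears in both years
panel <- inner_join(DF1, DF2_fixed, by = c("Code" = "Code2"))
```

With the toy data this should keep the four sites present in both years (A, B, C, and D.B) and drop E and D.A, which have no 2006 record.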