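I have two data frames. The first, df1, contains countries and years. The second, df2, contains the data I want to add to df1 as a third column, by matching each df1 row's country to a row of df2 and its year to a column of df2.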
df1
country year
1 A 2008
2 B 2008
3 C 2009
4 F 2004
5 E 2006
df2
country 2004 2005 2006 2007 2008 2009
1 A 3,74972737 3,69814069 1,8119572 2,0058797 2,3728207 3,63424962
2 B 3,62151043 1,54726382 -3,799075 1,92867306 2,92279764 0,68044437
3 C 25,0489995 10,7724208 9,41065376 4,85433932 0,06592277 2,20000019
4 F 4,78583195 5,04811878 3,46842543 3,78590254 4,19162568 4,01936553
5 E 3,44897379 0,78317304 -2,2531746 2,74421327 1,79830266 0,23479692
6 F 5,98651552 4,89339392 2,31922692 2,11685013 2,96275035 4,81028341
7 G 5,65500512 7,29449815 2,96201437 5,37337313 6,62686519 6,45269876
8 H 7,05863621 6,01378976 5,04512479 5,57180227 6,46438388 6,52143508
9 I 7,67535068 3,63781612 -3,5861456 1,32402682 1,91501801 0,03094361
This is what I want to achieve:
country year gdp
1 A 2008 2.372821
2 B 2008 2.922798
3 C 2009 2.200000
4 F 2004 5.986516
5 E 2006 -2.253175
I'm sure there is a very simple answer to this. How do I bring the data from df2 into df1?
I tried to do it with dplyr's mutate:
library(dplyr)
df1 <- mutate(df1, gdp = {
df2[which(df2$country == country),
which(colnames(df2) == year)]})
However, I get the following error message:
Error in which(colnames(df2) == year) : object 'year' not found
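For comparison, here is a base-R sketch of the row/column lookup the question attempts. This is a different technique from the join in the answer below: match() finds each country's row and each year's column in df2, and indexing a matrix with a two-column index matrix pulls out one cell per df1 row. The data are re-created so the block runs on its own. Note that match() returns only the first matching row, so for the duplicated country F this gives 4.785832 rather than the 5.986516 shown in the desired output.

```r
# Example data from the question; decimal commas are kept as text
df1 <- data.frame(country = c("A", "B", "C", "F", "E"),
                  year = c(2008L, 2008L, 2009L, 2004L, 2006L),
                  stringsAsFactors = FALSE)
df2 <- read.table(text = "country 2004 2005 2006 2007 2008 2009
A 3,74972737 3,69814069 1,8119572 2,0058797 2,3728207 3,63424962
B 3,62151043 1,54726382 -3,799075 1,92867306 2,92279764 0,68044437
C 25,0489995 10,7724208 9,41065376 4,85433932 0,06592277 2,20000019
F 4,78583195 5,04811878 3,46842543 3,78590254 4,19162568 4,01936553
E 3,44897379 0,78317304 -2,2531746 2,74421327 1,79830266 0,23479692
F 5,98651552 4,89339392 2,31922692 2,11685013 2,96275035 4,81028341
G 5,65500512 7,29449815 2,96201437 5,37337313 6,62686519 6,45269876
H 7,05863621 6,01378976 5,04512479 5,57180227 6,46438388 6,52143508
I 7,67535068 3,63781612 -3,5861456 1,32402682 1,91501801 0,03094361",
                  header = TRUE, stringsAsFactors = FALSE, check.names = FALSE)

# df2's value columns as a character matrix whose column names are the years
vals <- as.matrix(df2[-1])
# One (row, column) pair per df1 row: the first df2 row with that country,
# and the df2 column named after that year
idx <- cbind(match(df1$country, df2$country),
             match(as.character(df1$year), colnames(vals)))
# Pick the cells, then turn the decimal-comma text into numbers
df1$gdp <- as.numeric(sub(",", ".", vals[idx], fixed = TRUE))
df1
```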
Answer:
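A solution using dplyr and tidyr. The key is to convert df2 to long format with gather. After that, we can merge it into df1 with left_join. If the decimal separator in your data frame is always a point (.) rather than a comma (,), the last mutate call may be unnecessary. Because df2 contains two rows for country F, both matches for F 2004 appear in the result. df3 is the final output.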
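library(dplyr)
library(tidyr)

df3 <- df1 %>%
  # reshape df2 to long format (one row per country/year); convert = TRUE makes year an integer
  left_join(df2 %>% gather(year, gdp, -country, convert = TRUE),
            by = c("country", "year")) %>%
  # gdp is text with a decimal comma: replace it with a point and convert to numeric
  mutate(gdp = as.numeric(sub(",", ".", gdp, fixed = TRUE)))
df3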
# country year gdp
# 1 A 2008 2.372821
# 2 B 2008 2.922798
# 3 C 2009 2.200000
# 4 F 2004 4.785832
# 5 F 2004 5.986516
# 6 E 2006 -2.253175
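gather() still works, but in current tidyr releases it is superseded by pivot_longer(). A sketch of the same reshape-and-join with pivot_longer(), assuming tidyr 1.0.0 or later: the year column comes out as character, so it is converted to integer to match df1$year. The data are re-created (with check.names = FALSE so the headers stay "2004" ... rather than "X2004" ...) so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Example data from the question; decimal commas are kept as text
df1 <- data.frame(country = c("A", "B", "C", "F", "E"),
                  year = c(2008L, 2008L, 2009L, 2004L, 2006L),
                  stringsAsFactors = FALSE)
df2 <- read.table(text = "country 2004 2005 2006 2007 2008 2009
A 3,74972737 3,69814069 1,8119572 2,0058797 2,3728207 3,63424962
B 3,62151043 1,54726382 -3,799075 1,92867306 2,92279764 0,68044437
C 25,0489995 10,7724208 9,41065376 4,85433932 0,06592277 2,20000019
F 4,78583195 5,04811878 3,46842543 3,78590254 4,19162568 4,01936553
E 3,44897379 0,78317304 -2,2531746 2,74421327 1,79830266 0,23479692
F 5,98651552 4,89339392 2,31922692 2,11685013 2,96275035 4,81028341
G 5,65500512 7,29449815 2,96201437 5,37337313 6,62686519 6,45269876
H 7,05863621 6,01378976 5,04512479 5,57180227 6,46438388 6,52143508
I 7,67535068 3,63781612 -3,5861456 1,32402682 1,91501801 0,03094361",
                  header = TRUE, stringsAsFactors = FALSE, check.names = FALSE)

df2_long <- df2 %>%
  # one row per country/year; the year column names become values of `year`
  pivot_longer(-country, names_to = "year", values_to = "gdp") %>%
  # make year an integer to match df1$year, and parse the decimal-comma text
  mutate(year = as.integer(year),
         gdp = as.numeric(sub(",", ".", gdp, fixed = TRUE)))

df3 <- left_join(df1, df2_long, by = c("country", "year"))
df3
```

As with gather(), the two F rows in df2 both match F 2004, so df3 has six rows.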
Data
df1 <- read.table(text = "country year
1 A 2008
2 B 2008
3 C 2009
4 F 2004
5 E 2006",
header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text = " country 2004 2005 2006 2007 2008 2009
1 A 3,74972737 3,69814069 1,8119572 2,0058797 2,3728207 3,63424962
2 B 3,62151043 1,54726382 -3,799075 1,92867306 2,92279764 0,68044437
3 C 25,0489995 10,7724208 9,41065376 4,85433932 0,06592277 2,20000019
4 F 4,78583195 5,04811878 3,46842543 3,78590254 4,19162568 4,01936553
5 E 3,44897379 0,78317304 -2,2531746 2,74421327 1,79830266 0,23479692
6 F 5,98651552 4,89339392 2,31922692 2,11685013 2,96275035 4,81028341
7 G 5,65500512 7,29449815 2,96201437 5,37337313 6,62686519 6,45269876
8 H 7,05863621 6,01378976 5,04512479 5,57180227 6,46438388 6,52143508
9 I 7,67535068 3,63781612 -3,5861456 1,32402682 1,91501801 0,03094361",
header = TRUE, stringsAsFactors = FALSE)
# read.table turns the numeric headers into X2004, X2005, ...; restore the plain year names
names(df2) <- c("country", 2004:2009)
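The remark that the last mutate call may be unnecessary can be made concrete at the reading step. A sketch, assuming the data are read with read.table's dec = "," (so "3,74972737" is parsed as a number) and check.names = FALSE (so the year headers are kept as-is): then neither the sub() conversion nor the names() fix is needed, and the answer's join works unchanged.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(country = c("A", "B", "C", "F", "E"),
                  year = c(2008L, 2008L, 2009L, 2004L, 2006L),
                  stringsAsFactors = FALSE)
# dec = "," reads the values as numbers; check.names = FALSE keeps "2004" instead of "X2004"
df2 <- read.table(text = "country 2004 2005 2006 2007 2008 2009
A 3,74972737 3,69814069 1,8119572 2,0058797 2,3728207 3,63424962
B 3,62151043 1,54726382 -3,799075 1,92867306 2,92279764 0,68044437
C 25,0489995 10,7724208 9,41065376 4,85433932 0,06592277 2,20000019
F 4,78583195 5,04811878 3,46842543 3,78590254 4,19162568 4,01936553
E 3,44897379 0,78317304 -2,2531746 2,74421327 1,79830266 0,23479692
F 5,98651552 4,89339392 2,31922692 2,11685013 2,96275035 4,81028341
G 5,65500512 7,29449815 2,96201437 5,37337313 6,62686519 6,45269876
H 7,05863621 6,01378976 5,04512479 5,57180227 6,46438388 6,52143508
I 7,67535068 3,63781612 -3,5861456 1,32402682 1,91501801 0,03094361",
                  header = TRUE, dec = ",",
                  stringsAsFactors = FALSE, check.names = FALSE)

# gdp is already numeric here, so the join needs no final mutate()
df3 <- df1 %>%
  left_join(df2 %>% gather(year, gdp, -country, convert = TRUE),
            by = c("country", "year"))
df3
```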