I have two data frames that I need to merge.
df1 looks like this:
COUNTRY YEAR TRADE
Spain 2016 276
Germany 2016 323
France 2016 392
Spain 2017 456
Germany 2017 564
France 2017 359
Spain 2015 767
Germany 2015 868
France 2015 969
df2 looks like this:
COUNTRY GDP2016 GDP2017 GDP2015
Spain 1111 999 444
Germany 2222 888 555
France 3333 777 666
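With two GDP columns, I can do this:
library(dplyr)
df3 <- merge(df1, df2, by = "COUNTRY")
df3 <- df3 %>% mutate(GDP = ifelse(YEAR == 2016, GDP2016, GDP2017))
df3 <- subset(df3, select = -c(GDP2016, GDP2017))
However, with three GDP columns I have to use a different approach. What I want to get is: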
COUNTRY YEAR TRADE GDP
Spain 2016 276 1111
Germany 2016 323 2222
France 2016 392 3333
Spain 2017 456 999
Germany 2017 564 888
France 2017 359 777
Spain 2015 767 444
Germany 2015 868 555
France 2015 969 666
Any help would be greatly appreciated!
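For reference, the `ifelse()` step above can be generalized to any number of GDP columns without reshaping: for each row, look up the column whose name is `GDP` followed by that row's `YEAR`. This is a minimal base R sketch, assuming every year column is named exactly `GDP` plus a four-digit year; rows whose year has no matching column would get `NA`. The helper names `gdp_cols`, `gdp_mat` and `col_idx` are illustrative only.

```r
# Rebuild the example data frames from the question
df1 <- data.frame(
  COUNTRY = rep(c("Spain", "Germany", "France"), times = 3),
  YEAR    = rep(c(2016, 2017, 2015), each = 3),
  TRADE   = c(276, 323, 392, 456, 564, 359, 767, 868, 969)
)
df2 <- data.frame(
  COUNTRY = c("Spain", "Germany", "France"),
  GDP2016 = c(1111, 2222, 3333),
  GDP2017 = c(999, 888, 777),
  GDP2015 = c(444, 555, 666)
)

# Merge as in the question (merge() reorders rows by COUNTRY)
df3 <- merge(df1, df2, by = "COUNTRY")

# Assumption: every GDP column is named "GDP" followed by a four-digit year
gdp_cols <- grep("^GDP[0-9]{4}$", names(df3), value = TRUE)
gdp_mat  <- as.matrix(df3[gdp_cols])

# For each row, find the column "GDP<YEAR>" and pick that cell via matrix indexing
col_idx <- match(paste0("GDP", df3$YEAR), gdp_cols)
df3$GDP <- gdp_mat[cbind(seq_len(nrow(df3)), col_idx)]

# Drop the wide GDP columns, like the subset() step in the question
df3 <- df3[setdiff(names(df3), gdp_cols)]
df3
```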
Answer 0 (score: 0)
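You have to melt df2 to get it into the same (long) format as df1. Then create a new column YEAR with gsub, by removing the "GDP" part of the string and keeping only the year.
library(reshape2)  # provides melt()
df2_melt <- melt(df2, id.vars = "COUNTRY")
# Strip "GDP" from the column names and store the year as an integer, like df1$YEAR
df2_melt$YEAR <- as.integer(gsub(pattern = "GDP", replacement = "", x = df2_melt$variable))
colnames(df2_melt)[colnames(df2_melt) == "value"] <- "GDP"
df3 <- merge(df1, df2_melt, by = c("COUNTRY", "YEAR"))
Output: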
COUNTRY YEAR TRADE variable GDP
1 France 2016 392 GDP2016 3333
2 France 2017 359 GDP2017 777
3 Germany 2016 323 GDP2016 2222
4 Germany 2017 564 GDP2017 888
5 Spain 2016 276 GDP2016 1111
6 Spain 2017 456 GDP2017 999
Data
df1 <- read.table(text="COUNTRY YEAR TRADE
Spain 2016 276
Germany 2016 323
France 2016 392
Spain 2017 456
Germany 2017 564
France 2017 359
Spain 2015 767
Germany 2015 868
France 2015 969",header=TRUE, stringsAsFactors=FALSE)
df2 <- read.table(text="COUNTRY GDP2016 GDP2017 GDP2018
Spain 1111 999 444
Germany 2222 888 555
France 3333 777 666", header = TRUE, stringsAsFactors = FALSE)
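If you would rather not load reshape2, base R's `stack()` can do the same wide-to-long step. This is a sketch using the question's `df2` (with `GDP2015`); `df2_long` is an illustrative name, and the result can be merged with `df1` by `c("COUNTRY", "YEAR")` exactly as above.

```r
# df2 in the wide layout from the question
df2 <- data.frame(
  COUNTRY = c("Spain", "Germany", "France"),
  GDP2016 = c(1111, 2222, 3333),
  GDP2017 = c(999, 888, 777),
  GDP2015 = c(444, 555, 666)
)

# stack() puts all GDP columns, one after another, into a 'values' column
# and records the source column name in the 'ind' factor
gdp_cols <- setdiff(names(df2), "COUNTRY")
long <- stack(df2[gdp_cols])

# Repeat COUNTRY once per stacked column and turn "GDP2016" etc. into an integer year
df2_long <- data.frame(
  COUNTRY = rep(df2$COUNTRY, times = length(gdp_cols)),
  YEAR    = as.integer(sub("GDP", "", as.character(long$ind))),
  GDP     = long$values
)
df2_long
```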
Answer 1 (score: 0)
You can do it like this:
library(tidyverse)
df1 %>%
left_join(df2 %>%
gather(YEAR, GDP, -COUNTRY) %>%
mutate(YEAR = as.integer(sub("GDP", "", YEAR))),
by = c("COUNTRY", "YEAR"))
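In current versions of tidyr, `gather()` is superseded by `pivot_longer()`. Below is a sketch of the same join with `pivot_longer()`, assuming tidyr 1.1.0 or later (needed for `names_transform`); `names_prefix` strips the "GDP" part, so no separate `sub()` call is needed. The data is the question's version, with `GDP2015`.

```r
library(dplyr)
library(tidyr)

df1 <- tibble(
  COUNTRY = rep(c("Spain", "Germany", "France"), times = 3),
  YEAR    = rep(c(2016L, 2017L, 2015L), each = 3),
  TRADE   = c(276, 323, 392, 456, 564, 359, 767, 868, 969)
)
df2 <- tibble(
  COUNTRY = c("Spain", "Germany", "France"),
  GDP2016 = c(1111, 2222, 3333),
  GDP2017 = c(999, 888, 777),
  GDP2015 = c(444, 555, 666)
)

df3 <- df1 %>%
  left_join(
    df2 %>%
      pivot_longer(-COUNTRY,
                   names_to = "YEAR",
                   names_prefix = "GDP",                        # drop the "GDP" prefix
                   names_transform = list(YEAR = as.integer),   # match df1$YEAR's type
                   values_to = "GDP"),
    by = c("COUNTRY", "YEAR")
  )
df3
```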
Answer 2 (score: 0)
The problem is that df2 is not in a structure that is easy to join on, so I would use tidyr to reshape it first:
library(dplyr)
library(tidyr)
df3 <-
df1 %>%
left_join(df2 %>%
gather(YEAR, GDP, -COUNTRY) %>%
mutate(YEAR = as.numeric(substr(YEAR, 4, 7))),
by = c("COUNTRY", "YEAR"))
Note that this cannot give the expected answer because the years differ: df1 has 2015, but df2 has data for GDP2018, so the 2015 rows get NA for GDP.
Data used:
df1 <- tibble::tribble(
~COUNTRY, ~YEAR, ~TRADE,
"Spain", 2016, 276,
"Germany", 2016, 323,
"France", 2016, 392,
"Spain", 2017, 456,
"Germany", 2017, 564,
"France", 2017, 359,
"Spain", 2015, 767,
"Germany", 2015, 868,
"France", 2015, 969
)
df2 <- tibble::tribble(
~COUNTRY, ~GDP2016, ~GDP2017, ~GDP2018,
"Spain", 1111, 999, 444,
"Germany", 2222, 888, 555,
"France", 3333, 777, 666
)
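To see the mismatch described in the note, you can compare the years in `df1` with the years encoded in `df2`'s column names before joining. This is a minimal base R sketch using the same values as above; `gdp_years` and `missing_years` are illustrative names.

```r
# Same values as the data used in this answer (df2 has GDP2018, not GDP2015)
df1 <- data.frame(
  COUNTRY = rep(c("Spain", "Germany", "France"), times = 3),
  YEAR    = rep(c(2016, 2017, 2015), each = 3),
  TRADE   = c(276, 323, 392, 456, 564, 359, 767, 868, 969)
)
df2 <- data.frame(
  COUNTRY = c("Spain", "Germany", "France"),
  GDP2016 = c(1111, 2222, 3333),
  GDP2017 = c(999, 888, 777),
  GDP2018 = c(444, 555, 666)
)

# Years encoded in df2's column names, e.g. "GDP2016" -> 2016
gdp_years <- as.numeric(sub("^GDP", "", grep("^GDP", names(df2), value = TRUE)))

# Years in df1 with no GDP column; these rows will have NA GDP after a left join
missing_years <- setdiff(unique(df1$YEAR), gdp_years)
missing_years
```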
Answer 3 (score: 0)
data.table
Sample data
library( data.table )
df1 <- fread("COUNTRY YEAR TRADE
Spain 2016 276
Germany 2016 323
France 2016 392
Spain 2017 456
Germany 2017 564
France 2017 359
Spain 2015 767
Germany 2015 868
France 2015 969")
df2 <- fread("COUNTRY GDP2016 GDP2017 GDP2015
Spain 1111 999 444
Germany 2222 888 555
France 3333 777 666")
Code
# First melt df2 to long format and turn "GDP2016" etc. into a numeric YEAR
df3 <- melt(df2, id.vars = "COUNTRY", variable.name = "YEAR")[, YEAR := as.numeric(gsub("[^0-9]", "", YEAR))]
# Then join it to df1, adding a GDP column by reference
df1[ df3, GDP := i.value, on = .(COUNTRY, YEAR) ][]
# Or as a one-liner
df1[ melt(df2, id.vars = "COUNTRY", variable.name = "YEAR")[, YEAR := as.numeric(gsub("[^0-9]", "", YEAR))], GDP := i.value, on = .(COUNTRY, YEAR) ][]
Output
# COUNTRY YEAR TRADE GDP
# 1: Spain 2016 276 1111
# 2: Germany 2016 323 2222
# 3: France 2016 392 3333
# 4: Spain 2017 456 999
# 5: Germany 2017 564 888
# 6: France 2017 359 777
# 7: Spain 2015 767 444
# 8: Germany 2015 868 555
# 9: France 2015 969 666