I have a list of countries:
countries <- c("MAL","CHL","URU","YPR","OMA","GUY","HON","SAL","CYP")
and two data frames that each contain all possible dyads (pairs of two countries):
set.seed(28100)
df1 <- as.data.frame(t(combn(countries, 2)))
df1$year <- sample(1800:2000, 36)
df1$value1 <- sample(1:100, 36)
df2 <- as.data.frame(t(combn(rev(countries), 2)))
df2$year <- sample(1800:2000, 36)
df2$value2 <- sample(LETTERS, 36, replace = TRUE)
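Now I would like to merge the two data frames by country pair and year (by = c("V1","V2","year")) without having to worry about the order in which the two countries of a pair are listed.
So an observation with V1 == "SAL" and V2 == "CYP" should be merged with an observation that has either V1 == "SAL" and V2 == "CYP", or V2 == "SAL" and V1 == "CYP".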
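To make the problem concrete, here is a minimal sketch with two hypothetical one-row data frames (`a` and `b` are illustrative names, not part of my actual data). A plain `merge` on `c("V1","V2","year")` finds no match when the same pair is listed in reverse order:

```r
# Hypothetical toy data: same dyad and year, but the pair is listed in opposite order
a <- data.frame(V1 = "SAL", V2 = "CYP", year = 1900, value1 = 1,
                stringsAsFactors = FALSE)
b <- data.frame(V1 = "CYP", V2 = "SAL", year = 1900, value2 = "A",
                stringsAsFactors = FALSE)

# A plain merge compares V1 with V1 and V2 with V2, so the reversed pair is missed
naive <- merge(a, b, by = c("V1", "V2", "year"))
nrow(naive)  # 0 rows: the dyad is not recognised as the same pair
```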
Answer 0 (score: 0)
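This uses an index column built from the V1 and V2 columns of each data frame. The index is the concatenation of V1 and V2 with its letters sorted alphabetically, so a pair gets the same index regardless of the order in which it is listed.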
# `strSort` was taken from http://stackoverflow.com/questions/5904797/how-to-sort-letters-in-a-string-in-r
strSort <- function(x)
sapply(lapply(strsplit(x, NULL), sort), paste, collapse="")
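# df1: concatenate the two country codes, then sort the letters of the result
index <- paste0(as.character(df1[,1]), as.character(df1[,2]))
df1$index <- strSort(index)
# df2: build the same key, which is identical for a pair in either order
index <- paste0(as.character(df2[,1]), as.character(df2[,2]))
df2$index <- strSort(index)
# merge on the order-independent key
merge(df1, df2, by = "index")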
Partial output:
#    index V1.x V2.x year.x value1 V1.y V2.y year.y value2
# 1 AALLMS  MAL  SAL   1883     35  SAL  MAL   1971      Y
# 2 AALMMO  MAL  OMA   1915     75  OMA  MAL   1816      A
# 3 AALMOS  OMA  SAL   1806     95  SAL  OMA   1894      X
# 4 ACHLLM  MAL  CHL   1870     27  CHL  MAL   1991      U
# 5 ACHLLS  CHL  SAL   1949     55  SAL  CHL   1928      E
# 6 ACHLMO  CHL  OMA   1966     31  OMA  CHL   1839      X
# 7 ACLMPY  MAL  CYP   1830     15  CYP  MAL   1912      Y
# 8 ACLPSY  SAL  CYP   1881     60  CYP  SAL   1995      M
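Note that this merge matches on the country pair only, which is why year.x and year.y differ in the output. A key made of sorted letters could also, in principle, collide for two different pairs whose concatenated codes are anagrams of each other. A minimal sketch of an alternative, assuming the codes are plain character values: put each pair in alphabetical order with `pmin`/`pmax` and merge on the ordered pair plus `year`. The helper `order_pair` and the toy data frames `a` and `b` are hypothetical names introduced only for illustration.

```r
# Hypothetical helper: put each pair in alphabetical order,
# so that (SAL, CYP) and (CYP, SAL) end up as the same (V1, V2)
order_pair <- function(df) {
  v1 <- as.character(df$V1)
  v2 <- as.character(df$V2)
  df$V1 <- pmin(v1, v2)  # alphabetically first code of the pair
  df$V2 <- pmax(v1, v2)  # alphabetically last code of the pair
  df
}

# Toy data (assumed for illustration): same dyads listed in opposite order
a <- data.frame(V1 = c("SAL", "MAL"), V2 = c("CYP", "CHL"),
                year = c(1900, 1950), value1 = c(10, 20),
                stringsAsFactors = FALSE)
b <- data.frame(V1 = c("CYP", "CHL"), V2 = c("SAL", "MAL"),
                year = c(1900, 1951), value2 = c("A", "B"),
                stringsAsFactors = FALSE)

# Only the CYP/SAL pair shares a year, so only that row is kept
merged <- merge(order_pair(a), order_pair(b), by = c("V1", "V2", "year"))
merged
```

With the data from the question, the equivalent call would be merge(order_pair(df1), order_pair(df2), by = c("V1", "V2", "year")). Because the years there are sampled independently for each data frame, few or no rows may match.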