R: Merging data frames where the index across multiple columns is an unordered set

Time: 2015-09-16 23:07:52

Tags: r merge dataframe

I have a list of countries

countries <- c("MAL","CHL","URU","YPR","OMA","GUY","HON","SAL","CYP")

and two data frames that each contain every possible dyad (pair) of two countries

set.seed(28100)
df1 <- as.data.frame(t(combn(countries, 2)))
df1$year <- sample(1800:2000, 36)
df1$value1 <- sample(1:100, 36)

df2 <- as.data.frame(t(combn(rev(countries), 2)))
df2$year <- sample(1800:2000, 36)
df2$value2 <- sample(LETTERS, 36, replace = TRUE)

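Now I want to merge the two data frames by country pair (by = c("V1","V2","year")) without having to worry about the order in which the two countries of a pair are listed. That is, an observation with V1 == "SAL" and V2 == "CYP" should be merged with observations where either V1 == "SAL" and V2 == "CYP", or V2 == "SAL" and V1 == "CYP".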
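To make the problem concrete, here is a minimal sketch (the names pairs1, pairs2 and naive are introduced only for this illustration). Because df2 is built from rev(countries), every pair in it is listed in the opposite order from df1, so a plain merge on V1 and V2 finds no matching rows.

```r
countries <- c("MAL","CHL","URU","YPR","OMA","GUY","HON","SAL","CYP")

# Same construction as in the question, keeping only the pair columns
pairs1 <- as.data.frame(t(combn(countries, 2)), stringsAsFactors = FALSE)
pairs2 <- as.data.frame(t(combn(rev(countries), 2)), stringsAsFactors = FALSE)

# A plain merge compares V1 with V1 and V2 with V2,
# so a pair listed in reverse order is never matched
naive <- merge(pairs1, pairs2, by = c("V1", "V2"))
nrow(naive)  # 0: all pairs in pairs2 are reversed relative to pairs1
```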

1 answer:

Answer 0: (score: 0)

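This uses an index column built from the V1 and V2 columns of each data frame. The index column holds the characters of the concatenated V1 and V2 values in sorted order, so it is the same whichever way round the pair is listed. Note that the merge is done on this index only, not on year.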
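As a quick check of the idea, the sketch below applies the answer's strSort helper (copied here so the block runs on its own) to one pair in both orders. It also shows a caveat that follows from sorting individual characters: two different pairs whose combined letters are anagrams of each other get the same key. The codes "ABC", "DEF", "ABD" and "CEF" are hypothetical and do not appear in the question's data.

```r
# strSort as defined in the answer: sort the characters of each string
strSort <- function(x)
  sapply(lapply(strsplit(x, NULL), sort), paste, collapse = "")

key_a <- strSort(paste0("SAL", "CYP"))
key_b <- strSort(paste0("CYP", "SAL"))
identical(key_a, key_b)  # TRUE: both orders give the same key

# Caveat (hypothetical codes): the pairs (ABC, DEF) and (ABD, CEF) are
# different pairs, but they contain the same letters, so the keys collide
collide_1 <- strSort(paste0("ABC", "DEF"))
collide_2 <- strSort(paste0("ABD", "CEF"))
identical(collide_1, collide_2)  # TRUE
```

Whether such a collision can occur depends on the actual country codes, so it is worth checking that the index is unique within each data frame before merging.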

# `strSort` was taken from http://stackoverflow.com/questions/5904797/how-to-sort-letters-in-a-string-in-r
strSort <- function(x)
  sapply(lapply(strsplit(x, NULL), sort), paste, collapse="")

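# df1: concatenate V1 and V2, then sort the characters to get an order-independent key
index <- paste0(as.character(df1[, 1]), as.character(df1[, 2]))
df1$index <- strSort(index)

# df2: same key, so reversed pairs end up with the same index
index <- paste0(as.character(df2[, 1]), as.character(df2[, 2]))
df2$index <- strSort(index)

# Merge on the key only; year is not part of the match here
merge(df1, df2, by = "index")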

Partial output:

#     index V1.x V2.x year.x value1 V1.y V2.y year.y value2
# 1  AALLMS  MAL  SAL   1883     35  SAL  MAL   1971      Y
# 2  AALMMO  MAL  OMA   1915     75  OMA  MAL   1816      A
# 3  AALMOS  OMA  SAL   1806     95  SAL  OMA   1894      X
# 4  ACHLLM  MAL  CHL   1870     27  CHL  MAL   1991      U
# 5  ACHLLS  CHL  SAL   1949     55  SAL  CHL   1928      E
# 6  ACHLMO  CHL  OMA   1966     31  OMA  CHL   1839      X
# 7  ACLMPY  MAL  CYP   1830     15  CYP  MAL   1912      Y
# 8  ACLPSY  SAL  CYP   1881     60  CYP  SAL   1995      M
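The output above matches each pair regardless of year. As an alternative sketch, not part of the original answer, the pair can be normalised with pmin() and pmax() so that the two codes are kept whole rather than split into letters. The helper add_pair_key and the column names a and b are hypothetical names chosen for this example. With the key in place, year can be added to the merge columns as the question's by = c("V1","V2","year") suggests.

```r
countries <- c("MAL","CHL","URU","YPR","OMA","GUY","HON","SAL","CYP")

# Data as constructed in the question
set.seed(28100)
df1 <- as.data.frame(t(combn(countries, 2)), stringsAsFactors = FALSE)
df1$year <- sample(1800:2000, 36)
df1$value1 <- sample(1:100, 36)

df2 <- as.data.frame(t(combn(rev(countries), 2)), stringsAsFactors = FALSE)
df2$year <- sample(1800:2000, 36)
df2$value2 <- sample(LETTERS, 36, replace = TRUE)

# Hypothetical helper: store each pair in a fixed order (a <= b),
# so the same two countries always produce the same (a, b) key
add_pair_key <- function(df) {
  df$a <- pmin(df$V1, df$V2)
  df$b <- pmax(df$V1, df$V2)
  df
}

k1 <- add_pair_key(df1)
k2 <- add_pair_key(df2)

# Match on the unordered pair only (comparable to the index-based merge)
by_pair <- merge(k1, k2, by = c("a", "b"))

# Match on the unordered pair and the year
by_pair_year <- merge(k1, k2, by = c("a", "b", "year"))
```

Since the years are sampled independently in the two data frames, by_pair_year may contain few rows or none; by_pair has one row per country pair.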