Joining two data frames - keeping attributes by combinations

Time: 2016-06-03 13:26:46

Tags: r join dataframe dplyr combinations

I have two data frames as follows:

DF1 

Placement      SOURCE       Price    Rate
    A        Source 1        5        10
    B        Source 11       12       14
    B        Source 2        5        20
    B        Source 3        11       15
    B        Source 4        10       30
    C        Source 3        11       10
    D        Source 7        8        20
    D        Source 9        11       12
    E        Source 10       5        13
    E        Source 11       12        8

DF2


 X1              X2              X3            CLUSTER
 Source 1        Source 2        Source 3      3
 Source 1        Source 3        Source 4      3
 Source 7        Source 8        Source 9      4
 Source 10       Source 7        Source 11     4

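I would like to transform them into the data frame below. It takes every row of DF1 (listed by Placement), looks up all the other Sources that appear together with that row's Source in DF2, and keeps the CLUSTER value: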

 DF3

 Placement  Source           Price  Rate    DF2_Source  CLUSTER
 A          Source 1         5      10      Source 2    3
 A          Source 1         5      10      Source 3    3
 A          Source 1         5      10      Source 4    3
 B          Source 11        12     14      Source 7    4
 B          Source 11        12     14      Source 10   4
 B          Source 2         5      20      Source 1    3
 B          Source 2         5      20      Source 3    3
 B          Source 3         11     15      Source 1    3
 B          Source 3         11     15      Source 2    3
 B          Source 3         11     15      Source 4    3
 B          Source 4         10     30      Source 1    3
 B          Source 4         10     30      Source 3    3
 C          Source 3         11     10      Source 1    3
 C          Source 3         11     10      Source 2    3
 C          Source 3         11     10      Source 4    3
 D          Source 7         8      20      Source 8    4
 D          Source 7         8      20      Source 9    4
 D          Source 7         8      20      Source 10   4
 D          Source 7         8      20      Source 11   4
 D          Source 9         11     12      Source 7    4
 D          Source 9         11     12      Source 8    4
 E          Source 10        5      13      Source 7    4
 E          Source 10        5      13      Source 11   4
 E          Source 11        12     8       Source 7    4
 E          Source 11        12     8       Source 10   4

I think it may have something to do with a combinations function, but I am not sure how to do it as a "join" on data frames.

Any help would be great, thanks!

1 answer:

Answer 0: (score: 1)

You can try the following:

Data:

DF1 <- structure(list(Placement = structure(c(1L, 2L, 2L, 2L, 2L, 3L, 
4L, 4L, 5L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"), 
    SOURCE = c("Source 1", "Source 11", "Source 2", "Source 3", 
    "Source 4", "Source 3", "Source 7", "Source 9", "Source 10", 
    "Source 11"), Price = c(5L, 12L, 5L, 11L, 10L, 11L, 8L, 11L, 
    5L, 12L), Rate = c(10L, 14L, 20L, 15L, 30L, 10L, 20L, 12L, 
    13L, 8L)), .Names = c("Placement", "SOURCE", "Price", "Rate"
), row.names = c(NA, -10L), class = "data.frame")

DF2 <- structure(list(X1 = c("Source 1", "Source 1", "Source 7", "Source 10"
), X2 = c("Source 2", "Source 3", "Source 8", "Source 7"), X3 = c("Source 3", 
    "Source 4", "Source 9", "Source 11"), CLUSTER = c(3L, 3L, 4L, 4L)), 
    .Names = c("X1", "X2", "X3", "CLUSTER"), class = "data.frame", 
    row.names = c(NA, -4L))

Code:

library(dplyr, warn.conflicts = FALSE)
library(magrittr)
library(reshape2, warn.conflicts = FALSE)


# Reshape DF2 to long format: for each of X1, X2 and X3 in turn, pair that
# column's source with the other two sources in the same row

melted <- do.call('rbind', lapply(names(DF2)[1:3], function(x) {
  tempdf <- melt(DF2, id.vars = c(x, "CLUSTER"), value.name = "DF2_SOURCE")[, c(x, "DF2_SOURCE", "CLUSTER")]
  names(tempdf) <- c("SOURCE", "DF2_SOURCE", "CLUSTER")
  return(tempdf)
}))

# Remove duplicate rows (the same pair can come from more than one DF2 row)

melted2 <- unique(melted)

#Join the newly generated dataset to your DF1 dataframe

Combined_df <- dplyr::left_join(DF1, melted2, by=c("SOURCE"="SOURCE"))

Combined_df

   Placement    SOURCE Price Rate DF2_SOURCE CLUSTER
1          A  Source 1     5   10   Source 2       3
2          A  Source 1     5   10   Source 3       3
3          A  Source 1     5   10   Source 4       3
4          B Source 11    12   14  Source 10       4
5          B Source 11    12   14   Source 7       4
6          B  Source 2     5   20   Source 1       3
7          B  Source 2     5   20   Source 3       3
8          B  Source 3    11   15   Source 1       3
9          B  Source 3    11   15   Source 4       3
10         B  Source 3    11   15   Source 2       3
11         B  Source 4    10   30   Source 1       3
12         B  Source 4    10   30   Source 3       3
13         C  Source 3    11   10   Source 1       3
14         C  Source 3    11   10   Source 4       3
15         C  Source 3    11   10   Source 2       3
16         D  Source 7     8   20   Source 8       4
17         D  Source 7     8   20   Source 9       4
18         D  Source 7     8   20  Source 10       4
19         D  Source 7     8   20  Source 11       4
20         D  Source 9    11   12   Source 7       4
21         D  Source 9    11   12   Source 8       4
22         E Source 10     5   13   Source 7       4
23         E Source 10     5   13  Source 11       4
24         E Source 11    12    8  Source 10       4
25         E Source 11    12    8   Source 7       4
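
If you would rather avoid reshape2, the same lookup table can be built in base R. This is only a rough sketch, assuming the source columns are always X1, X2 and X3 and that a source never needs to be paired with itself; the names `src_cols`, `pairs` and `result` are made up for illustration. It lists every ordered pair of distinct sources within each DF2 row, drops repeated pairs, and then merges onto DF1.

```r
# Same sample data as above, rebuilt with data.frame() so the block runs on its own
DF1 <- data.frame(
  Placement = c("A", "B", "B", "B", "B", "C", "D", "D", "E", "E"),
  SOURCE = c("Source 1", "Source 11", "Source 2", "Source 3", "Source 4",
             "Source 3", "Source 7", "Source 9", "Source 10", "Source 11"),
  Price = c(5L, 12L, 5L, 11L, 10L, 11L, 8L, 11L, 5L, 12L),
  Rate = c(10L, 14L, 20L, 15L, 30L, 10L, 20L, 12L, 13L, 8L),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  X1 = c("Source 1", "Source 1", "Source 7", "Source 10"),
  X2 = c("Source 2", "Source 3", "Source 8", "Source 7"),
  X3 = c("Source 3", "Source 4", "Source 9", "Source 11"),
  CLUSTER = c(3L, 3L, 4L, 4L),
  stringsAsFactors = FALSE
)

# Assumption: these are the only columns of DF2 that hold sources
src_cols <- c("X1", "X2", "X3")

# For each DF2 row, list every ordered pair of distinct sources in that row
pairs <- do.call(rbind, lapply(seq_len(nrow(DF2)), function(i) {
  s <- unlist(DF2[i, src_cols], use.names = FALSE)
  g <- expand.grid(SOURCE = s, DF2_SOURCE = s, stringsAsFactors = FALSE)
  g <- g[g$SOURCE != g$DF2_SOURCE, ]   # drop self-pairs
  g$CLUSTER <- DF2$CLUSTER[i]
  g
}))

# The same pair can come from more than one DF2 row, so keep each only once
pairs <- unique(pairs)

# Left join: keep every DF1 row, even one whose source is absent from DF2
result <- merge(DF1, pairs, by = "SOURCE", all.x = TRUE)

# merge() puts the key column first and re-sorts, so restore the column order
result <- result[order(result$Placement, result$SOURCE, result$DF2_SOURCE),
                 c("Placement", "SOURCE", "Price", "Rate", "DF2_SOURCE", "CLUSTER")]
rownames(result) <- NULL
```

On this sample data `result` should hold the same set of rows as `Combined_df`, although the row order may differ. With `all.x = TRUE`, a DF1 source that never appears in DF2 would be kept with NA in DF2_SOURCE and CLUSTER, which mirrors what `left_join` does.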

Hope this helps.