I have two data frames, as follows:
DF1
Placement SOURCE Price Rate
A Source 1 5 10
B Source 11 12 14
B Source 2 5 20
B Source 3 11 15
B Source 4 10 30
C Source 3 11 10
D Source 7 8 20
D Source 9 11 12
E Source 10 5 13
E Source 11 12 8
DF2
X1 X2 X3 CLUSTER
Source 1 Source 2 Source 3 3
Source 1 Source 3 Source 4 3
Source 7 Source 8 Source 9 4
Source 10 Source 7 Source 11 4
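I would like to turn these into the data frame below. It essentially takes each row of DF1 (per Placement), looks up every other Source that shares a DF2 row with that row's SOURCE, and keeps the CLUSTER value: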
DF3
Placement Source Price Rate DF2_Source CLUSTER
A Source 1 5 10 Source 2 3
A Source 1 5 10 Source 3 3
A Source 1 5 10 Source 4 3
B Source 11 12 14 Source 7 4
B Source 11 12 14 Source 10 4
B Source 2 5 20 Source 1 3
B Source 2 5 20 Source 3 3
B Source 3 11 15 Source 1 3
B Source 3 11 15 Source 2 3
B Source 3 11 15 Source 4 3
B Source 4 10 30 Source 1 3
B Source 4 10 30 Source 3 3
C Source 3 11 10 Source 1 3
C Source 3 11 10 Source 2 3
C Source 3 11 10 Source 4 3
D Source 7 8 20 Source 8 4
D Source 7 8 20 Source 9 4
D Source 7 8 20 Source 10 4
D Source 7 8 20 Source 11 4
D Source 9 11 12 Source 7 4
D Source 9 11 12 Source 8 4
E Source 10 5 13 Source 7 4
E Source 10 5 13 Source 11 4
E Source 11 12 8 Source 7 4
E Source 11 12 8 Source 10 4
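I think it might involve some kind of combinations function, but I'm not sure how to do that as a "join" between data frames.
Any help would be great, thanks!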
Answer 0 (score: 1)
You could try the following:
DF1 <- structure(list(Placement = structure(c(1L, 2L, 2L, 2L, 2L, 3L,
4L, 4L, 5L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"),
SOURCE = c("Source 1", "Source 11", "Source 2", "Source 3",
"Source 4", "Source 3", "Source 7", "Source 9", "Source 10",
"Source 11"), Price = c(5L, 12L, 5L, 11L, 10L, 11L, 8L, 11L,
5L, 12L), Rate = c(10L, 14L, 20L, 15L, 30L, 10L, 20L, 12L,
13L, 8L)), .Names = c("Placement", "SOURCE", "Price", "Rate"
), row.names = c(NA, -10L), class = "data.frame")
DF2 <- structure(list(X1 = c("Source 1", "Source 1", "Source 7", "Source 10"
), X2 = c("Source 2", "Source 3", "Source 8", "Source 7"), X3 = c("Source 3",
"Source 4", "Source 9", "Source 11"), CLUSTER = c(3L, 3L, 4L,
4L)), .Names = c("X1", "X2", "X3", "CLUSTER"), class = "data.frame", row.names = c(NA,
-4L))
library(dplyr, warn.conflicts = FALSE)
library(reshape2, warn.conflicts = FALSE)
# Create a new dataset from DF2 by melting it once for each of X1, X2 and X3.
# Each pass treats one column as the lookup key (SOURCE) and the other two
# columns as the sources it is paired with (DF2_SOURCE).
melted <- do.call("rbind", lapply(names(DF2)[1:3], function(x) {
  tempdf <- melt(DF2, id.vars = c(x, "CLUSTER"),
                 value.name = "DF2_SOURCE")[, c(x, "DF2_SOURCE", "CLUSTER")]
  names(tempdf) <- c("SOURCE", "DF2_SOURCE", "CLUSTER")
  tempdf
}))
# Remove duplicate rows from the newly generated dataset
# (e.g. Source 1 / Source 3 appears in two DF2 rows)
melted2 <- unique(melted)
# Join the newly generated dataset to your DF1 data frame
Combined_df <- dplyr::left_join(DF1, melted2, by = "SOURCE")
Combined_df
Placement SOURCE Price Rate DF2_SOURCE CLUSTER
1 A Source 1 5 10 Source 2 3
2 A Source 1 5 10 Source 3 3
3 A Source 1 5 10 Source 4 3
4 B Source 11 12 14 Source 10 4
5 B Source 11 12 14 Source 7 4
6 B Source 2 5 20 Source 1 3
7 B Source 2 5 20 Source 3 3
8 B Source 3 11 15 Source 1 3
9 B Source 3 11 15 Source 4 3
10 B Source 3 11 15 Source 2 3
11 B Source 4 10 30 Source 1 3
12 B Source 4 10 30 Source 3 3
13 C Source 3 11 10 Source 1 3
14 C Source 3 11 10 Source 4 3
15 C Source 3 11 10 Source 2 3
16 D Source 7 8 20 Source 8 4
17 D Source 7 8 20 Source 9 4
18 D Source 7 8 20 Source 10 4
19 D Source 7 8 20 Source 11 4
20 D Source 9 11 12 Source 7 4
21 D Source 9 11 12 Source 8 4
22 E Source 10 5 13 Source 7 4
23 E Source 10 5 13 Source 11 4
24 E Source 11 12 8 Source 10 4
25 E Source 11 12 8 Source 7 4
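If you would rather avoid the extra packages, here is a minimal base R sketch of the same idea. It is only an illustration under the assumption that every DF2 row lists distinct sources; the helper name `pair_sources` and the compact re-creation of DF1 and DF2 are mine, not part of the original post. For each DF2 row it builds all ordered pairs of different sources with `expand.grid`, then joins them to DF1 with `merge`.

```r
# Compact re-creation of the example data (same values as above)
DF1 <- data.frame(
  Placement = c("A", "B", "B", "B", "B", "C", "D", "D", "E", "E"),
  SOURCE = paste("Source", c(1, 11, 2, 3, 4, 3, 7, 9, 10, 11)),
  Price = c(5L, 12L, 5L, 11L, 10L, 11L, 8L, 11L, 5L, 12L),
  Rate = c(10L, 14L, 20L, 15L, 30L, 10L, 20L, 12L, 13L, 8L),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  X1 = paste("Source", c(1, 1, 7, 10)),
  X2 = paste("Source", c(2, 3, 8, 7)),
  X3 = paste("Source", c(3, 4, 9, 11)),
  CLUSTER = c(3L, 3L, 4L, 4L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for every row of df2, pair each source with the
# other sources in that row and carry the row's CLUSTER along.
pair_sources <- function(df2, source_cols = c("X1", "X2", "X3")) {
  per_row <- lapply(seq_len(nrow(df2)), function(i) {
    srcs <- as.character(unlist(df2[i, source_cols], use.names = FALSE))
    grid <- expand.grid(SOURCE = srcs, DF2_SOURCE = srcs,
                        stringsAsFactors = FALSE)
    grid <- grid[grid$SOURCE != grid$DF2_SOURCE, ]  # drop self-pairs
    grid$CLUSTER <- df2$CLUSTER[i]
    grid
  })
  # Stack the per-row pairs and drop pairs that occur in several rows
  unique(do.call(rbind, per_row))
}

pairs <- pair_sources(DF2)

# all.x = TRUE keeps DF1 rows with no match, like a left join.
# Note that merge() sorts by SOURCE and moves it to the first column.
result <- merge(DF1, pairs, by = "SOURCE", all.x = TRUE)
```

The row and column order will differ from the `left_join` output above because `merge` sorts on the key, so reorder afterwards if that matters to you.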
Hope this helps.