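Folks,
I have two data frames, df1 and df2, that I would like to join or merge. My goal is simply to build a new data frame whose columns are the union of the columns of df1 and df2.
Example: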
product=c("p1","p1","p1","p1","p1","p1","p1","p1","p2","p2","p2","p2","p2","p2","p2","p2","p3","p3","p3","p3","p3","p3","p3","p3","p4","p4","p4","p4","p4","p4","p4","p4")
skew=c("b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a")
version=c(0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2)
color=c("C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2")
price=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32)
df1 = data.frame(product, skew, version)
df2 = data.frame(product, skew, color, price)
The result I would like to get is listed at the end of this question.
I have tried a few options:
#option 1 with cbind
df <- cbind(df1,df2)
This returns a data frame with the columns "product" and "skew" duplicated.
# Option 2, use data.frame
df <- data.frame(df1,df2)
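This gives me most of what I want, except that it has extra columns for "product" and "skew". They get the suffix ".1", so the column names are not duplicated.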
# option 3, use merge which seems to be the way to go
df <- merge(df1,df2)
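I think I am missing something about merge, because this seems to have created some kind of cross product of the two data sets, returning 128 observations in total from the 32 provided. I guess that is just how merge works. I have read "?merge" and tried a few of its options, but could not get it to spit out what I want.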
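So my question is:
What is the best way to get the desired data frame from df1 and df2, as described above?
Thanks in advance for your help!
Riad.

Desired result: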
product skew version color price
1 p1 b 0.1 C1 1
2 p1 b 0.1 C2 2
3 p1 b 0.2 C1 3
4 p1 b 0.2 C2 4
5 p1 a 0.1 C1 5
6 p1 a 0.1 C2 6
7 p1 a 0.2 C1 7
8 p1 a 0.2 C2 8
9 p2 b 0.1 C1 9
10 p2 b 0.1 C2 10
11 p2 b 0.2 C1 11
12 p2 b 0.2 C2 12
13 p2 a 0.1 C1 13
14 p2 a 0.1 C2 14
15 p2 a 0.2 C1 15
16 p2 a 0.2 C2 16
17 p3 b 0.1 C1 17
18 p3 b 0.1 C2 18
19 p3 b 0.2 C1 19
20 p3 b 0.2 C2 20
21 p3 a 0.1 C1 21
22 p3 a 0.1 C2 22
23 p3 a 0.2 C1 23
24 p3 a 0.2 C2 24
25 p4 b 0.1 C1 25
26 p4 b 0.1 C2 26
27 p4 b 0.2 C1 27
28 p4 b 0.2 C2 28
29 p4 a 0.1 C1 29
30 p4 a 0.1 C2 30
31 p4 a 0.2 C1 31
32 p4 a 0.2 C2 32
Answer 0 (score: 2)
You can use union(), but it drops the column names, so they have to be restored afterwards.
df_c <- union(df1, df2)
names(df_c) <- union(names(df1), names(df2))
df_c <- as.data.frame(df_c)
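Answer 1 (score: 1)
merge() does not give the result you expect because your "product" and "skew" columns are not a unique identifier: each combination occurs several times, so merge() returns every possible pairing of the matching rows. You can include a third column as an id: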
product=c("p1","p1","p1","p1","p1","p1","p1","p1","p2","p2","p2","p2","p2","p2","p2","p2","p3","p3","p3","p3","p3","p3","p3","p3","p4","p4","p4","p4","p4","p4","p4","p4")
skew=c("b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a")
version=c(0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2)
color=c("C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2")
price=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32)
id = 1:32
df1 = data.frame(product, skew, id, version)
df2 = data.frame(product, skew, id, color, price)
merge(df1, df2)
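To see the effect of the id column, the row counts of both merges can be compared. This is only a minimal sketch: it rebuilds the example data with rep() instead of the long vectors (price becomes an integer vector here), and the names no_id and with_id are made up for illustration.

```r
# Rebuild the example data compactly (same values as in the question).
product <- rep(c("p1", "p2", "p3", "p4"), each = 8)
skew    <- rep(rep(c("b", "a"), each = 4), times = 4)
version <- rep(rep(c(0.1, 0.2), each = 2), times = 8)
color   <- rep(c("C1", "C2"), times = 16)
price   <- 1:32
id      <- 1:32   # row number used as an extra key

# Without the id, each (product, skew) pair occurs 4 times in each frame,
# so merge() pairs every matching row: 8 pairs * 4 * 4 rows.
no_id <- merge(data.frame(product, skew, version),
               data.frame(product, skew, color, price))
nrow(no_id)    # 128

# With the id, the key (product, skew, id) is unique and rows match one-to-one.
with_id <- merge(data.frame(product, skew, id, version),
                 data.frame(product, skew, id, color, price))
nrow(with_id)  # 32
```

Note that merge() sorts its result by the key columns by default, so the rows may come back in a different order than in the input; with_id[order(with_id$id), ] should restore the original order.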
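Or you can combine the data frames manually:
# select the new columns by name; with the id column added above, positions 3:4 of df2 would be id and color
cbind(df1, df2[, c("color", "price")])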
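The manual combine can also be written without hard-coding which columns to add. The following is a sketch under the assumption that both data frames have the same rows in the same order, which it checks before binding; the helper names shared and extra are made up for illustration, and the data is the original df1 and df2 from the question, rebuilt with rep().

```r
# Original data frames from the question (no id column).
product <- rep(c("p1", "p2", "p3", "p4"), each = 8)
skew    <- rep(rep(c("b", "a"), each = 4), times = 4)
version <- rep(rep(c(0.1, 0.2), each = 2), times = 8)
color   <- rep(c("C1", "C2"), times = 16)
price   <- 1:32
df1 <- data.frame(product, skew, version)
df2 <- data.frame(product, skew, color, price)

# A positional combine is only safe if the shared columns agree row by row.
shared <- intersect(names(df1), names(df2))
stopifnot(identical(df1[shared], df2[shared]))

# Append only the columns of df2 that df1 does not already have.
extra <- setdiff(names(df2), names(df1))
df <- cbind(df1, df2[, extra, drop = FALSE])
names(df)
```

Unlike merge(), this keeps the rows in their original order, but it relies entirely on the two data frames being aligned, so the stopifnot() check is worth keeping.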