R programming: combining two data frames

Time: 2013-12-03 06:33:57

Tags: r merge dataframe cbind

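Folks,

I have two data frames, df1 and df2, that I would like to join or merge. My goal is simply to build a new data frame whose columns are the union of the columns of df1 and df2.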

Example

product=c("p1","p1","p1","p1","p1","p1","p1","p1","p2","p2","p2","p2","p2","p2","p2","p2","p3","p3","p3","p3","p3","p3","p3","p3","p4","p4","p4","p4","p4","p4","p4","p4")
skew=c("b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a")
version=c(0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2)
color=c("C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2")
price=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32)

df1 = data.frame(product, skew, version)
df2 = data.frame(product, skew, color, price)

The result I want is the data frame printed at the end of this question.

I have tried a few options:

#option 1 with cbind
df <- cbind(df1,df2)

This returns a data frame in which the columns "product" and "skew" are duplicated.

# Option 2, use data.frame
df <- data.frame(df1,df2)

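This gives me most of what I want, except that it has extra "product" and "skew" columns. They are suffixed with ".1", so the column names are not duplicated.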

# option 3, use merge which seems to be the way to go
df <- merge(df1,df2) 

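I think I am missing something about merge, because it effectively joined every matching row with every other matching row, giving 128 observations in total from the 32 provided. I guess that is how merge works. I have read "?merge" and tried a few of its options, but could not get it to produce what I want.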
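The figure of 128 can be accounted for by counting how often each (product, skew) pair occurs in each data frame. A minimal sketch, assuming the same data as above (rebuilt here with rep() so the block runs on its own; the names shared, n1, n2, expected_rows and merged_rows are only illustrative):

```r
# Rebuild the example data compactly (same values as the vectors above)
product <- rep(c("p1", "p2", "p3", "p4"), each = 8)
skew    <- rep(rep(c("b", "a"), each = 4), times = 4)
version <- rep(rep(c(0.1, 0.2), each = 2), times = 8)
color   <- rep(c("C1", "C2"), times = 16)
price   <- 1:32

df1 <- data.frame(product, skew, version)
df2 <- data.frame(product, skew, color, price)

# By default merge() joins on the column names the two frames share
shared <- intersect(names(df1), names(df2))

# Number of rows per (product, skew) combination in each data frame
n1 <- table(df1$product, df1$skew)
n2 <- table(df2$product, df2$skew)

# Every row of df1 is paired with every row of df2 that has the same key,
# so the merged size is the sum over keys of n1 * n2
expected_rows <- sum(n1 * n2)
merged_rows   <- nrow(merge(df1, df2))

print(shared)
print(c(expected = expected_rows, merged = merged_rows))
```

There are 8 key combinations, each appearing 4 times in both data frames, so each key contributes 4 * 4 = 16 rows.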

So my question is:

What is the best way to get the desired data frame from df1 and df2, as described above?

Thanks in advance for your help! Riad.

     product skew  version color price
1       p1    b     0.1    C1     1
2       p1    b     0.1    C2     2
3       p1    b     0.2    C1     3
4       p1    b     0.2    C2     4
5       p1    a     0.1    C1     5
6       p1    a     0.1    C2     6
7       p1    a     0.2    C1     7
8       p1    a     0.2    C2     8
9       p2    b     0.1    C1     9
10      p2    b     0.1    C2    10
11      p2    b     0.2    C1    11
12      p2    b     0.2    C2    12
13      p2    a     0.1    C1    13
14      p2    a     0.1    C2    14
15      p2    a     0.2    C1    15
16      p2    a     0.2    C2    16
17      p3    b     0.1    C1    17
18      p3    b     0.1    C2    18
19      p3    b     0.2    C1    19
20      p3    b     0.2    C2    20
21      p3    a     0.1    C1    21
22      p3    a     0.1    C2    22
23      p3    a     0.2    C1    23
24      p3    a     0.2    C2    24
25      p4    b     0.1    C1    25
26      p4    b     0.1    C2    26
27      p4    b     0.2    C1    27
28      p4    b     0.2    C2    28
29      p4    a     0.1    C1    29
30      p4    a     0.1    C2    30
31      p4    a     0.2    C1    31
32      p4    a     0.2    C2    32

2 answers:

Answer 0: (score: 2)

You can use union(), but it destroys the column names, so they have to be restored afterwards.

df_c <- union(df1, df2)
names(df_c) <- union(names(df1), names(df2))
df_c <- as.data.frame(df_c)

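Answer 1: (score: 1)

merge() does not give the result you expect because your "product" and "skew" columns together are not a unique identifier: each combination occurs several times, so merge() produces every possible pairing of the matching rows. You can include a third column as an id: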

product=c("p1","p1","p1","p1","p1","p1","p1","p1","p2","p2","p2","p2","p2","p2","p2","p2","p3","p3","p3","p3","p3","p3","p3","p3","p4","p4","p4","p4","p4","p4","p4","p4")
skew=c("b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a")
version=c(0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2)
color=c("C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2")
price=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32)
id = 1:32

df1 = data.frame(product, skew, id, version)
df2 = data.frame(product, skew, id, color, price)
merge(df1, df2)
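Using id = 1:32 relies on both data frames having their rows in the same order. A variation on the same idea is to number the occurrences within each (product, skew) group and merge on that counter. This is only a sketch: the column name occ is hypothetical, and it still assumes that the k-th occurrence of a key in df1 belongs with the k-th occurrence in df2.

```r
# Rebuild the question's data compactly (same values as the vectors above)
product <- rep(c("p1", "p2", "p3", "p4"), each = 8)
skew    <- rep(rep(c("b", "a"), each = 4), times = 4)
version <- rep(rep(c(0.1, 0.2), each = 2), times = 8)
color   <- rep(c("C1", "C2"), times = 16)
price   <- 1:32

df1 <- data.frame(product, skew, version)
df2 <- data.frame(product, skew, color, price)

# Occurrence counter within each (product, skew) group: 1, 2, 3, 4
# ("occ" is a made-up column name for this illustration)
df1$occ <- ave(seq_len(nrow(df1)), df1$product, df1$skew, FUN = seq_along)
df2$occ <- ave(seq_len(nrow(df2)), df2$product, df2$skew, FUN = seq_along)

# The three columns together identify each row uniquely, so the merge is 1:1
merged <- merge(df1, df2, by = c("product", "skew", "occ"))
print(nrow(merged))
```

Note that merge() sorts its result by the key columns by default, so the rows come back in a different order than in the original data frames.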

Or you can combine the data.frames manually:

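# Select the extra columns by name, so this works whether or not the id column was added
cbind(df1, df2[, c("color", "price")])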
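Because cbind() pairs rows purely by position, it can be worth checking that the shared columns line up before binding. A minimal sketch, assuming df1 and df2 as defined in the question (without the id column); the names key, extra and combined are only illustrative:

```r
# Rebuild the question's data compactly (same values as the vectors above)
product <- rep(c("p1", "p2", "p3", "p4"), each = 8)
skew    <- rep(rep(c("b", "a"), each = 4), times = 4)
version <- rep(rep(c(0.1, 0.2), each = 2), times = 8)
color   <- rep(c("C1", "C2"), times = 16)
price   <- 1:32

df1 <- data.frame(product, skew, version)
df2 <- data.frame(product, skew, color, price)

# Columns present in both frames, and columns only df2 has
key   <- intersect(names(df1), names(df2))
extra <- setdiff(names(df2), names(df1))

# Stop early if the two frames are not aligned row by row on the shared columns
stopifnot(nrow(df1) == nrow(df2), all(df1[key] == df2[key]))

# Bind only the columns that df1 does not already have
combined <- cbind(df1, df2[extra])
print(names(combined))
```

Selecting the columns with setdiff() rather than by position avoids depending on the column order of df2.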