Joining two DataFrames with duplicate values

Asked: 2018-07-30 06:33:14

Tags: pandas

I have two DataFrames:

df1 =

    Col Date        Days
0   A   20180830    30
1   A   20180927    58
2   A   20181025    86
3   B   20180830    30
4   B   20180927    58
5   B   20181025    86
6   C   20180802    2
7   C   20180809    9
8   C   20180816    16
9   C   20180823    23

df2 =

    Col Lot     Pct
13  A   4000    16.19
184 B   600     7.51
206 C   250     5.00
...

How can I combine them into a single DataFrame like this:

df =

    Col Date        Days    Lot     Pct
0   A   20180830    30      4000    16.19
1   A   20180927    58      4000    16.19
2   A   20181025    86      4000    16.19
3   B   20180830    30      600     7.51

df1.join(df2, on='Col') gives KeyError: 'Col'  # check for duplicates

1 answer:

Answer 0 (score: 2)

You can use pd.merge instead to join DataFrames in pandas.

Here is an example:

import pandas as pd

data_1 = {'Col': ['A', 'A', 'B', 'B', 'C', 'C'], 'Date': [20180830, 20180927, 20180830, 20180927, 20180809, 20180816], 'Days':[1,2,3,4,5,6]}

df1 = pd.DataFrame(data=data_1)

df1

df1 =

    Col Date        Days
0   A   20180830    1
1   A   20180927    2
2   B   20180830    3
3   B   20180927    4
4   C   20180809    5
5   C   20180816    6

Create df2:

data_2 = {'Col': ['A', 'B', 'C'], 'Lot': [4000, 600, 250], 'PCT': [16.19, 7.51, 5.00]}

df2 = pd.DataFrame(data=data_2)

df2 =

    Col Lot     PCT
0   A   4000    16.19
1   B   600      7.51
2   C   250      5.00

You can merge the two DataFrames above with:

pd.merge(df1, df2, on="Col")

The output will be:

    Col Date       Days Lot     PCT
0   A   20180830    1   4000    16.19
1   A   20180927    2   4000    16.19
2   B   20180830    3   600      7.51
3   B   20180927    4   600      7.51
4   C   20180809    5   250      5.00
5   C   20180816    6   250      5.00
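
As a follow-up sketch (not part of the original answer), the same merge can be made a little more defensive. It assumes df2 should hold exactly one row per Col value and that every row of df1 should be kept. The names merged and joined are illustrative. The last line shows one way DataFrame.join can be made to work here: join matches against the index of the other frame, so Col is moved into df2's index first.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Col": ["A", "A", "B", "B", "C", "C"],
    "Date": [20180830, 20180927, 20180830, 20180927, 20180809, 20180816],
    "Days": [1, 2, 3, 4, 5, 6],
})
df2 = pd.DataFrame({
    "Col": ["A", "B", "C"],
    "Lot": [4000, 600, 250],
    "PCT": [16.19, 7.51, 5.00],
})

# how="left" keeps every row of df1, even if df2 has no matching key.
# validate="many_to_one" raises a MergeError if "Col" is duplicated in df2
# (assumption: df2 is meant to be a one-row-per-key lookup table).
merged = df1.merge(df2, on="Col", how="left", validate="many_to_one")

# join() looks up df1["Col"] in the *index* of the other frame,
# so "Col" has to be df2's index for the keys to line up.
joined = df1.join(df2.set_index("Col"), on="Col")

print(merged)
print(joined)
```

With the default how="inner", rows of df1 whose Col value is missing from df2 would be dropped silently, which is why a left merge is often the safer choice for this kind of lookup.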