Merge 2 dataframes with same values in a column

Asked: 2017-01-04 12:23:59

Tags: python pandas dataframe merge

I have 2 dataframes. One is in this form:

df1:
     date      revenue
0  2016-11-17   385.943800
1  2016-11-18  1074.160340
2  2016-11-19  2980.857860
3  2016-11-20  1919.723960
4  2016-11-21   884.279340
5  2016-11-22   869.071070
6  2016-11-23   760.289260
7  2016-11-24  2481.689270
8  2016-11-25  2745.990070
9  2016-11-26  2273.413250
10 2016-11-27  2630.414900

The other is in this form:

df2:

      CET    MaxTemp  MeanTemp MinTemp  MaxHumidity  MeanHumidity  MinHumidity
0  2016-11-17   11      9        7            100           85             63
1  2016-11-18   9       6        3             93           83             66
2  2016-11-19   8       6        4             93           87             76
3  2016-11-20   10      7        4             93           84             81
4  2016-11-21   14     10        7            100           89             77
5  2016-11-22   13     10        7             93           79             63
6  2016-11-23   11      8        5            100           91             82
7  2016-11-24   9       7        4             93           80             66
8  2016-11-25   7       4        1             87           74             57
9  2016-11-26   7       3       -1            100           88             61
10 2016-11-27  10       7        4            100           81             66   

Both dataframes have more rows than shown, and the number of rows grows every day.

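I would like to merge these two dataframes so that whenever the same date appears in both df1['date'] and df2['CET'], an extra column is added to df2 holding the revenue value for that date. So I would like to create this: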

df2:

      CET    MaxTemp  MeanTemp MinTemp  MaxHumidity  MeanHumidity  MinHumidity  revenue
0  2016-11-17   11      9        7            100           85             63   385.943800
1  2016-11-18   9       6        3             93           83             66  1074.160340
2  2016-11-19   8       6        4             93           87             76  2980.857860
3  2016-11-20   10      7        4             93           84             81  1919.723960
4  2016-11-21   14     10        7            100           89             77   884.279340
5  2016-11-22   13     10        7             93           79             63   869.071070
6  2016-11-23   11      8        5            100           91             82   760.289260
7  2016-11-24   9       7        4             93           80             66  2481.689270
8  2016-11-25   7       4        1             87           74             57  2745.990070
9  2016-11-26   7       3       -1            100           88             61  2273.413250
10 2016-11-27  10       7        4            100           81             66  2630.414900

Can someone help me with how to do this?

2 Answers:

Answer 0: (score: 3)

I think you can use map:

df2['revenue'] = df2.CET.map(df1.set_index('date')['revenue'])

You can also convert the Series to a dict, which is a bit faster for a large df:

df2['revenue'] = df2.CET.map(df1.set_index('date')['revenue'].to_dict())

print (df2)
           CET  MaxTemp  MeanTemp  MinTemp  MaxHumidity  MeanHumidity  \
0   2016-11-17       11         9        7          100            85   
1   2016-11-18        9         6        3           93            83   
2   2016-11-19        8         6        4           93            87   
3   2016-11-20       10         7        4           93            84   
4   2016-11-21       14        10        7          100            89   
5   2016-11-22       13        10        7           93            79   
6   2016-11-23       11         8        5          100            91   
7   2016-11-24        9         7        4           93            80   
8   2016-11-25        7         4        1           87            74   
9   2016-11-26        7         3       -1          100            88   
10  2016-11-27       10         7        4          100            81   

    MinHumidity     revenue  
0            63   385.94380  
1            66  1074.16034  
2            76  2980.85786  
3            81  1919.72396  
4            77   884.27934  
5            63   869.07107  
6            82   760.28926  
7            66  2481.68927  
8            57  2745.99007  
9            61  2273.41325  
10           66  2630.41490  
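As a self-contained sketch of the two map variants above: the frames below are small stand-ins built from the first rows of the question, with one extra CET row (2016-11-20) that has no revenue, and the names lookup and revenue_from_dict are only illustrative. Both columns are assumed to already be datetime64.

```python
import pandas as pd

# Stand-in data: three revenue rows, four weather rows (the last has no revenue).
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2016-11-17", "2016-11-18", "2016-11-19"]),
    "revenue": [385.94380, 1074.16034, 2980.85786],
})
df2 = pd.DataFrame({
    "CET": pd.to_datetime(["2016-11-17", "2016-11-18", "2016-11-19", "2016-11-20"]),
    "MaxTemp": [11, 9, 8, 10],
})

# Build a date -> revenue lookup Series, then map each CET value through it.
lookup = df1.set_index("date")["revenue"]
df2["revenue"] = df2["CET"].map(lookup)

# Same lookup as a plain dict; it should give the same values.
revenue_from_dict = df2["CET"].map(lookup.to_dict())

print(df2)
```

Dates in df2 that have no counterpart in df1 simply get NaN in the new column, so df2 keeps all of its rows.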

If all output values are NaN, the problem is that the dtypes of the CET and date columns differ:

print (df1.date.dtypes)
object
print (df2.CET.dtype)
datetime64[ns]

The solution is to convert the strings with to_datetime:

df1.date = pd.to_datetime(df1.date)
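To make the dtype issue concrete, here is a minimal sketch with made-up two-row frames, where date holds strings and CET holds datetimes. The names before and after are only for illustration.

```python
import pandas as pd

# date is stored as strings, CET as datetime64.
df1 = pd.DataFrame({
    "date": ["2016-11-17", "2016-11-18"],
    "revenue": [385.94380, 1074.16034],
})
df2 = pd.DataFrame({"CET": pd.to_datetime(["2016-11-17", "2016-11-18"])})

# String keys never equal Timestamp values, so nothing is found.
before = df2["CET"].map(df1.set_index("date")["revenue"])

# After converting date to datetime, the keys line up.
df1["date"] = pd.to_datetime(df1["date"])
after = df2["CET"].map(df1.set_index("date")["revenue"])

print(before.isna().all())
print(after)
```

Checking both dtypes before mapping is usually the quickest way to diagnose an all-NaN result.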

Answer 1: (score: 3)

The .map() solution only works if the values in your date and CET columns are exactly the same.

If your values differ slightly, you can use the pd.merge_asof() method:

In [17]: pd.merge_asof(df1, df2, left_on='date', right_on='CET', tolerance=pd.Timedelta('2 hours'))
Out[17]:
         date     revenue        CET  MaxTemp  MeanTemp  MinTemp  MaxHumidity  MeanHumidity  MinHumidity
0  2016-11-17   385.94380 2016-11-17       11         9        7          100            85           63
1  2016-11-18  1074.16034 2016-11-18        9         6        3           93            83           66
2  2016-11-19  2980.85786 2016-11-19        8         6        4           93            87           76
3  2016-11-20  1919.72396 2016-11-20       10         7        4           93            84           81
4  2016-11-21   884.27934 2016-11-21       14        10        7          100            89           77
5  2016-11-22   869.07107 2016-11-22       13        10        7           93            79           63
6  2016-11-23   760.28926 2016-11-23       11         8        5          100            91           82
7  2016-11-24  2481.68927 2016-11-24        9         7        4           93            80           66
8  2016-11-25  2745.99007 2016-11-25        7         4        1           87            74           57
9  2016-11-26  2273.41325 2016-11-26        7         3       -1          100            88           61
10 2016-11-27  2630.41490 2016-11-27       10         7        4          100            81           66

Note: merge_asof() was added in Pandas 0.19.0 (i.e. it is not available in older versions).
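As a rough sketch of how tolerance behaves, the stand-in frames below use made-up timestamps: two CET values fall within 2 hours after a date in df1, and one is 5 hours after. With the default backward direction, each left row is matched to the last right key that is less than or equal to it, and both frames need to be sorted on their keys.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": pd.to_datetime(["2016-11-17", "2016-11-18", "2016-11-19"]),
    "revenue": [385.94380, 1074.16034, 2980.85786],
})
df2 = pd.DataFrame({
    "CET": pd.to_datetime([
        "2016-11-17 01:39:00",  # 1h39m after midnight: within tolerance
        "2016-11-18 01:39:00",  # within tolerance
        "2016-11-19 05:00:00",  # 5h after midnight: outside tolerance
    ]),
    "MaxTemp": [11, 9, 8],
})

# merge_asof requires both frames to be sorted by their key columns.
merged = pd.merge_asof(
    df2.sort_values("CET"),
    df1.sort_values("date"),
    left_on="CET",
    right_on="date",
    tolerance=pd.Timedelta("2 hours"),
)

print(merged)
```

The row that falls outside the tolerance keeps its df2 values and gets NaT/NaN in the columns coming from df1.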

Demo:

In [191]: df2
Out[191]:
                   CET  MaxTemp  MeanTemp  MinTemp  MaxHumidity  MeanHumidity  MinHumidity
0  2016-11-17 01:39:00       11         9        7          100            85           63
1  2016-11-18 01:39:00        9         6        3           93            83           66
2  2016-11-19 01:39:00        8         6        4           93            87           76
3  2016-11-20 01:39:00       10         7        4           93            84           81
4  2016-11-21 01:39:00       14        10        7          100            89           77
5  2016-11-22 01:39:00       13        10        7           93            79           63
6  2016-11-23 01:39:00       11         8        5          100            91           82
7  2016-11-24 01:39:00        9         7        4           93            80           66
8  2016-11-25 01:39:00        7         4        1           87            74           57
9  2016-11-26 01:39:00        7         3       -1          100            88           61
10 2016-11-27 01:39:00       10         7        4          100            81           66

In [192]: df1
Out[192]:
         date     revenue
0  2016-11-17   385.94380
1  2016-11-18  1074.16034
2  2016-11-19  2980.85786
3  2016-11-20  1919.72396
4  2016-11-21   884.27934
5  2016-11-22   869.07107
6  2016-11-23   760.28926
7  2016-11-24  2481.68927
8  2016-11-25  2745.99007
9  2016-11-26  2273.41325
10 2016-11-27  2630.41490

In [193]:  pd.merge_asof(df2, df1, left_on='CET', right_on='date')
Out[193]:
                   CET  MaxTemp  MeanTemp  MinTemp  MaxHumidity  MeanHumidity  MinHumidity       date     revenue
0  2016-11-17 01:39:00       11         9        7          100            85           63 2016-11-17   385.94380
1  2016-11-18 01:39:00        9         6        3           93            83           66 2016-11-18  1074.16034
2  2016-11-19 01:39:00        8         6        4           93            87           76 2016-11-19  2980.85786
3  2016-11-20 01:39:00       10         7        4           93            84           81 2016-11-20  1919.72396
4  2016-11-21 01:39:00       14        10        7          100            89           77 2016-11-21   884.27934
5  2016-11-22 01:39:00       13        10        7           93            79           63 2016-11-22   869.07107
6  2016-11-23 01:39:00       11         8        5          100            91           82 2016-11-23   760.28926
7  2016-11-24 01:39:00        9         7        4           93            80           66 2016-11-24  2481.68927
8  2016-11-25 01:39:00        7         4        1           87            74           57 2016-11-25  2745.99007
9  2016-11-26 01:39:00        7         3       -1          100            88           61 2016-11-26  2273.41325
10 2016-11-27 01:39:00       10         7        4          100            81           66 2016-11-27  2630.41490

Using the .map() method:

In [194]: df2.CET.map(df1.set_index('date')['revenue'])
Out[194]:
0    NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN
6    NaN
7    NaN
8    NaN
9    NaN
10   NaN
Name: CET, dtype: float64
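The NaN column above comes from exact matching: no key in df1 equals a timestamp like 2016-11-17 01:39:00. If only the calendar day matters, one possible workaround (not part of either answer, so treat it as an assumption about your data) is to drop the time part with .dt.normalize() before mapping. A small sketch with two made-up rows:

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": pd.to_datetime(["2016-11-17", "2016-11-18"]),
    "revenue": [385.94380, 1074.16034],
})
df2 = pd.DataFrame({
    "CET": pd.to_datetime(["2016-11-17 01:39:00", "2016-11-18 01:39:00"]),
})

lookup = df1.set_index("date")["revenue"]

# Exact match fails: the lookup keys are all at midnight.
exact = df2["CET"].map(lookup)

# normalize() resets each timestamp to midnight, so the keys line up by day.
by_day = df2["CET"].dt.normalize().map(lookup)

print(exact)
print(by_day)
```

Unlike merge_asof, this only matches rows on the same calendar day and does not need sorted input.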