Merging datetime indexes with slightly different dates in Python

Time: 2019-10-12 18:38:41

Tags: python pandas datetime-format

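I am trying to merge two pandas DataFrames that have different datetime indexes. DF1 holds the quarterly financial statements of company XYZ, and DF2 holds the daily closing prices of XYZ's publicly traded stock.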

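The problem is that the release date of a financial report does not always match a date in the daily closing prices (presumably because the report was released on a weekend).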

I need a way to fuzz the dates in DF2 so that, when I merge them with DF1, the merge picks the closest date in DF2 instead of leaving a blank for the closing price.

Currently using:

df1 = [['2007-12-30','$xxx,xxx'],
       ['2008-03-30','$xxx,xxx'],
       ['2008-06-28','$xxx,xxx'],
       ['2008-09-29','$xxx,xxx'],
       ['2008-12-31','$xxx,xxx']]

df2 = [['2007-12-30','$45'],
       ['2008-03-30','$40'],
       ['2008-06-27','$38'],
       ['2008-09-29','$46'],
       ['2008-12-30','$50']]

df3 = pd.merge(df1, df2, how='outer', on='date') 

Returns:

df3 = [['2007-12-30','$xxx,xxx', '$45'],
       ['2008-03-30','$xxx,xxx', '$40'],
       ['2008-06-28','$xxx,xxx', 'NaN'],
       ['2008-09-29','$xxx,xxx', '$46'],
       ['2008-12-31','$xxx,xxx', 'NaN']]

Desired result:

df3 = [['2007-12-30','$xxx,xxx', '$45'],
       ['2008-03-30','$xxx,xxx', '$40'],
       ['2008-06-28','$xxx,xxx', '$38'],
       ['2008-09-29','$xxx,xxx', '$46'],
       ['2008-12-31','$xxx,xxx', '$50']]

Solution:

df3 = pd.merge(df1, df2, how='outer', on='date')\
        .sort_index(ascending=False).fillna(method="ffill")

df3 = df3[df3.index.isin(df1.index)]

2 Answers:

Answer 0: (score: 0)

Use fillna(method="ffill") to carry the previous value forward, then keep only the rows whose date exists in df1.

df3 = pd.merge(df1, df2, how='outer', on='date').sort_values('date').fillna(method="ffill")
df3 = df3[df3['date'].isin(df1['date'])]
         date    xprice price
0  2007-12-30  $xxx,xxx   $45
1  2008-03-30  $xxx,xxx   $40
2  2008-06-28  $xxx,xxx   $38
3  2008-09-29  $xxx,xxx   $46
4  2008-12-31  $xxx,xxx   $50
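
The two lines above assume df1 and df2 are already DataFrames with a shared 'date' column. A minimal self-contained sketch of the same idea follows. The column names 'xprice' and 'price' are borrowed from the output above, the values are the placeholders from the question, and parsing the dates with pd.to_datetime is an added assumption so that sorting is chronological. It uses Series.ffill(), which is the same forward fill as fillna(method="ffill"), and applies it to the price column only.

```python
import pandas as pd

# Sample frames; 'xprice' and 'price' are the column names from the output above.
df1 = pd.DataFrame({
    'date': ['2007-12-30', '2008-03-30', '2008-06-28', '2008-09-29', '2008-12-31'],
    'xprice': ['$xxx,xxx'] * 5,
})
df2 = pd.DataFrame({
    'date': ['2007-12-30', '2008-03-30', '2008-06-27', '2008-09-29', '2008-12-30'],
    'price': ['$45', '$40', '$38', '$46', '$50'],
})

# Assumption: parse the strings so that sorting is chronological.
df1['date'] = pd.to_datetime(df1['date'])
df2['date'] = pd.to_datetime(df2['date'])

# The outer merge keeps every date from both frames; then sort by date.
merged = pd.merge(df1, df2, how='outer', on='date').sort_values('date')

# Forward-fill the price so that a report date without a quote inherits
# the most recent earlier closing price.
merged['price'] = merged['price'].ffill()

# Keep only the rows whose date is a report date from df1.
df3 = merged[merged['date'].isin(df1['date'])].reset_index(drop=True)
print(df3)
```

Note that a forward fill always takes the previous quote, not the nearest one. For this sample data the two coincide, because each unmatched report date falls one day after a trading day.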

Answer 1: (score: 0)

import pandas as pd

mylist1 = [['2007-12-30','$xxx,xxx'],
       ['2008-03-30','$xxx,xxx'],
       ['2008-06-28','$xxx,xxx'],
       ['2008-09-29','$xxx,xxx'],
       ['2008-12-31','$xxx,xxx']]


mylist2 = [['2007-12-30','$45'],
       ['2008-03-30','$40'],
       ['2008-06-27','$38'],
       ['2008-09-29','$46'],
       ['2008-12-30','$50']]

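# Build DataFrames from the lists; both get the columns 'date' and 'value'.
df1 = pd.DataFrame.from_records(mylist1, columns=['date', 'value'])
df2 = pd.DataFrame.from_records(mylist2, columns=['date', 'value'])
# Merge on the default integer index, i.e. pair the rows by position;
# overlapping column names get the suffixes _x and _y.
df3 = pd.merge(df1, df2, right_index=True, left_index=True)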
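
Merging on the index pairs rows by position, which only works while both frames have the same number of rows in the same order. Where the goal is the closest date, as the question asks, one alternative that neither answer mentions is pd.merge_asof with direction='nearest'. The sketch below is an illustration under stated assumptions, not the answerer's method: the column names 'xprice' and 'price' are borrowed from Answer 0's output, and the three-day tolerance is an arbitrary choice made for this example.

```python
import pandas as pd

# Sample frames built from the question's placeholder data.
df1 = pd.DataFrame({
    'date': pd.to_datetime(['2007-12-30', '2008-03-30', '2008-06-28',
                            '2008-09-29', '2008-12-31']),
    'xprice': ['$xxx,xxx'] * 5,
})
df2 = pd.DataFrame({
    'date': pd.to_datetime(['2007-12-30', '2008-03-30', '2008-06-27',
                            '2008-09-29', '2008-12-30']),
    'price': ['$45', '$40', '$38', '$46', '$50'],
})

# merge_asof needs both frames sorted by the key column.
df3 = pd.merge_asof(
    df1.sort_values('date'),
    df2.sort_values('date'),
    on='date',
    direction='nearest',             # closest date before or after
    tolerance=pd.Timedelta(days=3),  # assumption: ignore quotes further than 3 days away
)
print(df3)
```

The result keeps the dates from df1 (the left frame). A report date with no quote inside the tolerance window would get NaN in 'price' rather than a distant value.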