How to merge pandas Series on a column of dates

Time: 2018-02-20 14:29:10

Tags: pandas dataframe

I have two series:

        date     DEF
0  1/31/1986  0.0140
1  2/28/1986  0.0150
2  3/31/1986  0.0160
3  4/30/1986  0.0120
4  5/30/1986  0.0120

        date     PE
0  1/31/1900  12.71
1  2/28/1900  12.94
2  3/31/1900  13.04
3  4/30/1900  13.21
4  5/31/1900  12.58

I need to iterate over several DataFrames like these and merge them all into one big DataFrame, adding only the values whose dates line up. My function so far:

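import pandas as pd

def get_combined_vars(start, end):
    # Business-month-end dates between start and end
    rows = pd.date_range(start=start, end=end, freq='BM')
    df1 = pd.DataFrame(rows, columns=['date'])
    # `variables` is a dict defined elsewhere: key -> (flag, loader function)
    for key in variables.keys():
        check = variables[key][0]
        if check == 1:
            df2 = pd.DataFrame(variables[key][1]())
            print(df2.head(5))
            # Move each date into 1900 and match it to the nearest date in df2.
            # The merged result is not assigned, so df1 itself is unchanged here.
            pd.merge_asof(df1.assign(datekey=pd.to_datetime(df1['date'].dt.strftime('%m-%d') + '-1900')),
                          df2,
                          right_on='date',
                          left_on='datekey',
                          direction='nearest',
                          suffixes=('_x', ''))
            print(df1.head(10))
    return df1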

I can't seem to find the right command to merge DataFrames based on a column.

Desired output:

        date     DEF       PE
0  1/31/1900  0.0140    12.71
1  2/28/1900  0.0150    12.94
2  3/31/1900  0.0160    13.04
3  4/30/1900  0.0120    13.21
4  5/31/1900  0.0120    12.58
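In the function above, the value returned by pd.merge_asof is discarded, and both pd.merge and pd.merge_asof return a new DataFrame rather than modifying df1 in place. A minimal sketch of accumulating columns by reassigning the merge result on each iteration, assuming every input frame has a datetime `date` column whose values match the base dates exactly; the frame names `A` and `B`, the helper `combine_on_date`, and all values are hypothetical:

```python
import pandas as pd

# Hypothetical input frames keyed on 'date'; B has no February row.
frames = {
    "A": pd.DataFrame({"date": pd.to_datetime(["1986-01-31", "1986-02-28", "1986-03-31"]),
                       "A": [1, 2, 3]}),
    "B": pd.DataFrame({"date": pd.to_datetime(["1986-01-31", "1986-03-31"]),
                       "B": [10, 30]}),
}

def combine_on_date(dates, frames):
    """Left-join each frame onto a base column of dates (hypothetical helper)."""
    combined = pd.DataFrame({"date": pd.to_datetime(dates)})
    for frame in frames.values():
        # merge returns a new DataFrame, so reassign it to keep the added column
        combined = combined.merge(frame, on="date", how="left")
    return combined

result = combine_on_date(["1986-01-31", "1986-02-28", "1986-03-31"], frames)
print(result)
```

A left join keeps every base date and fills NaN where a frame has no value for that date, which matches "only add values that line up with the date".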

Problem with merge_asof:

runfile('H:/Market Timing/Files/market_timing.py', wdir='H:/Market Timing/Files')
        date    BY
0  1/31/1963  0.98
1  2/28/1963     1
2  3/29/1963  1.01
3  4/30/1963  1.01
4  5/31/1963  1.01
Traceback (most recent call last):
  File "C:\Developer\Anaconda\lib\site-packages\pandas\core\tools\datetimes.py", line 303, in _convert_listlike
    values, tz = tslib.datetime_to_datetime64(arg)
  File "pandas\_libs\tslib.pyx", line 1884, in pandas._libs.tslib.datetime_to_datetime64
TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Developer\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
    execfile(filename, namespace)
  File "C:\Developer\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 89, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "H:/Market Timing/Files/market_timing.py", line 88, in <module>
    print(get_combined_vars('1/31/1995', '1/31/2005').head(10))
  File "H:/Market Timing/Files/market_timing.py", line 43, in get_combined_vars
    pd.merge_asof(df1.assign(datekey=pd.to_datetime(df1['date'].dt.strftime('%m-%d') + '-1900')),
  File "C:\Developer\Anaconda\lib\site-packages\pandas\core\tools\datetimes.py", line 373, in to_datetime
    values = _convert_listlike(arg._values, True, format)
  File "C:\Developer\Anaconda\lib\site-packages\pandas\core\tools\datetimes.py", line 306, in _convert_listlike
    raise e
  File "C:\Developer\Anaconda\lib\site-packages\pandas\core\tools\datetimes.py", line 294, in _convert_listlike
    require_iso8601=require_iso8601
  File "pandas\_libs\tslib.pyx", line 2156, in pandas._libs.tslib.array_to_datetime
  File "pandas\_libs\tslib.pyx", line 2379, in pandas._libs.tslib.array_to_datetime
  File "pandas\_libs\tslib.pyx", line 2373, in pandas._libs.tslib.array_to_datetime
  File "pandas\_libs\tslibs\parsing.pyx", line 99, in pandas._libs.tslibs.parsing.parse_datetime_string
  File "C:\Developer\Anaconda\lib\site-packages\dateutil\parser.py", line 1182, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "C:\Developer\Anaconda\lib\site-packages\dateutil\parser.py", line 581, in parse
    ret = default.replace(**repl)
ValueError: day is out of range for month

I believe it is the third pass, where the two DataFrames are being merged, that hits this error: ValueError: day is out of range for month
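One likely cause worth checking (an inference from the traceback, not something confirmed in the thread): a 'BM' range from 1/31/1995 to 1/31/2005 includes 1996-02-29 and 2000-02-29, and 1900 was not a leap year, so the string '02-29-1900' cannot be parsed. A minimal sketch that reproduces the failure and then maps month/day onto a leap year instead; the helper `to_reference_year` and the choice of 2000 are assumptions for illustration:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["1996-01-31", "1996-02-29", "1996-03-29"]))

# Moving month/day into 1900 fails for Feb 29, because 1900 is not a leap year.
try:
    pd.to_datetime(dates.dt.strftime("%m-%d") + "-1900", format="%m-%d-%Y")
    failed = False
except ValueError:
    failed = True

def to_reference_year(s, year=2000):
    """Hypothetical helper: keep month and day, replace the year with a leap year."""
    return pd.to_datetime(s.dt.strftime("%m-%d") + f"-{year}", format="%m-%d-%Y")

mapped = to_reference_year(dates)
print(failed)
print(mapped.tolist())
```

The right-hand frame's dates would need the same mapping, so that both merge keys share the reference year.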

Is it possible to add a buffer to handle differences in data like this?

2 answers:

Answer 0 (score: 2)

You can use pd.merge_asof, but first you need to put the dates into a common year.

pd.merge_asof(df1.assign(datekey=pd.to_datetime(df1['date'].dt.strftime('%m-%d') + '-1900')),
              df2, 
              right_on='date', 
              left_on='datekey',
              direction='nearest',
              suffixes=('_x',''))[['date','DEF','PE']]

Output:

        date    DEF     PE
0 1900-01-31  0.014  12.71
1 1900-02-28  0.015  12.94
2 1900-03-31  0.016  13.04
3 1900-04-30  0.012  13.21
4 1900-05-31  0.012  12.58
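On the "buffer" question: pd.merge_asof accepts a `tolerance` argument that limits how far apart matched keys may be, and left rows with no right-hand key within that distance get NaN. A minimal sketch, assuming both keys are already datetimes in the same year and sorted; the 3-day tolerance, column names, and values are hypothetical:

```python
import pandas as pd

# Hypothetical frames, both sorted by 'date' as merge_asof requires.
left = pd.DataFrame({"date": pd.to_datetime(["2000-01-31", "2000-02-29", "2000-05-30"]),
                     "A": [1, 2, 3]})
right = pd.DataFrame({"date": pd.to_datetime(["2000-01-31", "2000-03-31", "2000-05-31"]),
                      "B": [10, 20, 30]})

# Match each left date to the nearest right date, but only within 3 days.
result = pd.merge_asof(left, right, on="date",
                       direction="nearest",
                       tolerance=pd.Timedelta("3D"))
print(result)
```

Here 2000-05-30 still picks up the 2000-05-31 value, while 2000-02-29 is about a month from either neighbour and stays NaN instead of borrowing a distant value.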

Answer 1 (score: 1)

You can do this with pandas.merge (or DataFrame.join as a shorthand):

import pandas as pd
# Exact join on the shared 'date' column (inner join by default)
pd.merge(df1, df2, on="date")

...but as Scott Boston mentioned in the comments, the data don't line up, so you won't get the expected result.
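To see that point concretely: an exact pd.merge on `date` keeps only rows whose timestamps are equal, so frames built from 1986 and 1900 dates share no keys. A small sketch using the first two rows of each sample frame from the question (parsed with an explicit `%m/%d/%Y` format, which is an assumption about how the dates are stored):

```python
import pandas as pd

df1 = pd.DataFrame({"date": pd.to_datetime(["1/31/1986", "2/28/1986"], format="%m/%d/%Y"),
                    "DEF": [0.0140, 0.0150]})
df2 = pd.DataFrame({"date": pd.to_datetime(["1/31/1900", "2/28/1900"], format="%m/%d/%Y"),
                    "PE": [12.71, 12.94]})

# Inner join (the default): no dates are equal, so no rows survive.
inner = pd.merge(df1, df2, on="date")
# Outer join keeps every date, with NaN where the other frame has no value.
outer = pd.merge(df1, df2, on="date", how="outer")
print(len(inner), len(outer))
```

This is why the accepted answer first moves both sides into a common year before matching.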