Python Pandas - Issue appending/concatenating two MultiIndexed DataFrames

Time: 2017-02-10 00:40:34

Tags: python pandas dataframe

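I am trying to merge two MultiIndexed DataFrames. My code is below. As you can see in the output, the problem is that the "DATE" index is duplicated, whereas I would like all the values (OPEN_INT, PX_LAST) to sit on the same date index. Any ideas? I have tried both append and concat, but both give me similar results.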

if df.empty:
    # First (ticker, field) pair: fetch the data and build the MultiIndex.
    df = bbg_historicaldata(t, f, startDate, endDate)
    datesArray = list(df.index)
    tArray = [t for i in range(len(datesArray))]
    arrays = [tArray, datesArray]
    tuples = list(zip(*arrays))
    index = pd.MultiIndex.from_tuples(tuples, names=['TICKER', 'DATE'])
    df = pd.DataFrame({f: df[f].values}, index=index)

else:
    # Later pairs: build a frame the same way, then add it to df.
    temp = bbg_historicaldata(t, f, startDate, endDate)
    datesArray = list(temp.index)
    tArray = [t for i in range(len(datesArray))]
    arrays = [tArray, datesArray]
    tuples = list(zip(*arrays))
    index = pd.MultiIndex.from_tuples(tuples, names=['TICKER', 'DATE'])

    temp = pd.DataFrame({f: temp[f].values}, index=index)

    #df = df.append(temp, ignore_index = True)
    # sortlevel() was the call at the time; sort_index() is its current equivalent.
    df = pd.concat([df, temp]).sort_index()

Result:

                        OPEN_INT  PX_LAST
TICKER      DATE                          
EDH8 COMDTY 2017-02-01        NaN   98.365
            2017-02-01  1008044.0      NaN
            2017-02-02        NaN   98.370
            2017-02-02  1009994.0      NaN
            2017-02-03        NaN   98.360
            2017-02-03  1019181.0      NaN
            2017-02-06        NaN   98.405
            2017-02-06  1023863.0      NaN
            2017-02-07        NaN   98.410
            2017-02-07  1024609.0      NaN
            2017-02-08        NaN   98.435
            2017-02-08  1046258.0      NaN
            2017-02-09        NaN   98.395

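Basically, I want to end up with no NaNs!

Edit: adding "axis=1" to concat results in the following (my mistake for not including the additional output in the first place):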

                        PX_LAST   OPEN_INT  PX_LAST  OPEN_INT  PX_LAST  \
TICKER      DATE                                                         
EDH8 COMDTY 2017-02-01   98.365  1008044.0      NaN       NaN      NaN   
            2017-02-02   98.370  1009994.0      NaN       NaN      NaN   
            2017-02-03   98.360  1019181.0      NaN       NaN      NaN   
            2017-02-06   98.405  1023863.0      NaN       NaN      NaN   
            2017-02-07   98.410  1024609.0      NaN       NaN      NaN   
            2017-02-08   98.435  1046258.0      NaN       NaN      NaN   
            2017-02-09   98.395  1050291.0      NaN       NaN      NaN   
EDM8 COMDTY 2017-02-01      NaN        NaN   98.245  726739.0      NaN   
            2017-02-02      NaN        NaN   98.250  715081.0      NaN   
            2017-02-03      NaN        NaN   98.235  723936.0      NaN   
            2017-02-06      NaN        NaN   98.285  729324.0      NaN   
            2017-02-07      NaN        NaN   98.295  728673.0      NaN   
            2017-02-08      NaN        NaN   98.325  728520.0      NaN   
            2017-02-09      NaN        NaN   98.280  741840.0      NaN   
EDU8 COMDTY 2017-02-01      NaN        NaN      NaN       NaN   98.130   
            2017-02-02      NaN        NaN      NaN       NaN   98.135   
            2017-02-03      NaN        NaN      NaN       NaN   98.120   
            2017-02-06      NaN        NaN      NaN       NaN   98.180   
            2017-02-07      NaN        NaN      NaN       NaN   98.190   
            2017-02-08      NaN        NaN      NaN       NaN   98.225   
            2017-02-09      NaN        NaN      NaN       NaN   98.175  

Thanks!

2 answers:

Answer 0: (score: 1)

It is not clear what format your inputs are in.

I assume OPEN_INT looks like this:

import datetime
import pandas as pd


open_int = pd.DataFrame(
    [
        (datetime.date(2017, 2, 1), 1008044.0),
        (datetime.date(2017, 2, 2), 1009994.0),
        (datetime.date(2017, 2, 3), 1019181.0),
        (datetime.date(2017, 2, 6), 1023863.0),
        (datetime.date(2017, 2, 7), 1024609.0),
        (datetime.date(2017, 2, 8), 1046258.0),
    ],
    columns=['DATE', 'OPEN_INT']
)
open_int['TICKER'] = 'EDH8 COMDTY'
open_int.set_index(['TICKER', 'DATE'], inplace=True)

print(open_int)
#                          OPEN_INT
# TICKER      DATE
# EDH8 COMDTY 2017-02-01  1008044.0
#             2017-02-02  1009994.0
#             2017-02-03  1019181.0
#             2017-02-06  1023863.0
#             2017-02-07  1024609.0
#             2017-02-08  1046258.0

and PX_LAST looks like this:

px_last = pd.DataFrame(
    [
        (datetime.date(2017, 2, 1), 98.365),
        (datetime.date(2017, 2, 2), 98.370),
        (datetime.date(2017, 2, 3), 98.360),
        (datetime.date(2017, 2, 6), 98.405),
        (datetime.date(2017, 2, 7), 98.410),
        (datetime.date(2017, 2, 8), 98.435),
        (datetime.date(2017, 2, 9), 98.395),

    ],
    columns=['DATE', 'PX_LAST']
)
px_last['TICKER'] = 'EDH8 COMDTY'
px_last.set_index(['TICKER', 'DATE'], inplace=True)

print(px_last)
#                         PX_LAST
# TICKER      DATE
# EDH8 COMDTY 2017-02-01   98.365
#             2017-02-02   98.370
#             2017-02-03   98.360
#             2017-02-06   98.405
#             2017-02-07   98.410
#             2017-02-08   98.435
#             2017-02-09   98.395

Then you concatenate them along the columns and get what you want:

df = pd.concat([open_int, px_last], axis=1)
print(df)
#                          OPEN_INT  PX_LAST
# TICKER      DATE
# EDH8 COMDTY 2017-02-01  1008044.0   98.365
#             2017-02-02  1009994.0   98.370
#             2017-02-03  1019181.0   98.360
#             2017-02-06  1023863.0   98.405
#             2017-02-07  1024609.0   98.410
#             2017-02-08  1046258.0   98.435
#             2017-02-09        NaN   98.395
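
The edit in the question shows that applying axis=1 across different tickers spreads each ticker's values into separate columns. One way to avoid that, sketched here under assumptions, is to join the fields of a single ticker side by side first and only then stack the tickers as rows. `fetch_field` is a hypothetical stand-in for the asker's `bbg_historicaldata`, and the sample values are copied from the output in the question.

```python
import pandas as pd


def fetch_field(ticker, field):
    """Hypothetical stand-in for bbg_historicaldata: one column indexed by date."""
    dates = pd.to_datetime(["2017-02-01", "2017-02-02", "2017-02-03"])
    sample = {
        ("EDH8 COMDTY", "PX_LAST"): [98.365, 98.370, 98.360],
        ("EDH8 COMDTY", "OPEN_INT"): [1008044.0, 1009994.0, 1019181.0],
        ("EDM8 COMDTY", "PX_LAST"): [98.245, 98.250, 98.235],
        ("EDM8 COMDTY", "OPEN_INT"): [726739.0, 715081.0, 723936.0],
    }
    return pd.DataFrame({field: sample[(ticker, field)]}, index=dates)


tickers = ["EDH8 COMDTY", "EDM8 COMDTY"]
fields = ["PX_LAST", "OPEN_INT"]

per_ticker = {}
for t in tickers:
    # Fields of one ticker share the same dates, so join them as columns.
    per_ticker[t] = pd.concat([fetch_field(t, f) for f in fields], axis=1)

# Different tickers are different rows, so stack them along axis=0.
# The dict keys become the outer level of the resulting MultiIndex.
result = pd.concat(per_ticker)
result.index.names = ["TICKER", "DATE"]

print(result)
```

With this ordering each (TICKER, DATE) pair appears once, and the frame has only one PX_LAST and one OPEN_INT column regardless of how many tickers are loaded.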

Answer 1: (score: 1)

You need to concatenate along the other axis:

pd.concat([df, temp], axis=1)

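By default, pandas concatenates along the rows (axis=0) and aligns on the columns. Each frame therefore contributes its own rows, with NaN in the column it does not have, which is the result you are seeing.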
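
To make the difference concrete, here is a small sketch comparing the two axes on two one-column frames that share the same (TICKER, DATE) index. The values are taken from the question's output. The last step, collapsing duplicated keys with `groupby(...).first()`, is not from the answer; it is an assumed way to repair a frame that has already been stacked along the rows.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["EDH8 COMDTY"], pd.to_datetime(["2017-02-01", "2017-02-02"])],
    names=["TICKER", "DATE"],
)
px_last = pd.DataFrame({"PX_LAST": [98.365, 98.370]}, index=index)
open_int = pd.DataFrame({"OPEN_INT": [1008044.0, 1009994.0]}, index=index)

# axis=0 (default): rows are stacked, so every (TICKER, DATE) key appears
# twice and each row has NaN in the column the other frame provides.
stacked = pd.concat([px_last, open_int]).sort_index()

# axis=1: rows are aligned on the index, so each key appears once.
side_by_side = pd.concat([px_last, open_int], axis=1)

# Repairing an already stacked frame: first() takes the first non-null
# value per column within each index key.
repaired = stacked.groupby(level=["TICKER", "DATE"]).first()

print(stacked)
print(side_by_side)
print(repaired)
```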