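I'm trying to merge two MultiIndex'ed DataFrames. My code is below. As you can see in the output, the problem is that the "DATE" index is repeated, whereas I'd like all the values (OPEN_INT, PX_LAST) on the same date index... Any ideas? I've tried both append and concat, but both give me similar results.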
# Presumably runs inside a loop over tickers t and fields f (loop not shown);
# bbg_historicaldata is the asker's own Bloomberg helper.
if df.empty:
    df = bbg_historicaldata(t, f, startDate, endDate)
    datesArray = list(df.index)
    tArray = [t for i in range(len(datesArray))]
    arrays = [tArray, datesArray]
    tuples = list(zip(*arrays))
    index = pd.MultiIndex.from_tuples(tuples, names=['TICKER', 'DATE'])
    df = pd.DataFrame({f: df[f].values}, index=index)
else:
    temp = bbg_historicaldata(t, f, startDate, endDate)
    datesArray = list(temp.index)
    tArray = [t for i in range(len(datesArray))]
    arrays = [tArray, datesArray]
    tuples = list(zip(*arrays))
    index = pd.MultiIndex.from_tuples(tuples, names=['TICKER', 'DATE'])
    temp = pd.DataFrame({f: temp[f].values}, index=index)
    #df = df.append(temp, ignore_index = True)
    df = pd.concat([df, temp]).sort_index()  # was .sortlevel(), removed in newer pandas
Result:
                         OPEN_INT  PX_LAST
TICKER      DATE
EDH8 COMDTY 2017-02-01        NaN   98.365
            2017-02-01  1008044.0      NaN
            2017-02-02        NaN   98.370
            2017-02-02  1009994.0      NaN
            2017-02-03        NaN   98.360
            2017-02-03  1019181.0      NaN
            2017-02-06        NaN   98.405
            2017-02-06  1023863.0      NaN
            2017-02-07        NaN   98.410
            2017-02-07  1024609.0      NaN
            2017-02-08        NaN   98.435
            2017-02-08  1046258.0      NaN
            2017-02-09        NaN   98.395
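Basically I want to get it so there are no NaNs!
EDIT: Adding "axis=1" to the concat results in the following (my mistake for not including the extra output in the first place):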
                        PX_LAST   OPEN_INT  PX_LAST  OPEN_INT  PX_LAST  \
TICKER      DATE
EDH8 COMDTY 2017-02-01   98.365  1008044.0      NaN       NaN      NaN
            2017-02-02   98.370  1009994.0      NaN       NaN      NaN
            2017-02-03   98.360  1019181.0      NaN       NaN      NaN
            2017-02-06   98.405  1023863.0      NaN       NaN      NaN
            2017-02-07   98.410  1024609.0      NaN       NaN      NaN
            2017-02-08   98.435  1046258.0      NaN       NaN      NaN
            2017-02-09   98.395  1050291.0      NaN       NaN      NaN
EDM8 COMDTY 2017-02-01      NaN        NaN   98.245  726739.0      NaN
            2017-02-02      NaN        NaN   98.250  715081.0      NaN
            2017-02-03      NaN        NaN   98.235  723936.0      NaN
            2017-02-06      NaN        NaN   98.285  729324.0      NaN
            2017-02-07      NaN        NaN   98.295  728673.0      NaN
            2017-02-08      NaN        NaN   98.325  728520.0      NaN
            2017-02-09      NaN        NaN   98.280  741840.0      NaN
EDU8 COMDTY 2017-02-01      NaN        NaN      NaN       NaN   98.130
            2017-02-02      NaN        NaN      NaN       NaN   98.135
            2017-02-03      NaN        NaN      NaN       NaN   98.120
            2017-02-06      NaN        NaN      NaN       NaN   98.180
            2017-02-07      NaN        NaN      NaN       NaN   98.190
            2017-02-08      NaN        NaN      NaN       NaN   98.225
            2017-02-09      NaN        NaN      NaN       NaN   98.175
Thanks!
Answer 0 (score: 1)
It's not clear what format your input is in.
I assume OPEN_INT looks like this:
import datetime
import pandas as pd
open_int = pd.DataFrame(
[
(datetime.date(2017, 2, 1), 1008044.0),
(datetime.date(2017, 2, 2), 1009994.0),
(datetime.date(2017, 2, 3), 1019181.0),
(datetime.date(2017, 2, 6), 1023863.0),
(datetime.date(2017, 2, 7), 1024609.0),
(datetime.date(2017, 2, 8), 1046258.0),
],
columns=['DATE', 'OPEN_INT']
)
open_int['TICKER'] = 'EDH8 COMDTY'
open_int.set_index(['TICKER', 'DATE'], inplace=True)
print(open_int)
#                          OPEN_INT
# TICKER      DATE
# EDH8 COMDTY 2017-02-01  1008044.0
#             2017-02-02  1009994.0
#             2017-02-03  1019181.0
#             2017-02-06  1023863.0
#             2017-02-07  1024609.0
#             2017-02-08  1046258.0
and PX_LAST looks like this:
px_last = pd.DataFrame(
[
(datetime.date(2017, 2, 1), 98.365),
(datetime.date(2017, 2, 2), 98.370),
(datetime.date(2017, 2, 3), 98.360),
(datetime.date(2017, 2, 6), 98.405),
(datetime.date(2017, 2, 7), 98.410),
(datetime.date(2017, 2, 8), 98.435),
(datetime.date(2017, 2, 9), 98.395),
],
columns=['DATE', 'PX_LAST']
)
px_last['TICKER'] = 'EDH8 COMDTY'
px_last.set_index(['TICKER', 'DATE'], inplace=True)
print(px_last)
#                         PX_LAST
# TICKER      DATE
# EDH8 COMDTY 2017-02-01   98.365
#             2017-02-02   98.370
#             2017-02-03   98.360
#             2017-02-06   98.405
#             2017-02-07   98.410
#             2017-02-08   98.435
#             2017-02-09   98.395
Then concatenate them along the columns (axis=1) and you get what you want:
df = pd.concat([open_int, px_last], axis=1)
print(df)
#                          OPEN_INT  PX_LAST
# TICKER      DATE
# EDH8 COMDTY 2017-02-01  1008044.0   98.365
#             2017-02-02  1009994.0   98.370
#             2017-02-03  1019181.0   98.360
#             2017-02-06  1023863.0   98.405
#             2017-02-07  1024609.0   98.410
#             2017-02-08  1046258.0   98.435
#             2017-02-09        NaN   98.395
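An equivalent option, sketched here on the assumption that the frames look like the ones above, is DataFrame.join, which aligns the two frames on their shared (TICKER, DATE) index. Pass how='outer': the default how='left' would keep only the dates present in open_int and drop 2017-02-09. The frames are rebuilt compactly with MultiIndex.from_product so the snippet runs on its own.

```python
import datetime
import pandas as pd

dates = [datetime.date(2017, 2, d) for d in (1, 2, 3, 6, 7, 8, 9)]
idx = pd.MultiIndex.from_product([["EDH8 COMDTY"], dates], names=["TICKER", "DATE"])

# Same data as above: OPEN_INT has no value for the last date.
open_int = pd.DataFrame(
    {"OPEN_INT": [1008044.0, 1009994.0, 1019181.0, 1023863.0, 1024609.0, 1046258.0]},
    index=idx[:-1],
)
px_last = pd.DataFrame(
    {"PX_LAST": [98.365, 98.370, 98.360, 98.405, 98.410, 98.435, 98.395]},
    index=idx,
)

# join aligns on the full (TICKER, DATE) index; how="outer" keeps dates
# found in either frame, like pd.concat([...], axis=1) does.
joined = open_int.join(px_last, how="outer")
print(joined)
```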
Answer 1 (score: 1)
You need to concatenate along the other axis:
pd.concat([df, temp], axis=1)
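By default (axis=0), pandas.concat stacks the frames as rows and aligns them on their column labels, which is what produces the result you are seeing.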
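axis=1 on its own only works when all the frames belong to the same ticker; with several tickers, as in the EDIT, each (ticker, field) frame becomes its own set of columns. A sketch of one way around this, assuming bbg_historicaldata returns a DATE-indexed frame with a single column named after the field: concatenate one ticker's fields with axis=1, then stack the tickers with axis=0, letting keys= add the TICKER level. fake_historicaldata and its numbers are hypothetical stand-ins for the Bloomberg call.

```python
import pandas as pd

def fake_historicaldata(ticker, field):
    # Hypothetical stand-in for the asker's bbg_historicaldata(t, f, start, end):
    # returns a DATE-indexed DataFrame with one column named `field`.
    dates = pd.DatetimeIndex(["2017-02-01", "2017-02-02", "2017-02-03"], name="DATE")
    start = {"PX_LAST": 98.0, "OPEN_INT": 1_000_000.0}[field]
    step = 1.0 if ticker == "EDH8 COMDTY" else 2.0
    return pd.DataFrame({field: [start + step * i for i in range(len(dates))]}, index=dates)

tickers = ["EDH8 COMDTY", "EDM8 COMDTY"]
fields = ["PX_LAST", "OPEN_INT"]

per_ticker = []
for t in tickers:
    # One ticker's fields share the same DATE index: put them side by side.
    per_ticker.append(pd.concat([fake_historicaldata(t, f) for f in fields], axis=1))

# All tickers now have the same columns: stack them as rows, and let keys=
# add an outer TICKER level in front of DATE.
df = pd.concat(per_ticker, keys=tickers, names=["TICKER", "DATE"])
print(df)
```

Alternatively, keeping the original row-wise concat and then calling df.groupby(level=['TICKER', 'DATE']).first() should collapse the NaN-padded rows, since GroupBy.first takes the first non-null value of each column within a group.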