On each iteration of a loop, I generate a DataFrame that looks like this:
RIC RICRoot ISIN ExpirationDate Exchange ... OpenInterest BlockVolume TotalVolume2 SecurityDescription SecurityLongDescription
closingDate ...
2018-03-15 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
2018-03-16 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
2018-03-19 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
2018-03-20 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
2018-03-21 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
I then convert it into a DataFrame with MultiIndex columns:
tmp.columns = pd.MultiIndex.from_arrays( [ [contract]*len(tmp.columns), tmp.columns.tolist() ] )
contract is just the reference name for this data; it appears as SPH0 in the output below:
SPH0 ...
RIC RICRoot ISIN ExpirationDate Exchange ... OpenInterest BlockVolume TotalVolume2 SecurityDescription SecurityLongDescription
closingDate ...
2018-03-15 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
2018-03-16 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
2018-03-19 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
2018-03-20 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
2018-03-21 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
I currently merge these DataFrames in a very inefficient way:
if df is None:
    # first iteration: nothing to merge with yet
    df = tmp
else:
    df = df.merge(tmp, how='outer', left_index=True, right_index=True)
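This is very slow. I would like to store all of these temporary DataFrames, each under its contract name, in an associative (map-like) structure, and still be able to reference their data conveniently in a vectorized way. What is the best solution? Does it matter whether the result grows horizontally or vertically?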
Answer 0 (score: 0)
IIUC, you can just use pd.concat(), passing it a list of DataFrames together with the keys used to build the MultiIndex. Take the following example DataFrames:
import pandas as pd
df1 = pd.DataFrame([
['2018-03-11', 'SPH0', 'SP', '2020-03-20', 'CME:Index and Options Market'],
['2018-03-12', 'SPH0', 'SP', '2020-03-20', 'CME:Index and Options Market'],
['2018-03-15', 'SPH0', 'SP', '2020-03-20', 'CME:Index and Options Market'],
['2018-03-23', 'SPH0', 'SP', '2020-03-20', 'CME:Index and Options Market'],
['2018-03-24', 'SPH0', 'SP', '2020-03-20', 'CME:Index and Options Market']],
columns=['closingDate', 'RIC', 'RICRoot', 'ExpirationDate', 'Exchange'])
df2 = pd.DataFrame([
['2018-03-15', 'HAB3', 'HA', '2020-03-20', 'CME:Index and Options Market'],
['2018-03-16', 'HAB3', 'HA', '2020-03-20', 'CME:Index and Options Market'],
['2018-03-22', 'HAB3', 'HA', '2020-03-20', 'CME:Index and Options Market'],
['2018-03-24', 'HAB3', 'HA', '2020-03-20', 'CME:Index and Options Market'],
['2018-03-20', 'HAB3', 'HA', '2020-03-20', 'CME:Index and Options Market']],
columns=['closingDate', 'RIC', 'RICRoot', 'ExpirationDate', 'Exchange'])
df3 = pd.DataFrame([
['2018-03-15', 'UHA6', 'UH', '2020-03-20', 'CME:Index and Options Market'],
['2018-03-16', 'UHA6', 'UH', '2020-03-20', 'CME:Index and Options Market'],
['2018-03-18', 'UHA6', 'UH', '2020-03-20', 'CME:Index and Options Market'],
['2018-03-20', 'UHA6', 'UH', '2020-03-20', 'CME:Index and Options Market'],
['2018-03-21', 'UHA6', 'UH', '2020-03-20', 'CME:Index and Options Market']],
columns=['closingDate', 'RIC', 'RICRoot', 'ExpirationDate', 'Exchange'])
Now call pd.concat():
pd.concat([df1, df2, df3], keys=['SPH0','HAB3','UHA6'])
Yields:
closingDate ... Exchange
SPH0 0 2018-03-11 ... CME:Index and Options Market
1 2018-03-12 ... CME:Index and Options Market
2 2018-03-15 ... CME:Index and Options Market
3 2018-03-23 ... CME:Index and Options Market
4 2018-03-24 ... CME:Index and Options Market
HAB3 0 2018-03-15 ... CME:Index and Options Market
1 2018-03-16 ... CME:Index and Options Market
2 2018-03-22 ... CME:Index and Options Market
3 2018-03-24 ... CME:Index and Options Market
4 2018-03-20 ... CME:Index and Options Market
UHA6 0 2018-03-15 ... CME:Index and Options Market
1 2018-03-16 ... CME:Index and Options Market
2 2018-03-18 ... CME:Index and Options Market
3 2018-03-20 ... CME:Index and Options Market
4 2018-03-21 ... CME:Index and Options Market
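To pull data back out of the combined frame, you can select on the outer index level. The following is a minimal sketch using two small stand-in frames (hypothetical data, trimmed to three columns) rather than the full example above:

```python
import pandas as pd

# Small stand-in frames; the values are illustrative only
cols = ['closingDate', 'RIC', 'RICRoot']
df1 = pd.DataFrame([['2018-03-11', 'SPH0', 'SP'],
                    ['2018-03-12', 'SPH0', 'SP']], columns=cols)
df2 = pd.DataFrame([['2018-03-15', 'HAB3', 'HA'],
                    ['2018-03-16', 'HAB3', 'HA']], columns=cols)

combined = pd.concat([df1, df2], keys=['SPH0', 'HAB3'])

# All rows for one contract; the outer level is dropped,
# leaving that frame's original row index
sph0 = combined.loc['SPH0']

# One column across every contract, indexed by (contract, row number)
rics = combined['RIC']
```

Because the contract name is the outer level of the row index, `combined.loc[key]` behaves much like a dictionary lookup while the data stays in a single DataFrame.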
You can also use a list comprehension to build the list of DataFrames to pass to pd.concat(), for example:
my_keys = ['SPH0','HAB3','UHA6']
dfs = [create_df(key) for key in my_keys]
pd.concat(dfs, keys=my_keys)
Here create_df() is a function that returns a DataFrame.