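How do I work with a multi-level DataFrame indexed by datetime, like the one below? This is downloaded financial data. The hard part is getting into the frame and accessing non-adjacent rows at a specific inner level without explicitly specifying the outer-level date, because I have thousands of such rows.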
                                ABC  DEF  GHI \
Date                STATS
2012-07-19 00:00:00             NaN  NaN  NaN
                    investment    4    9   13
                    price         5    8    1
                    quantity     12    9    8
The two formulas I am after can be summarized as:
X(today row) = quantity(prior row)*price(prior row)
or
X(today row) = quantity(prior row)*price(today)
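The difficulty is how to express access to these rows with numpy or pandas on a MultiIndex, given that the rows are not adjacent.
In the end I would like to have this: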
                                ABC  DEF  GHI  XN
Date                STATS
2012-07-19 00:00:00             NaN  NaN  NaN
                    investment    4    9   13  X1
                    price         5    8    1
                    quantity     12    9    8
2012-07-18 00:00:00             NaN  NaN  NaN
                    investment    1    2    3  X2
                    price         2    3    4
                    quantity     18    6    7
X1 = (18*2) + (6*3) + (7*4) = 82   (quantity_day_2 * price_day_2 data)
or for the other formula
X1 = (18*5) + (6*8) + (7*1) = 145   (quantity_day_2 * price_day_1 data)
Can I use groupby?
Answer 0 (score: 1)
You can use:
import pandas as pd

# one more date with data was added for better testing
print (df)
ABC DEF GHI
Date STATS
2012-07-19 NaN NaN NaN
investment 4.0 9.0 13.0
price 5.0 8.0 1.0
quantity 12.0 9.0 8.0
2012-07-18 NaN NaN NaN
investment 1.0 2.0 3.0
price 2.0 3.0 4.0
quantity 18.0 6.0 7.0
2012-07-17 NaN NaN NaN
investment 1.0 2.0 3.0
price 0.0 1.0 4.0
quantity 5.0 1.0 0.0
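# sort so that the MultiIndex is lexsorted (required for slicing)
df.sort_index(inplace=True)
# select the price and quantity rows and drop the last index level, because:
# 1. shift needs a plain DatetimeIndex
# 2. the frames are easier to work with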
idx = pd.IndexSlice
p = df.loc[idx[:,'price'],:]
p.index = p.index.droplevel(-1)
q = df.loc[idx[:,'quantity'],:]
q.index = q.index.droplevel(-1)
print (p)
ABC DEF GHI
Date
2012-07-17 0.0 1.0 4.0
2012-07-18 2.0 3.0 4.0
2012-07-19 5.0 8.0 1.0
print (q)
ABC DEF GHI
Date
2012-07-17 5.0 1.0 0.0
2012-07-18 18.0 6.0 7.0
2012-07-19 12.0 9.0 8.0
print (p * q)
ABC DEF GHI
Date
2012-07-17 0.0 1.0 0.0
2012-07-18 36.0 18.0 28.0
2012-07-19 60.0 72.0 8.0
print ((p * q).sum(axis=1).to_frame().rename(columns={0:'col1'}))
col1
Date
2012-07-17 1.0
2012-07-18 82.0
2012-07-19 140.0
# shift the price dates by -1 day, because the df is sorted ascending by date
print (p.shift(-1, freq='D') * q)
ABC DEF GHI
Date
2012-07-16 NaN NaN NaN
2012-07-17 10.0 3.0 0.0
2012-07-18 90.0 48.0 7.0
2012-07-19 NaN NaN NaN
print ((p.shift(-1, freq='D') * q).sum(axis=1).to_frame().rename(columns={0:'col2'}))
col2
Date
2012-07-16 0.0
2012-07-17 13.0
2012-07-18 145.0
2012-07-19 0.0
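For reference, here is a self-contained sketch of the same computation. It rebuilds the sample frame by hand (leaving out the date-only NaN rows, which is an assumption about how the real data looks), uses `DataFrame.xs` in place of `IndexSlice` plus `droplevel`, and shifts by position instead of by `freq='D'`. The positional shift assumes the dates are sorted and that "prior row" means the previous available date, which may suit data with weekend gaps better than a calendar-day shift; it also avoids creating the extra 2012-07-16 row.

```python
import pandas as pd

# Sample values copied from the printed frame above (date-only NaN rows omitted).
dates = pd.to_datetime(["2012-07-17", "2012-07-18", "2012-07-19"])
index = pd.MultiIndex.from_product(
    [dates, ["investment", "price", "quantity"]], names=["Date", "STATS"]
)
df = pd.DataFrame(
    [[1, 2, 3], [0, 1, 4], [5, 1, 0],      # 2012-07-17
     [1, 2, 3], [2, 3, 4], [18, 6, 7],     # 2012-07-18
     [4, 9, 13], [5, 8, 1], [12, 9, 8]],   # 2012-07-19
    index=index, columns=["ABC", "DEF", "GHI"], dtype=float,
)

# xs selects one STATS label for every date and drops that level in one step.
p = df.xs("price", level="STATS")
q = df.xs("quantity", level="STATS")

# Same-day quantity * price, summed across the columns.
col1 = (p * q).sum(axis=1)

# Quantity times the NEXT date's price: shift(-1) moves prices up one row.
# min_count=1 keeps NaN where no next date exists instead of reporting 0.
col2 = (p.shift(-1) * q).sum(axis=1, min_count=1)

print(col1)
print(col2)
```

No groupby is needed here: once the price and quantity rows share a plain date index, ordinary index alignment does the pairing.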
Answer 1 (score: 1)
If you need to add the output to the original DataFrame, it is more complicated:
print (df)
ABC DEF GHI
Date STATS
2012-07-19 NaN NaN NaN
investment 4.0 9.0 13.0
price 5.0 8.0 1.0
quantity 12.0 9.0 8.0
2012-07-18 NaN NaN NaN
investment 1.0 2.0 3.0
price 2.0 3.0 4.0
quantity 18.0 6.0 7.0
2012-07-17 NaN NaN NaN
investment 1.0 2.0 3.0
price 0.0 1.0 4.0
quantity 5.0 1.0 0.0
df.sort_index(inplace=True)
# rename the level value to 'investment' so the data aligns in the final concat
idx = pd.IndexSlice
p = df.loc[idx[:,'price'],:].rename(index={'price':'investment'})
q = df.loc[idx[:,'quantity'],:].rename(index={'quantity':'investment'})
print (p)
ABC DEF GHI
Date STATS
2012-07-17 investment 0.0 1.0 4.0
2012-07-18 investment 2.0 3.0 4.0
2012-07-19 investment 5.0 8.0 1.0
print (q)
ABC DEF GHI
Date STATS
2012-07-17 investment 5.0 1.0 0.0
2012-07-18 investment 18.0 6.0 7.0
2012-07-19 investment 12.0 9.0 8.0
# multiply and concat to the original df
print (p * q)
ABC DEF GHI
Date STATS
2012-07-17 investment 0.0 1.0 0.0
2012-07-18 investment 36.0 18.0 28.0
2012-07-19 investment 60.0 72.0 8.0
a = (p * q).sum(axis=1).rename('col1')
print (pd.concat([df, a], axis=1))
ABC DEF GHI col1
Date STATS
2012-07-17 NaN NaN NaN NaN
investment 1.0 2.0 3.0 1.0
price 0.0 1.0 4.0 NaN
quantity 5.0 1.0 0.0 NaN
2012-07-18 NaN NaN NaN NaN
investment 1.0 2.0 3.0 82.0
price 2.0 3.0 4.0 NaN
quantity 18.0 6.0 7.0 NaN
2012-07-19 NaN NaN NaN NaN
investment 4.0 9.0 13.0 140.0
price 5.0 8.0 1.0 NaN
quantity 12.0 9.0 8.0 NaN
# shift with freq on a MultiIndex is not supported yet, so first get a DatetimeIndex with unstack,
# then shift, and finally reshape back to the original layout with stack
# multiply and concat to the original df
print (p.unstack().shift(-1, freq='D').stack() * q)
ABC DEF GHI
Date STATS
2012-07-16 investment NaN NaN NaN
2012-07-17 investment 10.0 3.0 0.0
2012-07-18 investment 90.0 48.0 7.0
2012-07-19 investment NaN NaN NaN
b = (p.unstack().shift(-1, freq='D').stack() * q).sum(axis=1).rename('col2')
print (pd.concat([df, b], axis=1))
ABC DEF GHI col2
Date STATS
2012-07-16 investment NaN NaN NaN 0.0
2012-07-17 NaN NaN NaN NaN
investment 1.0 2.0 3.0 13.0
price 0.0 1.0 4.0 NaN
quantity 5.0 1.0 0.0 NaN
2012-07-18 NaN NaN NaN NaN
investment 1.0 2.0 3.0 145.0
price 2.0 3.0 4.0 NaN
quantity 18.0 6.0 7.0 NaN
2012-07-19 NaN NaN NaN NaN
investment 4.0 9.0 13.0 0.0
price 5.0 8.0 1.0 NaN
quantity 12.0 9.0 8.0 NaN
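If the result should instead sit on the later date's investment row, as in the layout the question asks for (X1 on 2012-07-19 built from 2012-07-18 quantities), one possible sketch is below. The column names `X_prior` and `X_mixed` and the helper `on_investment_rows` are made up for illustration, the date-only NaN rows are again left out of the sample, and "prior row" is taken to mean the previous available date.

```python
import pandas as pd

# Sample values copied from the printed frame above (date-only NaN rows omitted).
dates = pd.to_datetime(["2012-07-17", "2012-07-18", "2012-07-19"])
index = pd.MultiIndex.from_product(
    [dates, ["investment", "price", "quantity"]], names=["Date", "STATS"]
)
df = pd.DataFrame(
    [[1, 2, 3], [0, 1, 4], [5, 1, 0],      # 2012-07-17
     [1, 2, 3], [2, 3, 4], [18, 6, 7],     # 2012-07-18
     [4, 9, 13], [5, 8, 1], [12, 9, 8]],   # 2012-07-19
    index=index, columns=["ABC", "DEF", "GHI"], dtype=float,
)

p = df.xs("price", level="STATS")
q = df.xs("quantity", level="STATS")

# Formula 1: X(today) = quantity(prior) * price(prior).
# Compute the same-day product, then move it down one row onto "today".
x_prior = (q * p).sum(axis=1).shift(1)

# Formula 2: X(today) = quantity(prior) * price(today).
# min_count=1 keeps NaN for the first date, which has no prior row.
x_mixed = (q.shift(1) * p).sum(axis=1, min_count=1)

def on_investment_rows(s):
    """Hypothetical helper: re-index a per-date Series onto (date, 'investment')."""
    s = s.copy()
    s.index = pd.MultiIndex.from_product(
        [s.index, ["investment"]], names=df.index.names
    )
    return s

out = df.copy()
# Column assignment aligns on the MultiIndex; price and quantity rows get NaN.
out["X_prior"] = on_investment_rows(x_prior)
out["X_mixed"] = on_investment_rows(x_mixed)
print(out)
```

With the sample data this puts 82 and 145 on the 2012-07-19 investment row, matching the two X1 values worked out by hand in the question.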