I have data similar to this (toy data):
import pandas as pd
import numpy as np

N = 5
dfi = pd.DataFrame()
for i in range(5):
    df = pd.DataFrame(index=pd.date_range("20100101", periods=N, freq='M'))
    df['price'] = np.random.randint(0, N, size=(len(df)))
    df['quantity'] = np.random.randint(0, N, size=(len(df)))
    df['type'] = 'P' + str(i)
    dfi = pd.concat([df, dfi], axis=0)
dfi
From this, I want to compute a new price for each type, defined as:
new_price(t) = (1 + perf(t)) * new_price(t-1)
with:
new_price(0) = price(0)
and
perf(t) = price(t)/price(t-1) - 1 if abs(price(t)/price(t-1) - 1) < s else 0
I tried:
dfi['prix_corr'] = (dfi
    .sort_index()
    .groupby('type').price
    .apply(lambda x: x.pct_change() if x.pct_change().abs() <= 0.5 else 0)
)
But I get this error message:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I would like to correct outliers in the time series data within each group.
Any suggestions?
Answer 0 (score: 0)
Based on your input, you could try calling a custom function from the lambda expression, for example:
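def compute_price_change(x):
    # Flag period-over-period changes larger than 50% in absolute value
    mask = x.pct_change().abs() > 0.5
    x = x.pct_change()
    # Replace the flagged changes with 0
    x[mask] = 0
    return x

dfi['prix_corr'] = (dfi
    .groupby('type').price
    .apply(lambda x: compute_price_change(x))
)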
Output:
            price  quantity type  prix_corr
2010-01-31      3         0   P4        NaN
2010-02-28      3         2   P4        0.0
2010-03-31      0         2   P4       -0.5
2010-04-30      2         4   P4        0.5
2010-05-31      2         2   P4        0.0
2010-01-31      1         2   P3        NaN
2010-02-28      4         3   P3        0.0
2010-03-31      0         0   P3        0.0
2010-04-30      4         0   P3        0.0
2010-05-31      2         2   P3        0.0
...
Because .pct_change() returns NaN for the first entry of each group, you may also want to handle that in some way.
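The original attempt fails because a Python `if ... else` needs a single True/False value, while `x.pct_change().abs() <= 0.5` is a whole Series. An element-wise alternative is `Series.where`. The sketch below goes one step further and builds the compounded `new_price` from the question. It makes several assumptions: `perf` is read as the percentage change `price(t)/price(t-1) - 1`, the threshold `s` defaults to 0.5 (the value used in the attempt), the rows of each group are already in chronological order, and the first entry's NaN is treated as a change of 0. The name `corrected_price` and the `toy` frame are made up for illustration.

```python
import numpy as np
import pandas as pd


def corrected_price(price: pd.Series, s: float = 0.5) -> pd.Series:
    """Compound filtered returns: new_price(t) = (1 + perf(t)) * new_price(t-1)."""
    perf = price.pct_change()
    # Keep a change only where |perf| < s; everything else becomes 0.
    # NaN (first row) and inf (previous price of 0) fail the comparison,
    # so they are replaced by 0 as well.
    perf = perf.where(perf.abs() < s, 0.0)
    # new_price(0) = price(0), then compound the filtered changes.
    return price.iloc[0] * (1.0 + perf).cumprod()


# Hypothetical toy data: two types sharing the same dates, as in the question.
dates = pd.to_datetime(["2010-01-31", "2010-02-28", "2010-03-31", "2010-04-30"])
toy = pd.DataFrame(
    {
        "type": ["A"] * 4 + ["B"] * 3,
        "price": [100.0, 110.0, 500.0, 550.0, 10.0, 10.0, 12.0],
    },
    index=list(dates) + list(dates[:3]),
)

# transform returns one value per input row, in the original row order.
toy["new_price"] = toy.groupby("type")["price"].transform(corrected_price)
print(toy)
```

For type A the jump from 110 to 500 exceeds the threshold, so it is ignored and the series continues from 110: the new prices are approximately 100, 110, 110, 121. Type B has no outlier, so its new prices match the original ones. `transform` is used here instead of `apply` because it always returns a result shaped like the input, which keeps the column assignment simple even though the dates repeat across types.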