Hi, here is my (toy) data:
import pandas as pd

data = {'p1': [100., 101, 102, 100, 100],
        'p2': [100., 99., 98., 100., 100],
        'p3': [1000., 1000., 100., 1000., 1000]
        }
df = (pd.DataFrame(data, index=pd.bdate_range(start='20100101', periods=5))
.stack()
.reset_index()
.rename(columns={'level_0': 'date', 'level_1': 'type', 0: 'price'})
.sort_values('date')
)
# Per-type percentage change; the result stays aligned with df's index
df['perf'] = df.groupby('type')['price'].pct_change()
df.sort_values('type')
It looks like this:
date type price perf
0 2010-01-01 p1 100.0 NaN
3 2010-01-04 p1 101.0 0.010000
6 2010-01-05 p1 102.0 0.009901
9 2010-01-06 p1 100.0 -0.019608
12 2010-01-07 p1 100.0 0.000000
1 2010-01-01 p2 100.0 NaN
4 2010-01-04 p2 99.0 -0.010000
7 2010-01-05 p2 98.0 -0.010101
10 2010-01-06 p2 100.0 0.020408
13 2010-01-07 p2 100.0 0.000000
2 2010-01-01 p3 1000.0 NaN
5 2010-01-04 p3 1000.0 0.000000
8 2010-01-05 p3 100.0 -0.900000 <- outlier
11 2010-01-06 p3 1000.0 9.000000 <- outlier
14 2010-01-07 p3 1000.0 0.000000
I want to replace these (2) values with the mean or median of the perf column computed without them. With some earlier help, I came up with this:
# perf for each type
df['perf'] = df.groupby('type')['price'].pct_change()
# Outliers & replace value with median by date
outliers = df.groupby('type')['price'].pct_change().abs() >= 0.5
df.loc[outliers, "perf"] = (df[~outliers]
                            .groupby('date')['perf']  # select perf so the string column 'type' is not aggregated
                            .median()
                            .loc[df.loc[outliers, "date"]]
                            .values
                            )
# New price with the same initial value of the prices but with perf corrected
df['price2'] = (df.groupby('type')['price'].transform('first')
                * (1 + df['perf'].fillna(0)).groupby(df['type']).cumprod())
df.sort_values('type')
But in the end this is not very "nice". Is there a way to improve my code, for example by wrapping it in a function?
Answer 0: (score: 0)
This should work.
# Filter for outliers
outliers = df['perf'].abs() >= 0.5
# Create DataFrame for the mean of each date
dt_mean = df.groupby('date')['perf'].mean().to_frame().copy()
# Reset index
dt_mean.reset_index(inplace=True)
# Set outliers equal to merger of outliers and mean DataFrame
df.loc[outliers,'perf'] = list(pd.merge(df.loc[outliers, ['date', 'type', 'price']],dt_mean, on='date')['perf'])
date type price perf
0 2010-01-01 p1 100.0 NaN
1 2010-01-01 p2 100.0 NaN
2 2010-01-01 p3 1000.0 NaN
3 2010-01-04 p1 101.0 0.010000
4 2010-01-04 p2 99.0 -0.010000
5 2010-01-04 p3 1000.0 0.000000
6 2010-01-05 p1 102.0 0.009901
7 2010-01-05 p2 98.0 -0.010101
8 2010-01-05 p3 100.0 -0.300067
9 2010-01-06 p1 100.0 -0.019608
10 2010-01-06 p2 100.0 0.020408
11 2010-01-06 p3 1000.0 3.000267
12 2010-01-07 p1 100.0 0.000000
13 2010-01-07 p2 100.0 0.000000
14 2010-01-07 p3 1000.0 0.000000
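Note that dt_mean above is computed over all rows, so each outlier is included in the mean that replaces it: -0.300067 is the mean of 0.009901, -0.010101 and -0.9. If the goal is the mean without the outliers, as the question states, the flagged rows can be dropped before averaging. The following is a minimal sketch under that assumption, rebuilding the toy data from the question with the same 0.5 threshold; the name clean_mean is made up for illustration.

```python
import pandas as pd

data = {'p1': [100., 101, 102, 100, 100],
        'p2': [100., 99., 98., 100., 100],
        'p3': [1000., 1000., 100., 1000., 1000]}
wide = pd.DataFrame(data, index=pd.bdate_range(start='20100101', periods=5))

# Long format with the same row order and index as in the question
df = (wide.rename_axis('date')
          .reset_index()
          .melt(id_vars='date', var_name='type', value_name='price')
          .sort_values(['date', 'type'])
          .reset_index(drop=True))

df['perf'] = df.groupby('type')['price'].pct_change()
outliers = df['perf'].abs() >= 0.5

# Per-date mean over the non-outlier rows only (hypothetical helper name)
clean_mean = df.loc[~outliers].groupby('date')['perf'].mean()

# Look up each outlier row's date in that per-date mean
df.loc[outliers, 'perf'] = df.loc[outliers, 'date'].map(clean_mean)

print(df.sort_values('type'))
```

With this variant the two p3 outliers become roughly -0.0001 and 0.0004, the averages of the p1 and p2 values on those dates.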
Answer 1: (score: 0)
How about a direct .loc[] lookup on the mean DataFrame?
outliers = df.groupby('type')['price'].pct_change().abs() >= 0.5
# numeric_only=True keeps the string column 'type' out of the mean
df_mean = df[~outliers].groupby('date').mean(numeric_only=True)
fill_values = df_mean.loc[df.loc[outliers, "date"], "perf"].values
df.loc[outliers, "perf"] = fill_values  # assigned positionally to the outlier rows
df.sort_values('type')
Out[114]:
date type price perf
0 2010-01-01 p1 100.0 NaN
3 2010-01-04 p1 101.0 0.010000
6 2010-01-05 p1 102.0 0.009901
9 2010-01-06 p1 100.0 -0.019608
12 2010-01-07 p1 100.0 0.000000
1 2010-01-01 p2 100.0 NaN
4 2010-01-04 p2 99.0 -0.010000
7 2010-01-05 p2 98.0 -0.010101
10 2010-01-06 p2 100.0 0.020408
13 2010-01-07 p2 100.0 0.000000
2 2010-01-01 p3 1000.0 NaN
5 2010-01-04 p3 1000.0 0.000000
8 2010-01-05 p3 100.0 -0.000100 <- replaced by mean
11 2010-01-06 p3 1000.0 0.000400 <- replaced by mean
14 2010-01-07 p3 1000.0 0.000000
Note that your per-date mean (df_mean) is already indexed by date, and creating it seems unavoidable. So just use its date index directly.
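Since the question asks for a function, the steps could be wrapped roughly as follows. This is only a sketch: the function name replace_outlier_perf and its parameters threshold and how are made up for illustration, and it assumes a long-format frame with date, type and price columns like the toy data. It also rebuilds price2 from each type's first price and the corrected perf, as the question does.

```python
import pandas as pd

def replace_outlier_perf(df, threshold=0.5, how='median'):
    """Return a copy of df with outlier perf values replaced by the per-date
    statistic ('mean' or 'median') of the non-outlier rows, plus price2."""
    out = df.copy()
    out['perf'] = out.groupby('type')['price'].pct_change()
    outliers = out['perf'].abs() >= threshold
    # Per-date statistic computed over non-outlier rows only
    fill = out.loc[~outliers].groupby('date')['perf'].agg(how)
    out.loc[outliers, 'perf'] = out.loc[outliers, 'date'].map(fill)
    # Rebuild prices from each type's first price and the corrected perf;
    # a missing perf (first row of each type) counts as no change
    growth = (1 + out['perf'].fillna(0)).groupby(out['type']).cumprod()
    out['price2'] = out.groupby('type')['price'].transform('first') * growth
    return out

data = {'p1': [100., 101, 102, 100, 100],
        'p2': [100., 99., 98., 100., 100],
        'p3': [1000., 1000., 100., 1000., 1000]}
wide = pd.DataFrame(data, index=pd.bdate_range(start='20100101', periods=5))
df = (wide.rename_axis('date')
          .reset_index()
          .melt(id_vars='date', var_name='type', value_name='price')
          .sort_values(['date', 'type'])
          .reset_index(drop=True))

res = replace_outlier_perf(df, threshold=0.5, how='median')
print(res.sort_values('type'))
```

If every row on a given date is flagged as an outlier, there is nothing left to average, so the replacement for that date is NaN and price2 treats it as no change. Whether that is acceptable depends on the data.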