I have the following data, grouped by id:
import pandas as pd
df_data = pd.DataFrame(data={'id': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                             'period': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                             'feature': [1, 5, 3, 4, 8, 10, 13, 12, 15, 19]})
df_weights = pd.DataFrame(data={'id': [1, 2],
                                'w1': [0.3, 0.25],
                                'w2': [0.15, 0.20]})
lags = [1, 2]
I need to add a new feature to df_data for each id:
def transform_feature(df, lags, feature, feature_new, weights):
    df.loc[:, feature_new] = df[feature]
    for i, lag in enumerate(lags):
        df.loc[:, feature_new] = df.loc[:, feature_new] - df[feature].shift(lag) * weights[i]
    return df
I can do this for a single id as follows:
id_tmp = 1
df_data_tmp = df_data[df_data['id'] == id_tmp]
weights = df_weights[['w1', 'w2']][df_weights['id'] == id_tmp].values.tolist()[0]
df_data_subset = transform_feature(df_data_tmp, lags, 'feature', 'feature_new', weights)
How can I do this for all ids (for the whole df_data)?
Edit - expected output:
import numpy as np
df_data = pd.DataFrame(data={'id': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                             'period': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                             'feature': [1, 5, 3, 4, 8, 10, 13, 12, 15, 19],
                             'feature_new': [np.nan, np.nan, 1.35, 2.35, 6.35, np.nan, np.nan, 6.75, 9.40, 12.85]})
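As a sanity check on the expected values, the non-NaN entries for id 1 can be reproduced by hand: each one is the current feature minus w1 times the previous value minus w2 times the value two periods back. A minimal sketch (the names `feature`, `w1`, `w2` and `new_values` are just for illustration):

```python
# Features and weights for id 1, copied from df_data and df_weights above.
feature = [1, 5, 3, 4, 8]
w1, w2 = 0.3, 0.15

# From the third period on, subtract the weighted lag-1 and lag-2 values.
new_values = [feature[t] - w1 * feature[t - 1] - w2 * feature[t - 2]
              for t in range(2, len(feature))]

# Rounded to two decimals this gives 1.35, 2.35 and 6.35.
print([round(v, 2) for v in new_values])
```

The first two periods have no complete lag history, which is why they are NaN in the expected output.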
Answer 0 (score: 1)
IIUC, you can use a lambda for this.
def transform_feature(df, lags, feature, feature_new, df_weights):
    # Look up the weights that belong to the id of the current group.
    weights = df_weights[['w1', 'w2']][df_weights['id'] == df.id.unique()[0]].values.tolist()[0]
    df[feature_new] = df[feature]
    for i, lag in enumerate(lags):
        df[feature_new] = df[feature_new] - df[feature].shift(lag) * weights[i]
    return df
df_data.groupby("id").apply(lambda x: transform_feature(x, lags, 'feature', 'features_new', df_weights))
# Output
   feature  id  period  features_new
0        1   1       1           NaN
1        5   1       2           NaN
2        3   1       3          1.35
3        4   1       4          2.35
4        8   1       5          6.35
5       10   2       1           NaN
6       13   2       2           NaN
7       12   2       3          6.75
8       15   2       4          9.40
9       19   2       5         12.85
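The same numbers can also be obtained without `apply`, by merging the weights onto the data and shifting within each id. This is a different technique from the one above, shown only as a rough sketch; the names `merged` and `weight_cols`, and the assumption that `weight_cols[i]` is the weight for `lags[i]`, are illustrative and not part of the original code.

```python
import pandas as pd

df_data = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                        'period': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                        'feature': [1, 5, 3, 4, 8, 10, 13, 12, 15, 19]})
df_weights = pd.DataFrame({'id': [1, 2],
                           'w1': [0.3, 0.25],
                           'w2': [0.15, 0.20]})
lags = [1, 2]
weight_cols = ['w1', 'w2']  # assumption: weight_cols[i] belongs to lags[i]

# Attach each id's weights to all of its rows; a left merge keeps the row order.
merged = df_data.merge(df_weights, on='id', how='left')

feature_new = merged['feature'].astype(float)
for lag, col in zip(lags, weight_cols):
    # Shift within each id so values never carry over from one id to the next.
    shifted = merged.groupby('id')['feature'].shift(lag)
    feature_new = feature_new - shifted * merged[col]

# Assign by position, since merged has the same row order as df_data.
df_data['feature_new'] = feature_new.to_numpy()
print(df_data)
```

Because nothing here depends on what `apply` passes to the function, this version does not need to read the id from inside each group.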
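The lambda is used because GroupBy.apply has no `args` parameter, so wrapping the call in a lambda is one way to pass extra arguments to the applied function. With df.apply, you can simply use
df.apply(your_func, args=(extra_arg,))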
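As a side note, `GroupBy.apply` is documented as `apply(func, *args, **kwargs)`, so extra arguments placed after the function are forwarded to it even though there is no `args=` keyword. A small sketch on a single column; the function `lagged_diff` and the toy data are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                   'feature': [1, 5, 3, 10, 13, 12]})

def lagged_diff(s, lag, weight):
    # Subtract the weighted value from `lag` rows earlier within the group.
    return s - s.shift(lag) * weight

# Arguments after the function are passed through to lagged_diff.
# group_keys=False keeps the original row index on the result.
direct = df.groupby('id', group_keys=False)['feature'].apply(lagged_diff, 1, weight=0.5)

# The same call with the arguments bound in a lambda instead.
via_lambda = df.groupby('id', group_keys=False)['feature'].apply(
    lambda s: lagged_diff(s, 1, weight=0.5))

print(direct.equals(via_lambda))
```

Both forms should give the same Series; which one to use is mostly a matter of taste.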