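I have a pandas DataFrame that models product purchases by date. I want to add features such as how many purchases happened yesterday, over the last week, and so on. Is there an elegant and efficient way to do this? Right now I use a loop, which takes a long time.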
Given the data in df, this is the loop I currently use:
import pandas as pd

one_day = pd.to_timedelta(1, unit='d')
two_days = pd.to_timedelta(2, unit='d')
yesterday_sales, last_two_days_sales = [], []
for _, row in df.iterrows():
    # Sales of the same product on the previous day (-1 unless exactly one row matches)
    yesterday_performance = df.loc[(df["product"] == row["product"]) & (df["dates"] == (row["dates"] - one_day))]
    if yesterday_performance.shape[0] == 1:
        yesterday_sales.append(yesterday_performance.sales.values[0])
    else:
        yesterday_sales.append(-1)
    # Total sales of the same product over the two days before this row's date (-1 if none)
    two_days_sales = df.loc[(df["product"] == row["product"]) & (df["dates"] >= (row["dates"] - two_days)) & (df["dates"] < row["dates"])]
    if two_days_sales.shape[0] >= 1:
        last_two_days_sales.append(two_days_sales.sales.sum())
    else:
        last_two_days_sales.append(-1)
df["yesterday_sales"] = yesterday_sales
df["last_two_days_sales"] = last_two_days_sales
The loop gets the previous day's sales and the total sales of the previous two days for each row.
Everything inside the loop is slow, but I can't think of a better way.
Answer 0: (score: 1)
I simplified your code. It is still not vectorized, but if performance isn't an issue, it should be easier to maintain:
print (df)
   experiment_a  experiment_b
0  EXPT_2011_03           NaN
1           NaN  EXPT_2009_08
2           NaN  EXPT_2010_06
3  EXPT_2010_07           NaN
4           NaN  EXPT_2011_07
#[500000 rows x 2 columns]
df = pd.concat([df] * 100000, ignore_index=True)
In [41]: %timeit (df.iloc[(np.where(df['experiment_a'].isnull(), df['experiment_b'], df['experiment_a'])).argsort()])
1 loop, best of 3: 318 ms per loop
In [42]: %timeit (df.iloc[df['experiment_a'].fillna(df['experiment_b']).argsort()])
1 loop, best of 3: 335 ms per loop
In [43]: %timeit (df.iloc[df['experiment_a'].combine_first(df['experiment_b']).argsort()])
1 loop, best of 3: 333 ms per loop
In [44]: %timeit (df.iloc[df.experiment_a.where(df.experiment_a.notnull(), df.experiment_b).argsort()])
1 loop, best of 3: 342 ms per loop
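For the original question, the row-by-row loop can probably be avoided altogether. The following is a minimal sketch, not code from the thread. It assumes the columns are named product, dates and sales as in the question, that dates holds whole-day timestamps, and that sales can be summed per product and day. The function name add_lag_features, its parameters, and the sample data are hypothetical. It looks up each row's (product, date - k days) pair in a per-day total instead of filtering the frame once per row.

```python
import pandas as pd


def add_lag_features(df, window_days=2, missing=-1):
    """Return a copy of df with yesterday_sales and last_<n>_days_sales columns."""
    # One total per (product, date); this also makes the lookup index unique.
    daily = df.groupby(["product", "dates"])["sales"].sum()

    def sales_days_ago(k):
        # Look up each row's (product, date - k days); absent pairs come back as NaN.
        keys = pd.MultiIndex.from_arrays(
            [df["product"], df["dates"] - pd.Timedelta(days=k)]
        )
        return daily.reindex(keys).to_numpy()

    # One column per day offset: 1 = yesterday, 2 = the day before, and so on.
    lags = pd.DataFrame(
        {k: sales_days_ago(k) for k in range(1, window_days + 1)},
        index=df.index,
    )

    out = df.copy()
    out["yesterday_sales"] = lags[1].fillna(missing)
    # min_count=1 keeps the sum NaN when no day in the window has any data,
    # so such rows get the same sentinel as in the question's loop.
    out[f"last_{window_days}_days_sales"] = (
        lags.sum(axis=1, min_count=1).fillna(missing)
    )
    return out


# Hypothetical sample data, only to show the call.
sample = pd.DataFrame(
    {
        "product": ["a", "a", "a", "b", "b"],
        "dates": pd.to_datetime(
            ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-01", "2020-01-03"]
        ),
        "sales": [10, 20, 30, 5, 7],
    }
)
out = add_lag_features(sample)
print(out)
```

The only remaining loop runs over day offsets (two for a two-day window, seven for a week), not over rows. The new columns come back as floats because missing lookups pass through NaN before being replaced with the sentinel. If several rows share the same product and date, this sketch uses their summed sales for the previous day, whereas the question's loop would record -1 in that case.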