通过查找其他行中的值来创建新的pandas数据框列

时间:2020-01-03 04:48:14

标签: python pandas

我想找到一种更快的方法来计算下面的每个产品的52周前销售额而不使用迭代或迭代。有什么建议?输入将是不含“ 52周前的销售额”列的表,输出将是下面的整个表。

         date  sales city product  sales 52 weeks ago
0  2020-01-01    1.5   c1      p1       0.6
1  2020-01-01    1.2   c1      p2       0.3
2  2019-05-02    0.5   c1      p1       nan
3  2019-01-02    0.3   c1      p2       nan
4  2019-01-02    0.6   c1      p1       nan
5  2019-01-01    1.2   c1      p2       nan

示例itertuples代码,但速度很慢:

for row in df.itertuples(index=True, name='Pandas'):
    try:
        df.at[row.Index, 'sales 52 weeks ago']=df[(df['date']==row.date-timedelta(weeks=52))&(df['product']==row.product),'sales']
    except:
        continue

1 个答案:

答案 0 :(得分:0)

您需要用Timedelta减去日期后进行合并:

m=df['date'].sub(pd.Timedelta('52W')).to_frame().assign(product=df['product'])
final = df.assign(sales_52_W_ago=m.merge(df,
         on=['date','product'],how='left').loc[:,'sales'])

        date  sales city product  sales_52_W_ago
0 2020-01-01    1.5   c1      p1             0.6
1 2020-01-01    1.2   c1      p2             0.3
2 2019-05-02    0.5   c1      p1             NaN
3 2019-01-02    0.3   c1      p2             NaN
4 2019-01-02    0.6   c1      p1             NaN
5 2019-01-01    1.2   c1      p2             NaN