Row-wise assignment has no effect when iterating over a Pandas DataFrame

Asked: 2019-10-17 16:44:25

Tags: python pandas dataframe

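I want to compute a week-over-week change/comparison (which I can do with shift), but the catch is that I only want to compare weeks within each category. When the next category starts at week 1, I don't want to compare it with week 51 of the previous category.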

| Category | Weeknumber | BruttoTonnes |
| Apple    | 1          | 15           |
| ...      | ...        | ...          |
| Apple    | 51         | 8            |
| Pear     | 1          | 5            |
| ...      | ...        | ...          |
| Pear     | 51         | 12           |

This is my attempt. Sadly, for reasons I don't understand, it does nothing to the DataFrame:

for element in df.Category.unique():
    print(df[df.Category == str(element)]['Category']) # This one works, so that is all good  
    df[df.Category == str(element)]['WeekOverWeek%'] = ((df[df.Category == str(element)]['BruttoTonnes'].shift(1)/df[df.Category == str(element)]['BruttoTonnes'])-1)*100

No result. There is no error, but the DataFrame is not changed either.
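The loop most likely has no effect because `df[mask]['col'] = ...` is chained indexing: `df[mask]` builds a new object, and the assignment lands on that temporary object rather than on `df`. Below is a minimal sketch of one way around it. It assumes the rows are already sorted by week within each category, and it keeps the question's own formula (previous / current - 1). The sample values are made up for illustration.

```python
import pandas as pd

# Made-up sample data using the question's column names
df = pd.DataFrame({
    "Category": ["Apple", "Apple", "Apple", "Pear", "Pear"],
    "Weeknumber": [1, 2, 3, 1, 2],
    "BruttoTonnes": [15.0, 12.0, 8.0, 5.0, 10.0],
})

# Shift within each category, so the first week of a category gets NaN
# instead of the last week of the previous category
prev = df.groupby("Category")["BruttoTonnes"].shift(1)

# Assign to the original frame in a single step (no chained indexing)
df["WeekOverWeek%"] = (prev / df["BruttoTonnes"] - 1) * 100
```

If an explicit loop is still preferred, writing through a single indexer such as `df.loc[mask, 'WeekOverWeek%'] = ...` targets `df` itself instead of a temporary object.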

1 answer:

Answer 0 (score: 1)

By merging the DataFrame with itself I avoid any looping, so the whole computation is vectorized and should be fast.

import pandas as pd

# set some sample dummy data
df = pd.DataFrame([['Apple',51,20],['Apple',52,19],['Apple',1,14],['Apple',2,15.2],
    ['Apple',3,17],['Apple',4,17],['Apple',5,18],
    ['Orange',51,10.5],['Orange',52,9],['Orange',1,4],['Orange',2,7],
    ['Orange',3,8]],
    columns=['Category','WeekNum','Tonnes'])

# Set previous week's week number
df['PrevWeekNum']= df['WeekNum']-1
# roll back to week 52 if 0
df.loc[df['PrevWeekNum']==0,['PrevWeekNum']]=52

# Get the previous week's tonnage by doing a left outer merge to itself
df['PrevTonnes']=df.merge( df, left_on=['Category','PrevWeekNum'], right_on=['Category','WeekNum'], how='left' )['Tonnes_y']

# Calculate the difference
df['WeekDelta']= df['Tonnes']-df['PrevTonnes']

Result

   Category  WeekNum  Tonnes  PrevWeekNum  PrevTonnes  WeekDelta
0     Apple       51    20.0           50         NaN        NaN
1     Apple       52    19.0           51        20.0       -1.0
2     Apple        1    14.0           52        19.0       -5.0
3     Apple        2    15.2            1        14.0        1.2
4     Apple        3    17.0            2        15.2        1.8
5     Apple        4    17.0            3        17.0        0.0
6     Apple        5    18.0            4        17.0        1.0
7    Orange       51    10.5           50         NaN        NaN
8    Orange       52     9.0           51        10.5       -1.5
9    Orange        1     4.0           52         9.0       -5.0
10   Orange        2     7.0            1         4.0        3.0
11   Orange        3     8.0            2         7.0        1.0
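The question asked for a percentage rather than an absolute difference. Once the previous week's tonnage is available as a column, that is one more vectorized line. The sketch below assumes the conventional definition (current / previous - 1) * 100; the column name `WeekPctChange` is introduced here for illustration and is not part of the answer.

```python
import pandas as pd

# A few rows copied from the result table above
df = pd.DataFrame({
    "Category": ["Apple", "Apple", "Apple", "Orange", "Orange"],
    "WeekNum": [51, 52, 1, 51, 52],
    "Tonnes": [20.0, 19.0, 14.0, 10.5, 9.0],
    "PrevTonnes": [float("nan"), 20.0, 19.0, float("nan"), 10.5],
})

# Assumed definition: (current / previous - 1) * 100
# WeekPctChange is a hypothetical column name; rows without a previous week stay NaN
df["WeekPctChange"] = (df["Tonnes"] / df["PrevTonnes"] - 1) * 100
```

The question's own formula divides the previous week by the current one; swap the numerator and denominator if that is the intended definition.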

Use df.drop() to remove any columns you don't need.
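For instance, the helper columns can be removed once the delta is computed. This is a small sketch on two rows copied from the result; which columns to drop is an assumption about what is no longer needed.

```python
import pandas as pd

# Two rows copied from the result table
df = pd.DataFrame({
    "Category": ["Apple", "Apple"],
    "WeekNum": [51, 52],
    "Tonnes": [20.0, 19.0],
    "PrevWeekNum": [50, 51],
    "PrevTonnes": [float("nan"), 20.0],
    "WeekDelta": [float("nan"), -1.0],
})

# drop returns a new DataFrame, so reassign to keep the result
df = df.drop(columns=["PrevWeekNum", "PrevTonnes"])
```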

Ideally, you should also include a date or year in your data, so that the lookup cannot pick up the weekly tonnage from the wrong year.
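A sketch of how the same self-merge might look with a year in the key. The `Year` column, the helper names `PrevYear` and `lookup`, and the sample values are all assumptions added for illustration, and the wrap-around still assumes every year has 52 weeks (ISO years can have 53).

```python
import pandas as pd

# Made-up data spanning year boundaries; Year is an assumed extra column
df = pd.DataFrame(
    [["Apple", 2018, 51, 20.0], ["Apple", 2018, 52, 19.0],
     ["Apple", 2019, 1, 14.0], ["Apple", 2019, 2, 15.2],
     ["Apple", 2020, 1, 30.0]],
    columns=["Category", "Year", "WeekNum", "Tonnes"],
)

# Previous week and the year it belongs to (assumes 52 weeks in every year)
df["PrevWeekNum"] = df["WeekNum"] - 1
df["PrevYear"] = df["Year"]
wrap = df["PrevWeekNum"] == 0
df.loc[wrap, "PrevWeekNum"] = 52
df.loc[wrap, "PrevYear"] = df.loc[wrap, "Year"] - 1

# Lookup table of (Category, Year, WeekNum) -> Tonnes, renamed for the join
lookup = df[["Category", "Year", "WeekNum", "Tonnes"]].rename(
    columns={"Year": "PrevYear", "WeekNum": "PrevWeekNum", "Tonnes": "PrevTonnes"}
)

# Left merge keeps every row of df; weeks with no predecessor get NaN
df = df.merge(lookup, on=["Category", "PrevYear", "PrevWeekNum"], how="left")
df["WeekDelta"] = df["Tonnes"] - df["PrevTonnes"]
```

In this sample, week 1 of 2020 has no week 52 of 2019 to compare against, so its delta is NaN. Without the year in the key it would have been matched against week 52 of 2018.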