Loop for pairwise comparison with the previous value in a Python / Pandas column

Asked: 2019-06-19 08:53:48

Tags: python pandas loops

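A key step in my project is tracking the absolute difference between consecutive values within subsamples of one column of a pandas DataFrame.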

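I managed to write a for loop that creates the subsamples: I select each individual (id) and go through every year in which it is observed. I also managed to access the index of the first element of each group, and even to compare it with the second element of that group.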

Here is my MWE data:

import pandas as pd

df = pd.DataFrame({'year': ['2001', '2004', '2005', '2006', '2007', '2008', '2009',
                            '2003', '2004', '2005', '2006', '2007', '2008', '2009',
                            '2003', '2004', '2005', '2006', '2007', '2008', '2009'],
                   'id': ['1', '1', '1', '1', '1', '1', '1',
                          '2', '2', '2', '2', '2', '2', '2',
                          '5', '5', '5', '5', '5', '5', '5'],
                   'money': ['15', '15', '15', '21', '21', '21', '21',
                             '17', '17', '17', '20', '17', '17', '17',
                             '25', '30', '22', '25', '8', '7', '12']}).astype(int)

Here is my code:

# do it for all IDs in my dataframe
for i in df.id.unique():
# now check every given year for that particular ID
    for j in df[df['id']==i].year: 
# access the index of the first element of that ID, as integer
        index = df[df['id']==i].index.values.astype(int)[0]
# use that index to calculate absolute difference of the first and second element 
        abs_diff = abs( df['money'].iloc[index] - df['money'].iloc[index+1] )
# print all the changes, before further calculations
        index =+1
        print(abs_diff)

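My index is not updating. The code produces 0000000 0000000 5555555 (3 x 7 changes), but it should show 0,0,0,6,0,0,0 0,0,0,3,-3,0,0 0,5,-8,3,-17,-1,5 (3 x 7 changes). Since the first element of each group has no previous value to compare with, I put a 0 at the front of each group.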

Solution: I changed the second loop from for to while:

for i in df.id.unique():
    # first and last row index of this ID
    first = df[df['id']==i].index.values.astype(int)[0]   # ID1 = 0
    last = df[df['id']==i].index.values.astype(int)[-1]   # ID1 = 6

    # compare each row with the next one, staying inside the group
    while first < last:
        abs_diff = abs(df['money'][first] - df['money'][first+1])
        print(abs_diff)
        first += 1
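
For comparison, the same within-group differences can probably be computed without explicit loops. The following is a minimal sketch, assuming the rows are already sorted by year within each id (as in the MWE); the column names `diff` and `abs_diff` are hypothetical names introduced here for illustration.

```python
import pandas as pd

# Same values as the MWE, written directly as integers.
df = pd.DataFrame({
    'year': [2001, 2004, 2005, 2006, 2007, 2008, 2009,
             2003, 2004, 2005, 2006, 2007, 2008, 2009,
             2003, 2004, 2005, 2006, 2007, 2008, 2009],
    'id': [1] * 7 + [2] * 7 + [5] * 7,
    'money': [15, 15, 15, 21, 21, 21, 21,
              17, 17, 17, 20, 17, 17, 17,
              25, 30, 22, 25, 8, 7, 12],
})

# Difference to the previous row within the same id. The first row of each
# group has no predecessor, so its NaN is replaced by 0.
df['diff'] = df.groupby('id')['money'].diff().fillna(0).astype(int)

# Absolute size of each change (hypothetical column name).
df['abs_diff'] = df['diff'].abs()

print(df['diff'].tolist())
```

On the MWE this yields the signed values listed above as the desired result (0,0,0,6,0,0,0 for id 1, and so on), and because the differences are taken per group, no value is compared across two different ids.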

2 Answers:

Answer 0: (score: 2)

for i in df.id.unique():
    for j in df[df['id']==i].year:
        # row index of this (id, year) pair, as integer
        index = df[(df['id']==i)&(df['year']==j)].index.values[0].astype(int)
        try:
            abs_diff = abs(df['money'].iloc[index] - df['money'].iloc[index+1])
        except IndexError:
            # the very last row has no successor
            pass
        print(abs_diff)

Output: 0 0 6 0 0 0 4 0 0 3 3 0 0 8 5 8 3 17 1 5

Answer 1: (score: 1)

You are currently always checking the first value of each batch, so you need to do the following:

# do it for all IDs in my dataframe
for i in df.id.unique():
    # now check every given year for that particular ID
    for idx, j in enumerate(df[df['id']==i].year):
        # access the index of the idx-th element of that ID, as integer
        index = df[df['id']==i].index.values.astype(int)[idx]
        # use that index to calculate the absolute difference between this element and the next
        try:
            abs_diff = abs(df['money'][index] - df['money'][index+1])
        except KeyError:
            # the very last row has no successor
            continue
        # print all the changes, before further calculations
        print(abs_diff)

Which outputs:

0
0
6
0
0
0
4
0
0
3
3
0
0
8
5
8
3
17
1
5
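
A side note on the `index =+1` line in the question's loop: Python reads `=+` as an assignment of the value `+1`, not as the increment operator `+=`. A minimal sketch of the difference (the variable names `after_assign` and `after_increment` are for illustration only):

```python
index = 5
index =+1          # parsed as: index = (+1)
after_assign = index

index = 5
index += 1         # in-place increment
after_increment = index

print(after_assign, after_increment)
```

In the question's loop the line has no visible effect either way, because `index` is recomputed from the DataFrame at the start of every iteration.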