Fast way to compare a value in one pandas row with the value of another column in the previous row?

Asked: 2016-07-12 20:15:24

Tags: python pandas group-by

I have a DataFrame, df, that looks like:

ID    |          TERM       |   DISC_1
1     |         2003-10     |   ECON
1     |         2002-01     |   ECON
1     |         2002-10     |   ECON
2     |         2003-10     |   CHEM
2     |         2004-01     |   CHEM 
2     |         2004-10     |   ENGN
2     |         2005-01     |   ENGN
3     |         2001-01     |   HISTR
3     |         2002-10     |   HISTR 
3     |         2002-10     |   HISTR

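ID is a student ID, TERM is an academic term, and DISC_1 is the discipline of their major. For each student, I want to identify the TERM when (and if) they changed DISC_1, and then create a new DataFrame that reports when the change happened. A zero indicates that they did not change. The output looks like: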

ID    |     Change
1     |         0     
2     |         2004-01    
3     |         0    

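My code works, but it is very slow. I tried to do this with groupby but could not get it to work. Can someone explain how I could accomplish this task more efficiently?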

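import numpy as np

# PIDM here corresponds to the ID column in the table above
df = df.sort_values(by=['PIDM', 'TERM'])
c = 0
last_PIDM = 0
last_DISC_1 = 0
change = []
for index, row in df.iterrows():
    c = c + 1
    if c > 1:
        # same student but a different discipline: record the term of the change
        row['change'] = np.where((row['PIDM'] == last_PIDM) & (row['DISC_1'] != last_DISC_1), row['TERM'], 0)
    else:
        row['change'] = 0
    # remember this row (including the first one) for the next comparison
    last_PIDM = row['PIDM']
    last_DISC_1 = row['DISC_1']
    change.append(row['change'])

df['change'] = change
change_terms = df.groupby('PIDM')['change'].max()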

2 Answers:

Answer 0 (score: 4)

Here's a start:

df = df.sort_values(['ID', 'TERM'])
gb = df.groupby('ID').DISC_1
df['Change'] = df.TERM[gb.apply(lambda x: x != x.shift().bfill())]
df.Change = df.Change.fillna(0)
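
To get from a per-row Change column to the one-row-per-student report the question asks for, something like the following might work. It is only a sketch, not the answerer's method: it rebuilds the sample data from the question, replaces the apply/lambda with a grouped shift, and takes the first changed term per student. The names prev_disc, changed, first_change and report are made up for illustration.

```python
import pandas as pd

# Sample data from the question (one row per student per term)
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "TERM": ["2003-10", "2002-01", "2002-10", "2003-10", "2004-01",
             "2004-10", "2005-01", "2001-01", "2002-10", "2002-10"],
    "DISC_1": ["ECON", "ECON", "ECON", "CHEM", "CHEM",
               "ENGN", "ENGN", "HISTR", "HISTR", "HISTR"],
})

df = df.sort_values(["ID", "TERM"])

# Previous discipline within each student; missing on each student's first row
prev_disc = df.groupby("ID")["DISC_1"].shift()

# True where a student's discipline differs from their own previous row
changed = prev_disc.notna() & (df["DISC_1"] != prev_disc)

# First term in which each student changed; students with no change get 0
first_change = df.loc[changed].groupby("ID")["TERM"].first()
report = (
    first_change.reindex(df["ID"].unique(), fill_value=0)
    .rename("Change")
    .rename_axis("ID")
    .reset_index()
)
print(report)
```

On the sample data this reports 2004-10 for student 2, the first term in the new discipline. The question's expected output lists 2004-01, so if the last term in the old discipline is what is wanted, TERM would need to be shifted within each group in the same way.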

Answer 1 (score: 2)

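I've never been a big pandas user, so my solution would involve spitting the df out as a csv and iterating over each row while keeping hold of the previous row. If it is sorted properly (first by ID, then by term date) I might write something like this...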

import csv

with open('inputDF.csv', newline='') as infile:
    with open('outputDF.csv', 'w', newline='') as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)

        previousline = next(reader)  # grab the first row to compare to the second (assumes no header row)
        termChange = 0
        for line in reader:
            if line[0] != previousline[0]:  # new ID means write out and move on to the next person
                writer.writerow([previousline[0], termChange])  # write ID, termChange date to file
                termChange = 0
            elif line[2] != previousline[2]:  # new discipline
                termChange = line[1]  # set term changed date
                # termChange = previousline[1]  # use this instead to keep the last term in the old discipline

            previousline = line  # store current line as previous and continue loop

        writer.writerow([previousline[0], termChange])  # write the last student, who has no following row
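
The same previous-row comparison can also be tried in memory, without the intermediate CSV files. This is a rough sketch under a few assumptions: rows arrive as (ID, TERM, DISC_1) tuples already sorted by ID and then term, and the function name change_terms is hypothetical. Like the loop above, a student who changes more than once reports the term of the latest change.

```python
from itertools import groupby
from operator import itemgetter


def change_terms(rows):
    """Yield (ID, term) pairs; term is 0 when the discipline never changed.

    rows: iterable of (ID, TERM, DISC_1) tuples sorted by ID, then TERM.
    """
    for student_id, group in groupby(rows, key=itemgetter(0)):
        term_change = 0
        previous_disc = None
        for _, term, disc in group:
            # Compare with the previous row of the same student only
            if previous_disc is not None and disc != previous_disc:
                term_change = term
            previous_disc = disc
        yield student_id, term_change


# A few rows taken from the question's sample data
rows = [
    (2, "2003-10", "CHEM"),
    (2, "2004-01", "CHEM"),
    (2, "2004-10", "ENGN"),
    (2, "2005-01", "ENGN"),
    (3, "2001-01", "HISTR"),
    (3, "2002-10", "HISTR"),
]
result = list(change_terms(rows))
print(result)
```

Grouping by ID first means the final student is emitted without any special handling after the loop.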