Group by ID and a condition

Time: 2015-11-17 23:51:49

Tags: python pandas dataframe

I have a DataFrame df

import pandas as pd

df = pd.DataFrame({'id': ['a','a','a','a','a','a','a','b','b','b','b','b','b','b','b','b','b'],
                   'min': [10,17,21,30,50,57,58,15,17,19,19,19,19,19,25,26,26],
                   'day': [15,15,15,15,15,17,17,41,41,41,41,41,41,41,57,57,57]})

which looks like this:

   id  min  day
0   a   10   15
1   a   17   15
2   a   21   15
3   a   30   15
4   a   50   15
5   a   57   17
6   a   58   17
7   b   15   41
8   b   17   41
9   b   19   41
10  b   19   41
11  b   19   41
12  b   19   41
13  b   19   41
14  b   25   57
15  b   26   57
16  b   26   57

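I want a new column that categorizes the rows based on id and on how consecutive rows relate to each other: if the difference in min between consecutive rows is less than 8 and the day value is the same, the rows should be assigned to the same group. My output should look like this: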

   id  min  day  category
0   a   10   15     1
1   a   17   15     1
2   a   21   15     1
3   a   30   15     2
4   a   50   15     3
5   a   57   17     4
6   a   58   17     4
7   b   15   41     5
8   b   17   41     5
9   b   19   41     5
10  b   19   41     5
11  b   19   41     5
12  b   19   41     5
13  b   19   41     5
14  b   25   57     6
15  b   26   57     6
16  b   26   57     6

1 Answer:

Answer 0 (score: 0)

Hope this helps. Let me know what you think.

All the best.

import pandas as pd

df = pd.DataFrame({'id': ['a','a','a','a','a','a','a','b','b','b','b','b','b','b','b','b','b'],
                   'min': [10,17,21,30,50,57,58,15,17,19,19,19,19,19,25,26,26],
                   'day': [15,15,15,15,15,17,17,41,41,41,41,41,41,41,57,57,57]})

# initialize the category counter to 1
cat = 1

# the first row always gets category 1
new_series = [cat]



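# start the loop at 1, not 0, because row 0 has no previous row to compare with
for i in range(1, len(df)):
    prev, curr = df.iloc[i-1], df.iloc[i]
    if curr['id'] == prev['id'] and curr['day'] == prev['day']:
        # same id and day: open a new category only when min jumps by 8 or more
        if curr['min'] - prev['min'] >= 8:
            cat += 1
    else:
        # id or day changed, so a new category starts
        cat += 1
    new_series.append(cat)

df['category'] = new_series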
print(df)
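
As an alternative to the row-by-row loop, the same grouping can probably be expressed with vectorized pandas operations. This is only a minimal sketch. It assumes the rows are already sorted as shown, that "difference" means the absolute difference between consecutive min values, and that a change of id also starts a new group (the expected output is consistent with that, but the question does not say so explicitly). The name new_group is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id':  ['a'] * 7 + ['b'] * 10,
    'min': [10, 17, 21, 30, 50, 57, 58, 15, 17, 19, 19, 19, 19, 19, 25, 26, 26],
    'day': [15, 15, 15, 15, 15, 17, 17, 41, 41, 41, 41, 41, 41, 41, 57, 57, 57],
})

# A row opens a new group when its id or day differs from the previous row,
# or when min differs from the previous row by 8 or more (assumed: absolute value).
new_group = (
    df['id'].ne(df['id'].shift())
    | df['day'].ne(df['day'].shift())
    | df['min'].diff().abs().ge(8)
)

# The first row is always flagged because shift() yields a missing value there,
# so the running count of flags starts at 1.
df['category'] = new_group.cumsum()
print(df)
```

On the 17 rows above this should reproduce the category column from the question. The cumulative sum over a boolean "new group starts here" flag avoids calling iloc once per row, which tends to matter on larger frames.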