Python: create a group column in a Pandas DataFrame based on integer values

Time: 2016-09-23 13:28:22

Tags: python pandas

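For each [0, 150] range in the diff column, I want to create a group column that increases by 1 every time the range resets. The range resets when diff is negative.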

import pandas as pd
df = pd.DataFrame({'year': [2016, 2016, 2016, 2016, 2016, 2016, 2016],
                   'month' : [1, 1, 2, 3, 3, 3, 3],
                   'day': [23, 25, 1, 1, 7, 20, 30]})
df = pd.to_datetime(df)
df = pd.concat([df, pd.Series(data=[15, 35, 80, 5, 20, 45, 90])], axis=1)
df.columns = ['date', 'percentworn']
col_shift = ['percentworn']
df_shift = df.shift(1).loc[:, col_shift]
df_combined = df.join(df_shift, how='inner', rsuffix='_2')
df_combined.fillna(value=0,inplace=True)
df_combined['diff'] = df_combined['percentworn'] - df_combined['percentworn_2']

The grp column should be 0, 0, 0, 1, 1, 1, 1. The code I tried is

def grping(df):
    df_ = df.copy(deep=True)
    i = 0
    if df_['diff'] >= 0:
        df_['grp'] = i
    else:
        i += 1
        df_['grp'] = i
    return df_
df_combined.apply(grping,axis=1) 

I need the i += 1 increment to persist once it has happened. How can I do that? Or is there a better way to get the desired result?

1 answer:

Answer 0: (score: 2)

IIUC, you can test whether the 'diff' column is negative, which produces a boolean array, then cast that to int and call cumsum.

Applied to the data above:

In [313]:
df_combined['group'] = (df_combined['diff'] < 0).astype(int).cumsum()
df_combined

Out[313]:
        date  percentworn  percentworn_2  diff  group
0 2016-01-23           15            0.0  15.0      0
1 2016-01-25           35           15.0  20.0      0
2 2016-02-01           80           35.0  45.0      0
3 2016-03-01            5           80.0 -75.0      1
4 2016-03-07           20            5.0  15.0      1
5 2016-03-20           45           20.0  25.0      1
6 2016-03-30           90           45.0  45.0      1
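
To see why this works, the intermediate steps can be looked at one at a time. The following is only a minimal sketch: it rebuilds just the percentworn values from the question, the names s, diff, is_reset and group are made up for illustration, and shift(1, fill_value=0) is an assumed shorthand for the shift/join/fillna steps in the question (it needs a pandas version that supports fill_value in shift).

```python
import pandas as pd

# percentworn values copied from the question
s = pd.Series([15, 35, 80, 5, 20, 45, 90], name='percentworn')

# Difference to the previous row; the first row is compared against 0,
# mirroring the fillna(value=0) step in the question.
diff = s - s.shift(1, fill_value=0)

# True wherever the value dropped, i.e. where the range was reset.
is_reset = diff < 0

# True/False become 1/0, and the running total is the group number.
group = is_reset.astype(int).cumsum()

print(pd.DataFrame({'diff': diff, 'is_reset': is_reset, 'group': group}))
```

Every negative diff adds 1 to the running total, so all rows after a reset keep the new group number until the next reset. That running total is the "persistent i" the question asks for, without any row-by-row state.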