How to add a sequential counter column on groups using Pandas groupby

Asked: 2014-05-02 19:07:57

Tags: python pandas

I feel like there is a better way than this:

import pandas as pd
df = pd.DataFrame(
    [['A', 'X', 3], ['A', 'X', 5], ['A', 'Y', 7], ['A', 'Y', 1],
     ['B', 'X', 3], ['B', 'X', 1], ['B', 'X', 3], ['B', 'Y', 1],
     ['C', 'X', 7], ['C', 'Y', 4], ['C', 'Y', 1], ['C', 'Y', 6]],
    columns=['c1', 'c2', 'v1'])
def callback(x):
    x['seq'] = range(1, x.shape[0] + 1)
    return x
df = df.groupby(['c1', 'c2']).apply(callback)
print(df)

to achieve this:

   c1 c2  v1  seq
0   A  X   3    1
1   A  X   5    2
2   A  Y   7    1
3   A  Y   1    2
4   B  X   3    1
5   B  X   1    2
6   B  X   3    3
7   B  Y   1    1
8   C  X   7    1
9   C  Y   4    1
10  C  Y   1    2
11  C  Y   6    3

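Is there a way to avoid the callback?

3 Answers:

Answer 0 (score: 41)

Use cumcount(), which numbers the rows within each group starting from 0; see the pandas documentation for GroupBy.cumcount.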

In [4]: df.groupby(['c1', 'c2']).cumcount()
Out[4]: 
0     0
1     1
2     0
3     1
4     0
5     1
6     2
7     0
8     0
9     0
10    1
11    2
dtype: int64

If you want the numbering to start at 1:

In [5]: df.groupby(['c1', 'c2']).cumcount()+1
Out[5]: 
0     1
1     2
2     1
3     2
4     1
5     2
6     3
7     1
8     1
9     1
10    2
11    3
dtype: int64
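
As a small sketch of how the counter behaves (not part of the original answer): cumcount() follows the existing row order inside each group, and its ascending parameter reverses the direction of the count. The names seq_up and seq_down below are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 'X', 3], ['A', 'X', 5], ['A', 'Y', 7], ['A', 'Y', 1],
     ['B', 'X', 3], ['B', 'X', 1], ['B', 'X', 3], ['B', 'Y', 1],
     ['C', 'X', 7], ['C', 'Y', 4], ['C', 'Y', 1], ['C', 'Y', 6]],
    columns=['c1', 'c2', 'v1'])

grouped = df.groupby(['c1', 'c2'])

# Count upward from 1 in the order the rows appear within each group.
seq_up = grouped.cumcount() + 1

# Count downward, so the last row of each group gets 1.
seq_down = grouped.cumcount(ascending=False) + 1

print(pd.DataFrame({'seq_up': seq_up, 'seq_down': seq_down}))
```

Both results keep the index of df, so either one can be assigned straight back as a new column.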

Answer 1 (score: 1)

Complete working code:

import pandas as pd
df = pd.DataFrame(
    [['A', 'X', 3], ['A', 'X', 5], ['A', 'Y', 7], ['A', 'Y', 1],
     ['B', 'X', 3], ['B', 'X', 1], ['B', 'X', 3], ['B', 'Y', 1],
     ['C', 'X', 7], ['C', 'Y', 4], ['C', 'Y', 1], ['C', 'Y', 6]],
    columns=['c1', 'c2', 'v1'])

df['seq'] = df.groupby(['c1', 'c2']).cumcount() + 1
print(df)

Output:

   c1 c2  v1  seq
0   A  X   3    1
1   A  X   5    2
2   A  Y   7    1
3   A  Y   1    2
4   B  X   3    1
5   B  X   1    2
6   B  X   3    3
7   B  Y   1    1
8   C  X   7    1
9   C  Y   4    1
10  C  Y   1    2
11  C  Y   6    3
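
If the counter should follow the order of the values rather than the order of the rows, one possible variation (an assumption about a related need, not something the question asks for) is to sort before counting. The column names seq_by_v1 and rank_by_v1 are hypothetical; the assignment relies on pandas aligning the result by index.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 'X', 3], ['A', 'X', 5], ['A', 'Y', 7], ['A', 'Y', 1],
     ['B', 'X', 3], ['B', 'X', 1], ['B', 'X', 3], ['B', 'Y', 1],
     ['C', 'X', 7], ['C', 'Y', 4], ['C', 'Y', 1], ['C', 'Y', 6]],
    columns=['c1', 'c2', 'v1'])

# Sort by v1 with a stable sort so ties keep their original order,
# count within each group, then let index alignment put each number
# back on its original row.
df['seq_by_v1'] = (
    df.sort_values('v1', kind='mergesort')
      .groupby(['c1', 'c2'])
      .cumcount() + 1
)

# The same numbering via rank: method='first' breaks ties by order of appearance.
df['rank_by_v1'] = (
    df.groupby(['c1', 'c2'])['v1'].rank(method='first').astype(int)
)

print(df)
```

The rank version avoids the sort-and-align step, while the cumcount version extends more easily to ordering by several columns.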

Answer 2 (score: 0)

This might work:

It will create a sequence like this: [image]