Is there a more Pythonic way to implement a conditional count?

Time: 2021-06-14 20:04:06

Tags: python python-3.x pandas

Is there a more Pythonic way (i.e., without a for loop) to generate the count column in the DataFrame below?

import pandas as pd

val = [1, 1, 1, 1, 2, 3, 4, 4, 5, 5, 5, 6, 7, 7, 7, 7]

df = pd.DataFrame(val, index = pd.date_range("2021-05-30", freq="D",
                               periods=len(val)), columns=['vals'])

df['count'] = 0

for n in set(df['vals']):
     end = df['vals'].value_counts()[n] + 1
     df.loc[df['vals'] == n, 'count'] = range(1, end)

df
            vals  count
2021-05-30     1      1
2021-05-31     1      2
2021-06-01     1      3
2021-06-02     1      4
2021-06-03     2      1
2021-06-04     3      1
2021-06-05     4      1
2021-06-06     4      2
2021-06-07     5      1
2021-06-08     5      2
2021-06-09     5      3
2021-06-10     6      1
2021-06-11     7      1
2021-06-12     7      2
2021-06-13     7      3
2021-06-14     7      4

2 answers:

Answer 0: (score: 3)

One way:

df['count'] = df.groupby('vals').cumcount() + 1

Output:

            vals  count
2021-05-30     1      1
2021-05-31     1      2
2021-06-01     1      3
2021-06-02     1      4
2021-06-03     2      1
2021-06-04     3      1
2021-06-05     4      1
2021-06-06     4      2
2021-06-07     5      1
2021-06-08     5      2
2021-06-09     5      3
2021-06-10     6      1
2021-06-11     7      1
2021-06-12     7      2
2021-06-13     7      3
2021-06-14     7      4

Answer 1: (score: 1)

Try:

df['count']=df.groupby((df['vals'] != df['vals'].shift()).cumsum()).cumcount() + 1

Output:

            vals  count
2021-05-30     1      1
2021-05-31     1      2
2021-06-01     1      3
2021-06-02     1      4
2021-06-03     2      1
2021-06-04     3      1
2021-06-05     4      1
2021-06-06     4      2
2021-06-07     5      1
2021-06-08     5      2
2021-06-09     5      3
2021-06-10     6      1
2021-06-11     7      1
2021-06-12     7      2
2021-06-13     7      3
2021-06-14     7      4

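This approach handles non-contiguous repeats in vals: the count restarts every time the value changes, so a value that reappears later starts again from 1.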
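
On the sample data in the question both answers give the same result, because every value occurs in a single contiguous run. A minimal sketch of where they diverge, using a made-up frame (`demo` and its column names `per_value` and `per_run` are hypothetical, not from the question) in which the value 1 reappears after a run of 2s:

```python
import pandas as pd

# Hypothetical data: the value 1 occurs in two separate runs.
demo = pd.DataFrame({"vals": [1, 1, 2, 2, 1, 1, 1]})

# Answer 0: running count per distinct value.
# The count for 1 continues where it left off when 1 reappears.
demo["per_value"] = demo.groupby("vals").cumcount() + 1

# Answer 1: running count per contiguous run.
# Comparing each row with the previous one marks the start of a run;
# the cumulative sum of those marks gives every run its own id.
run_id = (demo["vals"] != demo["vals"].shift()).cumsum()
demo["per_run"] = demo.groupby(run_id).cumcount() + 1

print(demo)
#    vals  per_value  per_run
# 0     1          1        1
# 1     1          2        2
# 2     2          1        1
# 3     2          2        2
# 4     1          3        1
# 5     1          4        2
# 6     1          5        3
```

Which one is wanted depends on whether "count" means occurrences of the value so far or position within the current run; the loop in the question behaves like the first.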