Is there a more Pythonic way (i.e. without a for loop) to generate the count column in the DataFrame below?
import pandas as pd
val = [1, 1, 1, 1, 2, 3, 4, 4, 5, 5, 5, 6, 7, 7, 7, 7]
df = pd.DataFrame(val, index = pd.date_range("2021-05-30", freq="D",
periods=len(val)), columns=['vals'])
df['count'] = 0
for n in set(df['vals']):
    # number of rows holding this value, plus one for the exclusive range end
    end = df['vals'].value_counts()[n] + 1
    df.loc[df['vals'] == n, 'count'] = range(1, end)
df
vals count
2021-05-30 1 1
2021-05-31 1 2
2021-06-01 1 3
2021-06-02 1 4
2021-06-03 2 1
2021-06-04 3 1
2021-06-05 4 1
2021-06-06 4 2
2021-06-07 5 1
2021-06-08 5 2
2021-06-09 5 3
2021-06-10 6 1
2021-06-11 7 1
2021-06-12 7 2
2021-06-13 7 3
2021-06-14 7 4
Answer 0 (score: 3)
One way:
df['count'] = df.groupby('vals').cumcount() + 1
vals count
2021-05-30 1 1
2021-05-31 1 2
2021-06-01 1 3
2021-06-02 1 4
2021-06-03 2 1
2021-06-04 3 1
2021-06-05 4 1
2021-06-06 4 2
2021-06-07 5 1
2021-06-08 5 2
2021-06-09 5 3
2021-06-10 6 1
2021-06-11 7 1
2021-06-12 7 2
2021-06-13 7 3
2021-06-14 7 4
Answer 1 (score: 1)
Try:
df['count']=df.groupby((df['vals'] != df['vals'].shift()).cumsum()).cumcount() + 1
Output:
vals count
2021-05-30 1 1
2021-05-31 1 2
2021-06-01 1 3
2021-06-02 1 4
2021-06-03 2 1
2021-06-04 3 1
2021-06-05 4 1
2021-06-06 4 2
2021-06-07 5 1
2021-06-08 5 2
2021-06-09 5 3
2021-06-10 6 1
2021-06-11 7 1
2021-06-12 7 2
2021-06-13 7 3
2021-06-14 7 4
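This method also handles values that repeat non-consecutively: the count restarts every time vals changes. For the sample data, where equal values are always adjacent, it gives the same result as grouping on vals directly.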
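To make the difference between the two answers concrete, here is a minimal sketch on a small made-up frame in which the value 1 comes back after a gap. The frame `df2` and the names `per_value`, `run_id` and `per_run` are assumptions for illustration only and do not appear in the question.

```python
import pandas as pd

# Hypothetical data: the value 1 reappears after the 2.
df2 = pd.DataFrame({"vals": [1, 1, 2, 1, 1, 1]})

# Answer 0: number the rows within each distinct value, wherever they occur.
per_value = df2.groupby("vals").cumcount() + 1

# Answer 1: give each run of consecutive equal values its own id,
# then number the rows within each run.
run_id = (df2["vals"] != df2["vals"].shift()).cumsum()
per_run = df2.groupby(run_id).cumcount() + 1

print(per_value.tolist())  # [1, 2, 1, 3, 4, 5]
print(per_run.tolist())    # [1, 2, 1, 1, 2, 3]
```

Which one you want depends on whether a value that comes back later should continue its earlier count (Answer 0) or start again from 1 (Answer 1).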