Given the following DataFrame df:
import pandas as pd
df = pd.DataFrame({'A': ['Tony', 'Mike', 'Jen', 'Anna'], 'B': ['no', 'yes', 'no', 'yes']})
      A    B
0  Tony   no
1  Mike  yes
2   Jen   no
3  Anna  yes
I want to add another column that keeps a running count of the rows where df['B'] == 'yes', and is 0 on every other row:
      A    B  C
0  Tony   no  0
1  Mike  yes  1
2   Jen   no  0
3  Anna  yes  2
How can I do this?
Answer 0 (score: 3)
You can use numpy.where with a boolean mask and its cumsum:
import numpy as np
m = df['B'] == 'yes'
df['C'] = np.where(m, m.cumsum(), 0)
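To see why this works, it can help to look at the intermediate values. The following is a minimal sketch on the question's sample data; the variable name `running` is introduced here only for illustration. A boolean Series sums with True counted as 1, so cumsum gives a running count, and np.where keeps that count only on the 'yes' rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['Tony', 'Mike', 'Jen', 'Anna'],
                   'B': ['no', 'yes', 'no', 'yes']})

m = df['B'] == 'yes'          # boolean mask: False, True, False, True
running = m.cumsum()          # True counts as 1: 0, 1, 1, 2
# keep the running count on 'yes' rows, put 0 everywhere else
df['C'] = np.where(m, running, 0)

print(df['C'].tolist())       # [0, 1, 0, 2]
```

Without the np.where step, the 'no' rows would carry the count of the preceding 'yes' rows (here 1 on row 2) instead of 0.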
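Another solution is to filter the boolean mask down to its True values, take their cumsum to get the count, and then add the 0 values back with reindex: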
m = df['B']=='yes'
df['C'] = m[m].cumsum().reindex(df.index, fill_value=0)
print(df)
      A    B  C
0  Tony   no  0
1  Mike  yes  1
2   Jen   no  0
3  Anna  yes  2
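The reindex variant can be broken into its three steps, as in the sketch below. The names `only_yes` and `counted` are illustrative and do not appear in the answer itself.

```python
import pandas as pd

df = pd.DataFrame({'A': ['Tony', 'Mike', 'Jen', 'Anna'],
                   'B': ['no', 'yes', 'no', 'yes']})

m = df['B'] == 'yes'
only_yes = m[m]                      # keeps index labels 1 and 3 only
counted = only_yes.cumsum()          # 1 at label 1, 2 at label 3
# restore the full index; rows that were filtered out receive 0
df['C'] = counted.reindex(df.index, fill_value=0)

print(df['C'].tolist())              # [0, 1, 0, 2]
```

Because the filtered Series keeps its original index labels, reindex can line the counts back up with the right rows.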
Performance (real data may behave differently, so it is best to measure first):
np.random.seed(123)
N = 10000
L = ['yes','no']
df = pd.DataFrame({'B': np.random.choice(L, N)})
print(df)
In [150]: %%timeit
...: m = df['B']=='yes'
...: df['C'] = np.where(m, m.cumsum(), 0)
...:
1.57 ms ± 34.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [151]: %%timeit
...: m = df['B']=='yes'
...: df['C'] = m[m].cumsum().reindex(df.index, fill_value=0)
...:
2.53 ms ± 54.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [152]: %%timeit
...: df['C'] = df.groupby('B').cumcount() + 1
...: df['C'].where(df['B'] == 'yes', 0, inplace=True)
4.49 ms ± 27.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
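The timings above come from IPython's %%timeit. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The function names below are hypothetical wrappers, and each one returns the new column rather than assigning it, so absolute numbers will not match the ones above and will depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(123)
N = 10000
df = pd.DataFrame({'B': np.random.choice(['yes', 'no'], N)})

def with_where(df):
    m = df['B'] == 'yes'
    return np.where(m, m.cumsum(), 0)

def with_reindex(df):
    m = df['B'] == 'yes'
    return m[m].cumsum().reindex(df.index, fill_value=0)

def with_cumcount(df):
    c = df.groupby('B').cumcount() + 1
    return c.where(df['B'] == 'yes', 0)

# average wall-clock time per call over 100 runs of each approach
for func in (with_where, with_reindex, with_cumcount):
    seconds = timeit.timeit(lambda: func(df), number=100) / 100
    print(f'{func.__name__}: {seconds * 1e3:.3f} ms per call')
```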
Answer 1 (score: 2)
You can use GroupBy + cumcount followed by pd.Series.where:
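df['C'] = df.groupby('B').cumcount() + 1
# assign the result back; an inplace where on df['C'] is a chained operation
# and does not update df when pandas Copy-on-Write is enabled
df['C'] = df['C'].where(df['B'] == 'yes', 0)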
print(df)
      A    B  C
0  Tony   no  0
1  Mike  yes  1
2   Jen   no  0
3  Anna  yes  2
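As a sketch of what the groupby step does: cumcount numbers the rows within each 'B' group starting from 0, so adding 1 gives a 1-based count per group, and where then zeroes out the rows that are not 'yes'. The name `within_group` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': ['Tony', 'Mike', 'Jen', 'Anna'],
                   'B': ['no', 'yes', 'no', 'yes']})

# position of each row within its own 'B' group, starting at 0
within_group = df.groupby('B').cumcount()     # 0, 0, 1, 1
# shift to 1-based, then zero out the rows that are not 'yes'
df['C'] = (within_group + 1).where(df['B'] == 'yes', 0)

print(df['C'].tolist())                       # [0, 1, 0, 2]
```

This version also counts the 'no' rows before discarding them, which is one plausible reason it was the slowest of the three in the timings from the first answer.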