I'm trying to return a cumulative count of the number of times a value in a column changes.
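So for the df below, I want to return a running count of how many times 'Home' changes to 'Away', and vice versa. I don't want the number of times each value appears.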
import pandas as pd
d = ({
'Who' : ['Home','Away','','','Home','Away','Home','Home','Home','','Away','Home'],
})
df = pd.DataFrame(data = d)
I tried:
df['Home_count'] = (df['Who'] == 'Home').cumsum()
df['Away_count'] = (df['Who'] == 'Away').cumsum()
Which returns:
Who Home_count Away_count
0 Home 1 0
1 Away 1 1
2 1 1
3 1 1
4 Home 2 1
5 Away 2 2
6 Home 3 2
7 Home 4 2
8 Home 5 2
9 5 2
10 Away 5 3
11 Home 6 3
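But I'm trying to count the number of times it changes, not the total count of each value. So if it reads 'Home', 'Home', 'Home', 'Away', 'Away' should get a single count, rather than 'Home' getting 1, 2, 3.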
Home 1 #There's a change so provide a count
Home #No change so no count
Home #No change so no count
Away 1 #There's a change so provide a count
Home 2 #There's a change so provide a count
Expected output:
Count_Away Count_Home Who
0 1 Home
1 1 Away
2
3
4 2 Home
5 2 Away
6 3 Home
7 Home
8 Home
9
10 3 Away
11 4 Home
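The counting rule described in the question (show a count only on rows where a new run of identical values starts, keeping a separate running total per value) can be sketched on the short Home/Home/Home/Away/Home example. This is a minimal sketch assuming the column has no blank rows; the variable names s, starts, counts and result are illustrative only.

```python
import pandas as pd

s = pd.Series(['Home', 'Home', 'Home', 'Away', 'Home'])

# True on the first row of each run of identical values
starts = s.ne(s.shift())

# Running count of run starts, kept separately for each value
counts = starts.astype(int).groupby(s).cumsum()

# Only show the count on rows where a new run begins
result = counts.where(starts, '')
print(result.tolist())
```

This reproduces the annotated example: counts of 1, blank, blank, 1 and 2.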
Answer 0 (score: 1)
Use pd.get_dummies to get a one-hot encoded DataFrame, cumsum it into v, and compare v with its shifted version to find the change points.
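# Keep only non-blank rows whose value differs from the row above,
# one-hot encode them and take a running total per value
v = pd.get_dummies(
    df.where(df.Who.ne(df.Who.shift()) & df.Who.str.len().astype(bool)),
    prefix='Count'
).cumsum()
# Show a count only where the running total changed; blank elsewhere
df = pd.concat([v.where(v.ne(v.shift()), ''), df], axis=1)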
print(df)
Count_Away Count_Home Who
0 0 1 Home
1 1 Away
2
3
4 2 Home
5 2 Away
6 3 Home
7 Home
8 Home
9
10 3 Away
11 4 Home
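To see the intermediate steps of this approach, here is a sketch of the same pipeline on a shorter, hypothetical Series that includes a blank between two 'Home' rows. Because each row is compared with the row directly above it, the blank makes the second 'Home' count as a new change. The leading 0 in Count_Away on the first row also appears in the output above.

```python
import pandas as pd

who = pd.Series(['Home', '', 'Home', 'Away'])

# Keep only rows that differ from the row directly above and are not blank
changed = who.ne(who.shift()) & who.str.len().astype(bool)

# One-hot encode the kept rows (dropped rows become all zeros),
# then cumsum to get a running total per value
v = pd.get_dummies(who.where(changed), prefix='Count').cumsum()

# Show a number only where the running total moved; blank elsewhere
out = v.where(v.ne(v.shift()), '')
print(out)
```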
Answer 1 (score: 0)
This shows a count for each of the words Home and Away every time the value in the column changes.
import pandas as pd
d = ({
'Who' : ['Home','Away','','','Home','Away','Home','Home','Home','','Away','Home'],
})
df = pd.DataFrame(data = d)
countaway = 0
counthome = 0
df['Count_Away'] = 0
df['Count_Home'] = 0
for index, rows in df.iterrows():
    # Running count of 'Home' rows; 0 on every other row
    if rows['Who'] == 'Home':
        counthome += 1
        df.at[index, 'Count_Home'] = counthome
    else:
        df.at[index, 'Count_Home'] = 0
    # Running count of 'Away' rows; 0 on every other row
    if rows['Who'] == 'Away':
        countaway += 1
        df.at[index, 'Count_Away'] = countaway
    else:
        df.at[index, 'Count_Away'] = 0
Output:
Who Count_Away Count_Home
0 Home 0 1
1 Away 1 0
2 0 0
3 0 0
4 Home 0 2
5 Away 2 0
6 Home 0 3
7 Home 0 4
8 Home 0 5
9 0 0
10 Away 3 0
11 Home 0 6
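The loop above can also be written without iterrows. This is a sketch that reproduces the same two columns shown in the output, assuming the same df; like the loop, it counts every occurrence of each word (rows 7 and 8 get 4 and 5), not only the changes. The names hit and who are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Who': ['Home', 'Away', '', '', 'Home', 'Away',
                           'Home', 'Home', 'Home', '', 'Away', 'Home']})

for who in ['Away', 'Home']:
    hit = df['Who'].eq(who)
    # cumsum counts every occurrence so far; keep it only on matching rows
    df['Count_' + who] = hit.cumsum().where(hit, 0)

print(df)
```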
Answer 2 (score: 0)
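Here is a way to count only when the value changes from 'Home' to 'Away' (or vice versa). Blank rows are ignored when detecting changes, so two entries of the same type in Who do not increment the count, even when blanks separate them.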
import pandas as pd
import numpy as np
whos = ['Home', 'Away']
for who in whos:
    # Find where `Who` is not consecutive based on index. Don't consider blank gaps
    # when determining changes.
    s = df[df['Who'].replace('', np.nan).ffill() == who].index.to_series().diff() != 1
    # Get the counts, align to original df based on index.
    df['Count_' + who] = s[s].cumsum()
    # Replace NaN with empty string to match your output
    df['Count_' + who] = df['Count_' + who].replace(np.nan, '')
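The same forward-fill idea can be expressed with a shift comparison instead of index differences. This is a self-contained sketch of that variant (not the answerer's code) that reproduces the expected output from the question; filled, starts and hits are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Who': ['Home', 'Away', '', '', 'Home', 'Away',
                           'Home', 'Home', 'Home', '', 'Away', 'Home']})

# Treat blanks as missing and carry the last real value forward,
# so a gap does not break a run of the same value
filled = df['Who'].mask(df['Who'].eq('')).ffill()

# A run starts wherever the filled value differs from the row above
# (notna guards against leading blanks, which stay missing after ffill)
starts = filled.ne(filled.shift()) & filled.notna()

for who in ['Away', 'Home']:
    hits = starts & filled.eq(who)
    # Running count of run starts for this value; blank elsewhere
    df['Count_' + who] = hits.cumsum().where(hits, '')

print(df)
```

Building the count from integer cumsums and filling the other rows with '' keeps the counts displayed as 1, 2, 3 rather than as floats.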