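The code below produces a cumulative count of how many times the column changes to each specified value. The value has to change for the count to increase.

import pandas as pd
import numpy as np

d = {
    'Who': ['Out','Even','Home','Home','Even','Away','Home','Out','Even','Away','Away','Home','Away'],
}
df = pd.DataFrame(d)

# Specified values
Teams = ['Home', 'Away']

for who in Teams:
    # True on the first row of each consecutive run of `who`
    s = df[df.Who == who].index.to_series().diff() != 1
    # number the run starts 1, 2, 3, ...
    df['Change_' + who] = s[s].cumsum()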
Output:

     Who  Change_Home  Change_Away
0    Out          NaN          NaN
1   Even          NaN          NaN
2   Home          1.0          NaN
3   Home          NaN          NaN
4   Even          NaN          NaN
5   Away          NaN          1.0
6   Home          2.0          NaN
7    Out          NaN          NaN
8   Even          NaN          NaN
9   Away          NaN          2.0
10  Away          NaN          NaN
11  Home          3.0          NaN
12  Away          NaN          3.0
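For readers unfamiliar with the diff() != 1 idiom used in the loop, here is a minimal sketch on a made-up five-row series. The data and variable names are illustrative and not from the question, and the sketch assumes the default 0..n-1 integer index, which the idiom relies on.

```python
import pandas as pd

# Small illustrative series (an assumption, not the question's data).
who = pd.Series(['Even', 'Home', 'Home', 'Away', 'Home'], name='Who')

# Index labels of the rows equal to 'Home': 1, 2 and 4.
home_idx = who[who == 'Home'].index.to_series()

# diff() is the gap to the previous 'Home' row. A gap other than 1
# (or NaN for the very first one) means a new run of 'Home' starts here.
starts = home_idx.diff() != 1

# Keep only the run starts and number them 1, 2, ...
change_home = starts[starts].cumsum()
print(change_home)
```

Rows 1 and 4 start a run of Home, so they receive the counts 1 and 2. Row 2 continues the run that began at row 1 and gets no count.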
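I'm trying to break the output down further based on the value that came before Home and Away. As it stands, the code above does not distinguish what Home/Away was changed from. It only counts the number of times the value changed to Home/Away.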
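Is it possible to alter the code above so that it breaks the counts down by what Home/Away was changed from? Or do I have to start over?

My intended output is:

   Even_Away Even_Home Swap_Away Swap_Home   Who
0                                            Out
1                                           Even
2                    1                      Home
3                                           Home
4                                           Even
5          1                                Away
6                                        1  Home
7                                            Out
8                                           Even
9          2                                Away
10                                          Away
11                                       2  Home
12                             1            Away

So Even_ represents how many times the value goes from Even to Home/Away, and Swap_ represents how many times it goes from Home to Away and vice versa.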
Answer 0 (score: 2)
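The key function for a dynamic solution is get_dummies. It creates a new column for each previous value that precedes one of the values defined in the Teams list: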
# create DataFrame
df = pd.DataFrame(d)

Teams = ['Home', 'Away']

# previous value of each row ('' for the first row)
shifted = df['Who'].shift().fillna('')
# mask: rows whose value is in Teams
m1 = df['Who'].isin(Teams)
# mask: rows that repeat the previous value (Home -> Home, Away -> Away)
m2 = df['Who'] == shifted
# combine the masks; ~ inverts m2 so repeats are excluded
m = m1 & ~m2
# build 'previous_current' labels on the masked rows and create an indicator df
df1 = pd.get_dummies(np.where(m, shifted + '_' + df['Who'], np.nan))
# rename columns dynamically: a previous value from Teams becomes 'Swap_'
c = df1.columns[df1.columns.str.startswith(tuple(Teams))]
c1 = ['Swap_' + x.split('_')[1] for x in c]
df1 = df1.rename(columns=dict(zip(c, c1)))
# cumulative count, kept only on the rows where the change occurs; add column Who
df2 = df1.cumsum().mask(df1 == 0, 0).join(df[['Who']])
print(df2)
    Swap_Home  Even_Away  Even_Home  Swap_Away   Who
0           0          0          0          0   Out
1           0          0          0          0  Even
2           0          0          1          0  Home
3           0          0          0          0  Home
4           0          0          0          0  Even
5           0          1          0          0  Away
6           1          0          0          0  Home
7           0          0          0          0   Out
8           0          0          0          0  Even
9           0          2          0          0  Away
10          0          0          0          0  Away
11          2          0          0          0  Home
12          0          0          0          1  Away
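The same idea can be wrapped in a small function that builds the Swap_ labels before calling get_dummies, so no renaming step is needed. This is only a sketch under stated assumptions: transition_counts is a hypothetical helper name, not part of the answer above, and unlike the answer it ignores the first row because that row has no previous value.

```python
import pandas as pd

def transition_counts(who, teams):
    """Cumulative count of each change into a team value, split by previous value.

    Hypothetical helper for illustration; not from the original answer.
    """
    prev = who.shift()
    # A row counts when it holds a team value that differs from the previous row.
    changed = who.isin(teams) & prev.notna() & (who != prev)
    # Prefix is 'Swap' when the previous value is itself a team, else the previous value.
    prefix = prev.where(~prev.isin(teams), 'Swap')
    # Label only the rows where a change happens; other rows stay NaN.
    label = (prefix + '_' + who).where(changed)
    dummies = pd.get_dummies(label, dtype=int)
    # Running count per label, shown only on the rows where that change occurs.
    return dummies.cumsum().where(dummies == 1, 0)

who = pd.Series(['Out', 'Even', 'Home', 'Home', 'Even', 'Away', 'Home',
                 'Out', 'Even', 'Away', 'Away', 'Home', 'Away'], name='Who')
result = transition_counts(who, ['Home', 'Away'])
print(result.join(who))
```

On the question's data this yields the columns Even_Away, Even_Home, Swap_Away and Swap_Home. A change from any other value, for example Out to Home, would add its own column such as Out_Home.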