Cumulative count based on different values in a pandas df

Time: 2018-06-26 06:54:39

Tags: python pandas loops numpy count

The code below provides a cumulative count of how many times a specified value changes. The value has to change to return a count.

import pandas as pd
import numpy as np

d = ({
    'Who' : ['Out','Even','Home','Home','Even','Away','Home','Out','Even','Away','Away','Home','Away'],
    })

df = pd.DataFrame(data = d)

#Specified Values
Teams = ['Home', 'Away']

for who in Teams:
    s = df[df.Who==who].index.to_series().diff()!=1
    df['Change_'+who] = s[s].cumsum()

Output:

     Who  Change_Home  Change_Away
0    Out          NaN          NaN
1   Even          NaN          NaN
2   Home          1.0          NaN
3   Home          NaN          NaN
4   Even          NaN          NaN
5   Away          NaN          1.0
6   Home          2.0          NaN
7    Out          NaN          NaN
8   Even          NaN          NaN
9   Away          NaN          2.0
10  Away          NaN          NaN
11  Home          3.0          NaN
12  Away          NaN          3.0
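To see why the loop only counts arrivals, here is a sketch of its intermediate step for 'Home' (the Who data is re-declared so the snippet runs on its own; idx, gaps and s are illustrative names). For each team the loop takes the row labels where Who equals that team; diff() != 1 is True at the first label of each run of consecutive rows, and the cumulative sum of those True values numbers the runs:

```python
import pandas as pd

df = pd.DataFrame({'Who': ['Out', 'Even', 'Home', 'Home', 'Even', 'Away', 'Home',
                           'Out', 'Even', 'Away', 'Away', 'Home', 'Away']})

# row labels where Who == 'Home', as a Series so that diff() can be used
idx = df[df.Who == 'Home'].index.to_series()   # labels 2, 3, 6, 11
gaps = idx.diff()                               # NaN, 1.0, 3.0, 5.0
# True at the first row of each run; NaN != 1 is True, so the first run counts
s = gaps != 1
# numbering the run starts gives the Change_Home values
print(s[s].cumsum())                            # 1, 2, 3 at labels 2, 6, 11
```

Nothing in this step looks at the row before a run starts, so the counts cannot tell whether Home came from Even or from Away.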

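I'm trying to further sort the output based on what the value was before each change to Home or Away. As it stands, the code above can't distinguish what Home/Away was changed from; it only counts the number of times the value changed to Home/Away.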
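The missing information is the value on the row just before each change, which Series.shift() provides. A minimal sketch (not from the question or the answer; prev, changed and transitions are illustrative names) that lists every change into a team together with the value it came from:

```python
import pandas as pd

df = pd.DataFrame({'Who': ['Out', 'Even', 'Home', 'Home', 'Even', 'Away', 'Home',
                           'Out', 'Even', 'Away', 'Away', 'Home', 'Away']})
teams = ['Home', 'Away']

# value on the previous row; NaN for the first row
prev = df['Who'].shift()
# a change into a team: the value is a team and differs from the previous row
changed = df['Who'].isin(teams) & (df['Who'] != prev)

# each change next to the value it came from
transitions = pd.DataFrame({'From': prev[changed], 'To': df['Who'][changed]})
print(transitions)
# rows 2, 5, 6, 9, 11, 12: Even->Home, Even->Away, Away->Home, Even->Away, Away->Home, Home->Away
```

The From column is exactly what the Change_ columns lack.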

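Is it possible to alter the code above to break the counts down by what the value changed from? Or would I have to start over?

My intended output is:

    Even_Away  Even_Home  Swap_Away  Swap_Home   Who
0                                                Out
1                                               Even
2                      1                        Home
3                                               Home
4                                               Even
5           1                                   Away
6                                            1  Home
7                                                Out
8                                               Even
9           2                                   Away
10                                              Away
11                                           2  Home
12                                1             Away

So Even_ is the number of times the value went from Even to Home/Away, and Swap_ is the number of times it went from Home to Away or vice versa.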

1 answer:

Answer 0 (score: 2)

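The key to a dynamic solution is get_dummies, which creates a new column for every combination of previous value and current value, for all values defined in the Teams list: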

import pandas as pd
import numpy as np

#create DataFrame from the dictionary d defined in the question
df = pd.DataFrame(d)

Teams = ['Home', 'Away']

#previous value of each row; the first row has none, so fill it with an empty string
shifted = df['Who'].shift().fillna('')
#mask for rows whose value is in the Teams list
m1 = df['Who'].isin(Teams)
#mask for rows equal to the previous value (Home_Home, Away_Away), which are not changes
m2 = df['Who'] == shifted
#chain together; ~ inverts the second mask
m = m1 & ~m2

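#where m is True, join previous and current value as 'previous_current' (e.g. Even_Home), else NaN; get_dummies then creates one indicator column per pair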
df1 = pd.get_dummies(np.where(m, shifted + '_' + df['Who'], np.nan))

#rename team-to-team columns (Away_Home, Home_Away) dynamically to Swap_ + current value
c = df1.columns[df1.columns.str.startswith(tuple(Teams))]
c1 = ['Swap_' + x.split('_')[1] for x in c]
df1 = df1.rename(columns = dict(zip(c, c1)))

#cumulative count per column, set to 0 on rows where that change does not happen, then add column Who
df2 = df1.cumsum().mask(df1 == 0, 0).join(df[['Who']])

print (df2)
    Swap_Home  Even_Away  Even_Home  Swap_Away   Who
0           0          0          0          0   Out
1           0          0          0          0  Even
2           0          0          1          0  Home
3           0          0          0          0  Home
4           0          0          0          0  Even
5           0          1          0          0  Away
6           1          0          0          0  Home
7           0          0          0          0   Out
8           0          0          0          0  Even
9           0          2          0          0  Away
10          0          0          0          0  Away
11          2          0          0          0  Home
12          0          0          0          1  Away
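The answer's output shows 0 where the intended output in the question has blanks, and its columns are in get_dummies order. A minimal sketch, assuming you want blanks (NaN) and the question's alphabetical column order, that wraps the same steps in a hypothetical helper transition_counts (not a pandas function). dtype=int is passed to get_dummies so the indicator columns are 0/1 even on pandas versions where get_dummies returns booleans:

```python
import numpy as np
import pandas as pd


def transition_counts(who, teams):
    """Hypothetical helper: the answer's get_dummies steps, returning NaN
    (blank) instead of 0 on rows where a change does not happen."""
    # previous value of each row; '' for the first row
    shifted = who.shift().fillna('')
    # rows whose value is a team and differs from the previous row
    m = who.isin(teams) & (who != shifted)
    # 'previous_current' label on those rows, None (missing) elsewhere
    labels = pd.Series(np.where(m, shifted + '_' + who, None), index=who.index)
    # one 0/1 column per label; missing labels get no column
    dummies = pd.get_dummies(labels, dtype=int)
    # team-to-team labels such as Away_Home become Swap_<current value>
    dummies = dummies.rename(columns={c: 'Swap_' + c.split('_')[1]
                                      for c in dummies.columns
                                      if c.split('_')[0] in teams})
    # running total per column, kept only where that change occurs
    counts = dummies.cumsum().where(dummies == 1)
    # alphabetical column order, as in the question, then the Who column
    return counts.sort_index(axis=1).join(who)


d = {'Who': ['Out', 'Even', 'Home', 'Home', 'Even', 'Away', 'Home',
             'Out', 'Even', 'Away', 'Away', 'Home', 'Away']}
df = pd.DataFrame(d)
result = transition_counts(df['Who'], ['Home', 'Away'])
print(result)
```

Because where() leaves NaN on non-matching rows, the count columns become floats; call result.fillna('') before printing if you want literally empty cells.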