Cumulative count between two columns in a pandas df

Time: 2018-06-26 02:22:15

Tags: python pandas count

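I'm trying to return a cumulative count of the number of times a value changes in a column.

So for the df below, I want to return a running count of the number of times 'Home' changes to 'Away' and vice versa. I don't want to return the number of times each value appears.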

import pandas as pd

d = ({
    'Who' : ['Home','Away','','','Home','Away','Home','Home','Home','','Away','Home'],
    })

df = pd.DataFrame(data = d)

I've tried:

df['Home_count'] = (df['Who'] == 'Home').cumsum()
df['Away_count'] = (df['Who'] == 'Away').cumsum()

Which returns:

     Who  Home_count  Away_count
0   Home           1           0
1   Away           1           1
2                  1           1
3                  1           1
4   Home           2           1
5   Away           2           2
6   Home           3           2
7   Home           4           2
8   Home           5           2
9                  5           2
10  Away           5           3
11  Home           6           3

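But I'm trying to count the number of times it changes, not the total count of each value. So if it reads 'Home', 'Home', 'Home', 'Away', there should only be one count next to 'Away', not 1, 2, 3 against Home: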

Home 1 #There's a change so provide a count
Home   #No change so no count
Home   #No change so no count
Away 1 #There's a change so provide a count
Home 2 #There's a change so provide a count
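The rule illustrated above (a row gets a count only when its value differs from the row directly above it) maps onto comparing the column with a shifted copy of itself via `Series.ne` and `Series.shift`. A minimal sketch, assuming blank rows should never receive a count; note that with this rule a value right after a blank is compared with the blank, so it counts as a change:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Who': ['Home', 'Away', '', '', 'Home', 'Away',
                           'Home', 'Home', 'Home', '', 'Away', 'Home']})

# A row is a change when it differs from the row directly above it and is
# not blank; row 0 always qualifies because shift() puts NaN there.
is_change = df['Who'].ne(df['Who'].shift()) & df['Who'].ne('')

# Running count of changes to each value (summing booleans counts the Trues)
df['Count_Home'] = (is_change & df['Who'].eq('Home')).cumsum()
df['Count_Away'] = (is_change & df['Who'].eq('Away')).cumsum()

# Only the change rows would get a visible count in the expected output
print(df[is_change])
```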

Expected output:

   Count_Away Count_Home   Who
0                      1  Home
1           1             Away
2                             
3                             
4                      2  Home
5           2             Away
6                      3  Home
7                         Home
8                         Home
9                             
10          3             Away
11                     4  Home

3 answers:

Answer 0 (score: 1)

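  1. Mask rows that are blank or repeat the previous value, then use pd.get_dummies to get a one-hot encoded DataFrame
  2. Use cumsum to compute the cumulative sum
  3. Find the change points by comparing v with its shifted version
  4. Fill the NaNs with empty strings
  5. Concatenate the result with the original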

v = pd.get_dummies(
    df.where(df.Who.ne(df.Who.shift()) & df.Who.str.len().astype(bool)),
    prefix='Count'
).cumsum()

df = pd.concat([
    v.where(v.ne(v.shift())).fillna('', downcast='infer'), df
], axis=1)

print(df)
   Count_Away Count_Home   Who
0           0          1  Home
1           1             Away
2                             
3                             
4                      2  Home
5           2             Away
6                      3  Home
7                         Home
8                         Home
9                             
10          3             Away
11                     4  Home
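Recent pandas versions deprecate the `downcast` argument of `fillna`, and `get_dummies` returns boolean columns by default in pandas 2.x. A sketch of the same steps that avoids `downcast` and passes `dtype=int` explicitly, assuming a recent pandas. It also uses `diff` instead of comparing with `shift`, so row 0 only shows a count in the column that actually received a value (the `0` under `Count_Away` on row 0 disappears, matching the expected output in the question):

```python
import pandas as pd

df = pd.DataFrame({'Who': ['Home', 'Away', '', '', 'Home', 'Away',
                           'Home', 'Home', 'Home', '', 'Away', 'Home']})

# Keep only non-blank rows whose value differs from the previous row;
# every other row becomes NaN and so gets all-zero dummies
changed = df['Who'].ne(df['Who'].shift()) & df['Who'].ne('')
dummies = pd.get_dummies(df['Who'].where(changed), prefix='Count', dtype=int)

# Running count of changes per value
v = dummies.cumsum()

# A count is shown where the running total moved; diff() is NaN on row 0,
# so fall back to the row-0 totals themselves (a move up from zero)
moved = v.diff().fillna(v).ne(0)

# Blank out the rows without a move, then put the counts next to `Who`
out = pd.concat([v.astype(object).where(moved, ''), df], axis=1)
print(out)
```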

Answer 1 (score: 0)

This shows the count of each of the words Home and Away every time the value in the column changes.

import pandas as pd

d = ({
    'Who' : ['Home','Away','','','Home','Away','Home','Home','Home','','Away','Home'],
    })
df = pd.DataFrame(data = d)
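
countaway = 0
counthome = 0
df['Count_Away'] = 0
df['Count_Home'] = 0

for index, row in df.iterrows():
    # Increment the running Home count on every 'Home' row; 0 otherwise.
    # df.at writes one cell safely (writing into .values can fail under Copy-on-Write)
    if row['Who'] == 'Home':
        counthome += 1
        df.at[index, 'Count_Home'] = counthome
    else:
        df.at[index, 'Count_Home'] = 0
    # Same for 'Away'
    if row['Who'] == 'Away':
        countaway += 1
        df.at[index, 'Count_Away'] = countaway
    else:
        df.at[index, 'Count_Away'] = 0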

Output:

   Who  Count_Away  Count_Home
0   Home    0         1
1   Away    1         0
2           0         0
3           0         0
4   Home    0         2
5   Away    2         0
6   Home    0         3
7   Home    0         4
8   Home    0         5
9           0         0 
10  Away    3         0
11  Home    0         6
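The loop above increments on every occurrence of a value (`Count_Home` reaches 4, 5 and 6 on rows 7, 8 and 11), whereas the expected output in the question leaves rows 7 and 8 blank. A sketch of the same loop-style idea that only increments when the value differs from the previous row, assuming blank rows get no count; `counts` and `prev` are illustrative names, and iterating over the column directly avoids building a Series per row as `iterrows` does:

```python
import pandas as pd

df = pd.DataFrame({'Who': ['Home', 'Away', '', '', 'Home', 'Away',
                           'Home', 'Home', 'Home', '', 'Away', 'Home']})

counts = {'Home': 0, 'Away': 0}   # running number of changes to each value
prev = None                       # value of the previous row
count_home, count_away = [], []

for who in df['Who']:
    home_cell, away_cell = '', ''
    # Only count when the value is non-blank and differs from the row above
    if who in counts and who != prev:
        counts[who] += 1
        if who == 'Home':
            home_cell = counts['Home']
        else:
            away_cell = counts['Away']
    count_home.append(home_cell)
    count_away.append(away_cell)
    prev = who

df['Count_Away'] = count_away
df['Count_Home'] = count_home
print(df)
```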

Answer 2 (score: 0)

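Here's a way that only counts when the value changes from 'Home' to 'Away' (or vice versa). It won't increment if two entries of the same type are separated only by blanks in `Who`.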

import pandas as pd
import numpy as np

whos = ['Home', 'Away']
for who in whos:
    # Find where `Who` is not consecutive based on index. Don't consider blank gaps
    # when determining changes: blanks take the last non-blank value.
    filled = df['Who'].replace('', np.nan).ffill()
    s = df[filled == who].index.to_series().diff() != 1

    # Get the counts, align to original df based on index.
    df['Count_' + who] = s[s].cumsum()

    # Replace NaN with empty string to match your output
    df['Count_' + who] = df['Count_' + who].replace(np.nan, '')
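Because this answer forward-fills blanks before looking for runs, a value that reappears after only blank rows is treated as the same run and is not counted again; comparing each row with the one directly above it would count it. A minimal sketch wrapping the same logic in a reusable function, with the hypothetical name `count_changes`, assuming the default 0..n-1 integer index that the `diff() != 1` test relies on:

```python
import numpy as np
import pandas as pd

def count_changes(df, whos=('Home', 'Away')):
    """Hypothetical helper wrapping the answer's logic: add Count_<who>
    columns that number each new run of <who>, ignoring blank rows."""
    out = df.copy()
    # Treat blanks as missing and carry the last real value forward
    filled = out['Who'].replace('', np.nan).ffill()
    for who in whos:
        # Index labels of rows (after filling) that hold this value;
        # assumes a default 0..n-1 integer index, as the answer does
        idx = filled.index[filled.eq(who).to_numpy()].to_series()
        # A gap in the labels (diff != 1) marks the start of a new run
        starts = idx.diff().ne(1)
        runs = starts[starts].cumsum()
        # Show the run number on its first row, '' everywhere else
        out['Count_' + who] = [runs.get(i, '') for i in out.index]
    return out

# Blanks between two 'Home' rows do not start a new 'Home' run
demo = pd.DataFrame({'Who': ['Home', '', 'Home', 'Away', '', 'Home']})
print(count_changes(demo))
```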