Running count of currently occurring values in a pandas df

Time: 2018-08-29 07:25:40

Tags: python pandas count

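I have code that counts the number of values that are currently occurring. It does this by scanning the df to check whether each value appears again later.

So for the df below, I'm counting how many values in Col['Area'] are currently occurring.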

import pandas as pd

d = ({
    'Code' : ['A','A','B','A','B','B','A','B','A','A'],            
    'Area' : ['Home','Home','Shops','Park','Cafe','Shops','Home','Cafe','Work','Park'],  
     })

df = pd.DataFrame(data=d)

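# Per Area, how many occurrences are still to come after each row (0 = last occurrence).
df['u'] = df[::-1].groupby('Area').Area.cumcount()

ids = [1]
seen = set([df.iloc[0].Area])
dec = False
for val, u in zip(df.Area[1:], df.u[1:]):
    # +1 when a new Area value starts, -1 when the previous row was a last occurrence.
    ids.append(ids[-1] + (val not in seen) - dec)
    seen.add(val)
    dec = u == 0

df['On'] = ids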
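As a cross-check on what the loop computes, the same count can apparently be written without a loop: flag the row where each value first appears and the row where it last appears, then subtract the running totals. This is only a sketch under that reading of the logic; `starts`, `ends` and `on_check` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Code': ['A', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'A', 'A'],
    'Area': ['Home', 'Home', 'Shops', 'Park', 'Cafe',
             'Shops', 'Home', 'Cafe', 'Work', 'Park'],
})

# True on the row where each Area value appears for the first / last time.
starts = ~df['Area'].duplicated(keep='first')
ends = ~df['Area'].duplicated(keep='last')

# Values opened so far, minus values whose last occurrence was on an earlier row.
on_check = starts.cumsum() - ends.cumsum().shift(fill_value=0)

print(on_check.tolist())  # [1, 1, 2, 3, 4, 4, 3, 2, 2, 1]
```

A value still counts on the row of its last occurrence and drops out on the next row, which is why the `ends` total is shifted by one.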

The problem is that I only want to apply this function to the rows where Col['Code'] is 'A'.

I can do the following, but this reduces my df:

df = df[df.Code == 'A']

I'm hoping to produce the following:

  Code   Area  u On
0    A   Home  2  1
1    A   Home  1  1
2    B  Shops      
3    A   Park  1  2
4    B   Cafe      
5    B  Shops      
6    A   Home  0  2
7    B   Cafe      
8    A   Work  0  2
9    A   Park  0  2

Can I alter this line to include ['Code']?

df['u'] = df[::-1].groupby('Area').Area.cumcount() 

2 Answers:

Answer 0: (score: 0)

Try this code to get the desired output:

(The code was attached as an image, which is not included here.)

Answer 1: (score: -2)

I think you need to filter by the 'A' values first, apply the solution, and finally use reindex to add NaN for the non-matching rows:

df1 = df[df.Code == 'A'].copy()

df1['u'] = df1[::-1].groupby('Area').Area.cumcount()

ids = [1]
seen = set([df1.iloc[0].Area])
dec = False
for val, u in zip(df1.Area[1:], df1.u[1:]):
    ids.append(ids[-1] + (val not in seen) - dec)
    seen.add(val)
    dec = u == 0

df1['On'] = ids

df1 = df1.reindex(df.index).fillna(df)
print(df1)
  Code   Area    u   On
0    A   Home  2.0  1.0
1    A   Home  1.0  1.0
2    B  Shops  NaN  NaN
3    A   Park  1.0  2.0
4    B   Cafe  NaN  NaN
5    B  Shops  NaN  NaN
6    A   Home  0.0  2.0
7    B   Cafe  NaN  NaN
8    A   Work  0.0  2.0
9    A   Park  0.0  1.0
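A possible variation, sketched here rather than tested against the asker's real data: compute the columns on the filtered rows and assign them straight back to the original df. Assigning a Series to a column aligns on the index, so rows with no match get NaN without a separate reindex step. The names `sub` and `left` are introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Code': ['A', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'A', 'A'],
    'Area': ['Home', 'Home', 'Shops', 'Park', 'Cafe',
             'Shops', 'Home', 'Cafe', 'Work', 'Park'],
})

# Work only on the 'A' rows; the original index labels are kept.
sub = df[df['Code'] == 'A']

# Occurrences still to come per Area, put back into forward row order.
u = sub[::-1].groupby('Area').cumcount().reindex(sub.index)

ids = [1]
seen = {sub['Area'].iloc[0]}
dec = False
for val, left in zip(sub['Area'].iloc[1:], u.iloc[1:]):
    ids.append(ids[-1] + (val not in seen) - dec)
    seen.add(val)
    dec = left == 0

# Column assignment aligns on the index, so the 'B' rows become NaN.
df['u'] = u
df['On'] = pd.Series(ids, index=sub.index)
print(df)
```

With this logic row 9 evaluates to 1, as in the printed result above, whereas the desired table in the question shows 2 for that row.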

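Finally, you could add fillna(''), but this is not recommended: it produces mixed values (numbers together with strings), and some functions will then fail.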
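To illustrate the point about mixed values, here is a small sketch on a made-up Series (the values are hypothetical, not the df above). It shows the column dtype changing once empty strings are filled in, and one way to blank the NaN only when displaying, assuming the blanks are needed purely for presentation.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'On' column after reindexing.
on = pd.Series([1.0, np.nan, 2.0], name='On')

# Filling with '' mixes floats and strings, so the dtype falls back to object.
blanked = on.fillna('')
print(blanked.dtype)

# Alternative: keep NaN in the data and blank it only in the rendered text.
rendered = on.to_frame().to_string(na_rep='')
print(rendered)
```

The original Series stays numeric in the second approach, so later arithmetic on it is unaffected.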