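I have code that counts how many values are currently occurring. It does this by scanning the df to see whether each value occurs again.
So for the df below, I am counting how many values in Col['Area'] are currently occurring.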
import pandas as pd
d = ({
'Code' : ['A','A','B','A','B','B','A','B','A','A'],
'Area' : ['Home','Home','Shops','Park','Cafe','Shops','Home','Cafe','Work','Park'],
})
df = pd.DataFrame(data=d)
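# For each row, count how many later rows share the same Area (0 = last occurrence)
df['u'] = df[::-1].groupby('Area').Area.cumcount()
ids = [1]
seen = set([df.iloc[0].Area])
dec = False
for val, u in zip(df.Area[1:], df.u[1:]):
    # +1 for an Area not seen before, -1 if the previous row was the last of its Area
    ids.append(ids[-1] + (val not in seen) - dec)
    seen.add(val)
    dec = u == 0
df['On'] = ids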
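To make the helper column concrete, here is a small sketch on the same sample data. It only illustrates what `u` holds: the number of later rows with the same Area, so 0 marks a value's last occurrence. The `is_last` name is introduced here for illustration and is not part of the code above.

```python
import pandas as pd

df = pd.DataFrame({
    'Code': ['A', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'A', 'A'],
    'Area': ['Home', 'Home', 'Shops', 'Park', 'Cafe', 'Shops', 'Home', 'Cafe', 'Work', 'Park'],
})

# Reversing the frame makes cumcount() number each group from the bottom up,
# so u is the count of later rows that share the same Area.
u = df[::-1].groupby('Area').Area.cumcount()

# Column assignment aligns on the index, so the reversed order does not matter.
df['u'] = u

# is_last (illustrative name) flags the final occurrence of each Area.
is_last = df['u'].eq(0)
print(df.assign(is_last=is_last))
```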
The problem is that I only want to apply this function to the 'A' values in Col['Code'].
I could do the following, but this reduces my df.
df = df[df.Code == 'A']
I'm hoping to produce the following:
  Code   Area  u  On
0    A   Home  2   1
1    A   Home  1   1
2    B  Shops
3    A   Park  1   2
4    B   Cafe
5    B  Shops
6    A   Home  0   2
7    B   Cafe
8    A   Work  0   2
9    A   Park  0   2
Could I alter this line to include ['Code']?
df['u'] = df[::-1].groupby('Area').Area.cumcount()
Answer 0 (score: 0)
Try this code to get the desired output:
[image attachment, contents not available]
Answer 1 (score: -2)
I think you need to filter by the A values first, apply your solution, and finally use reindex to add NaN for the rows that do not match:
# Work on a copy containing only the Code == 'A' rows
df1 = df[df.Code == 'A'].copy()
df1['u'] = df1[::-1].groupby('Area').Area.cumcount()
ids = [1]
seen = set([df1.iloc[0].Area])
dec = False
for val, u in zip(df1.Area[1:], df1.u[1:]):
    ids.append(ids[-1] + (val not in seen) - dec)
    seen.add(val)
    dec = u == 0
df1['On'] = ids
# Restore the original rows; Code and Area are filled from df, u and On stay NaN
df1 = df1.reindex(df.index).fillna(df)
print(df1)
  Code   Area    u   On
0    A   Home  2.0  1.0
1    A   Home  1.0  1.0
2    B  Shops  NaN  NaN
3    A   Park  1.0  2.0
4    B   Cafe  NaN  NaN
5    B  Shops  NaN  NaN
6    A   Home  0.0  2.0
7    B   Cafe  NaN  NaN
8    A   Work  0.0  2.0
9    A   Park  0.0  1.0
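As a possible alternative, the same loop can be wrapped in a function and joined back onto the full frame, which avoids the reindex and fillna step. This is only a sketch: `count_on`, `mask` and `result` are hypothetical names, and `cumcount(ascending=False)` is swapped in for reversing the frame, which should give the same `u` values.

```python
import pandas as pd


def count_on(area):
    """Return the u and On columns for one Series of Area values (hypothetical helper)."""
    # Number of later rows with the same value; 0 marks the last occurrence.
    u = area.groupby(area).cumcount(ascending=False)
    ids = [1]
    seen = {area.iloc[0]}  # assumes the Series is not empty
    dec = False
    for val, remaining in zip(area.iloc[1:], u.iloc[1:]):
        # Same update rule as the loop in the question.
        ids.append(ids[-1] + (val not in seen) - dec)
        seen.add(val)
        dec = bool(remaining == 0)
    return pd.DataFrame({'u': u, 'On': ids}, index=area.index)


df = pd.DataFrame({
    'Code': ['A', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'A', 'A'],
    'Area': ['Home', 'Home', 'Shops', 'Park', 'Cafe', 'Shops', 'Home', 'Cafe', 'Work', 'Park'],
})

# Compute the counts on the 'A' rows only, then join on the index so that
# the 'B' rows are kept and receive NaN in the new columns.
mask = df['Code'].eq('A')
result = df.join(count_on(df.loc[mask, 'Area']))
print(result)
```

Because the join aligns on the index, df keeps all ten rows and the Code and Area columns never need to be filled back in.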
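Finally, you could add fillna(''), but this is not recommended: the columns would then hold mixed values, numbers alongside strings, and some functions will fail on them.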
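A minimal sketch of that dtype problem, using a small made-up Series rather than the frame above. The `Int64` conversion at the end is an assumption about what might be wanted (whole numbers with the gaps preserved) and is not part of the original solution.

```python
import pandas as pd

# A float column with a missing value, similar to 'On' after the reindex.
on = pd.Series([1.0, 1.0, None, 2.0], name='On')

# Filling with '' mixes floats and strings, so the column stops being numeric.
mixed = on.fillna('')
print(mixed.dtype)

# Assumption: the nullable 'Int64' dtype keeps the gap as <NA>
# while the column stays numeric and can still be summed.
nullable = on.astype('Int64')
print(nullable.dtype, nullable.sum())
```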