I have searched for this but still can't figure it out, so...
The data (dataFrame) looks like this (* marks the desired output):
   id  parentid  page_number  is_critical_page  page_number_of_critical*  page_numbers_not_critical*
0   1         1            1              True                         1                     2,3,4,5
1   2         1            2             False                         1                     2,3,4,5
2   3         1            3             False                         1                     2,3,4,5
3   4         1            4             False                         1                     2,3,4,5
4   5         1            5             False                         1                     2,3,4,5
5   6         2            1             False                         2                         1,3
6   7         2            2              True                         2                         1,3
7   8         2            3             False                         2                         1,3
8   9         3            1             False                        -1                           1
9  10         4            1              True                         1                          -1
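For reproducibility, here is a minimal sketch of how the input could be built. It assumes only the four non-starred columns exist up front; the starred columns are the desired output and are deliberately left out.

```python
import pandas as pd

# Input columns only; the starred columns are what the question asks for.
dataFrame = pd.DataFrame({
    'id': list(range(1, 11)),
    'parentid': [1, 1, 1, 1, 1, 2, 2, 2, 3, 4],
    'page_number': [1, 2, 3, 4, 5, 1, 2, 3, 1, 1],
    'is_critical_page': [True, False, False, False, False,
                         False, True, False, False, True],
})
print(dataFrame)
```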
I want to:
Group the rows by parentid:
dgroups=dataFrame.groupby('parentid')
Apply an arbitrary operation to the groups:
def func(grp):
    grp = grp.copy()  # work on a copy rather than on the group passed in
    grp['has_critical_page'] = grp['is_critical_page'].sum() > 0  # simple operation
    ### Apply operation here to generate:
    ### ?? grp['page_number_of_critical*'] = ... ?? # is a scalar
    ### ?? grp['page_numbers_not_critical'] = ... ?? # is a list
    return grp

# apply returns a new object, so keep the result
result = dgroups.apply(func)
print(dgroups.describe())
-1 stands for N/A; it could be NaN, None, -99, or any other special value.
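I'm not sure whether to use apply, transform, filter, etc., or whether func(..) should be applied to the rows or to the groups.
Of course, I'm trying to avoid loops.... Thanks!
P.S. Bonus points for showing how to handle multiple hits within a group of the dataFrame...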
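Answer 0 (score: 4)
One approach is to build a dictionary and map it onto the frame: convert page_number to strings, join them per parentid while building the dictionary, and then map the dictionary onto parentid, i.e.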
df['page_number'] = df['page_number'].astype(str)
critical_pages=df[df.is_critical_page]
not_critical_pages=df[~df.is_critical_page]
not_critical_pages = not_critical_pages.groupby('parentid')['page_number'].apply(','.join).to_dict()
critical_pages = critical_pages.groupby('parentid')['page_number'].apply(','.join).to_dict()
df['page_number_of_critical*'] = df['parentid'].map(critical_pages)
df['not_page_number_of_critical*'] = df['parentid'].map(not_critical_pages)
Output:
   id  parentid page_number  is_critical_page page_number_of_critical*  \
0   1         1           1              True                        1
1   2         1           2             False                        1
2   3         1           3             False                        1
3   4         1           4             False                        1
4   5         1           5             False                        1
5   6         2           1             False                        2
6   7         2           2              True                        2
7   8         2           3             False                        2
8   9         3           1             False                      NaN
9  10         4           1              True                        1

  not_page_number_of_critical*
0                      2,3,4,5
1                      2,3,4,5
2                      2,3,4,5
3                      2,3,4,5
4                      2,3,4,5
5                          1,3
6                          1,3
7                          1,3
8                            1
9                          NaN
You can use fillna to replace the NaN entries with whatever value you need.
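As a rough sketch of that fillna step: the two-row frame below is a hypothetical stand-in for the mapped columns, and the string '-1' is an assumed sentinel, chosen because the page numbers were already converted to strings.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the mapped columns above.
df = pd.DataFrame({
    'parentid': [3, 4],
    'page_number_of_critical*': [None, '1'],
    'not_page_number_of_critical*': ['1', None],
})

cols = ['page_number_of_critical*', 'not_page_number_of_critical*']
# Replace missing values with a string sentinel so the columns stay all-string.
df[cols] = df[cols].fillna('-1')
print(df)
```

Any other placeholder can be passed to fillna in the same way.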
You can also use apply, i.e.
df['page_number'] = df['page_number'].astype(str)
crn_pages = df.groupby(['parentid','is_critical_page'])['page_number'].apply(','.join).to_dict()
df['page_number_of_critical*'] = df.apply(lambda x: crn_pages[x['parentid'],True] if (x['parentid'],True) in crn_pages else -1 ,axis=1)
df['not_page_number_of_critical*'] = df.apply(lambda x: crn_pages[x['parentid'],False] if (x['parentid'],False) in crn_pages else -1 ,axis=1)
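On the question's P.S. about multiple hits in one group: because the page numbers are combined with ','.join, a parent with several critical pages should simply end up with a comma-separated string. The sketch below uses hypothetical data (parent 1 has two critical pages, parent 2 has none) and swaps the row-wise apply for a dict.get lookup with an assumed '-1' fallback; it is a variation on the approach above, not a replacement for it.

```python
import pandas as pd

# Hypothetical data: parent 1 has two critical pages, parent 2 has none.
df = pd.DataFrame({
    'parentid': [1, 1, 1, 2],
    'page_number': [1, 2, 3, 1],
    'is_critical_page': [True, True, False, False],
})

df['page_number'] = df['page_number'].astype(str)
# Keys are (parentid, is_critical_page) tuples; values are joined page numbers.
crn_pages = (df.groupby(['parentid', 'is_critical_page'])['page_number']
               .apply(','.join)
               .to_dict())

# dict.get returns the fallback when a parent has no pages of that kind.
df['page_number_of_critical*'] = [
    crn_pages.get((pid, True), '-1') for pid in df['parentid']
]
df['not_page_number_of_critical*'] = [
    crn_pages.get((pid, False), '-1') for pid in df['parentid']
]
print(df)
```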
Hope that helps.