Pandas: assign the value of one group member to all other members

Time: 2017-08-03 07:17:08

Tags: python pandas

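I have searched for this, but I still can't work out how to get at my group members, so...

The data (dataFrame) looks like this (* marks the desired output columns):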

       id  parentid  page_number  is_critical_page  page_number_of_critical*  page_numbers_not_critical*
    0   1         1            1              True                         1                     2,3,4,5
    1   2         1            2             False                         1                     2,3,4,5
    2   3         1            3             False                         1                     2,3,4,5
    3   4         1            4             False                         1                     2,3,4,5
    4   5         1            5             False                         1                     2,3,4,5
    5   6         2            1             False                         2                         1,3
    6   7         2            2              True                         2                         1,3
    7   8         2            3             False                         2                         1,3
    8   9         3            1             False                        -1                           1
    9  10         4            1              True                         1                          -1
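
To make the question reproducible, the input columns of the table above could be built roughly as follows. This is only a sketch: the variable name `dataFrame` follows the question's own usage, and the two starred columns are left out because they are the desired output, not input.

```python
import pandas as pd

# Input columns only; the starred columns from the table are the
# desired output and are intentionally not created here.
dataFrame = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "parentid": [1, 1, 1, 1, 1, 2, 2, 2, 3, 4],
    "page_number": [1, 2, 3, 4, 5, 1, 2, 3, 1, 1],
    "is_critical_page": [True, False, False, False, False,
                         False, True, False, False, True],
})

print(dataFrame)
```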

I want to:

  1. Group the rows by parentid:

    dgroups=dataFrame.groupby('parentid')
    
  2. Apply an arbitrary operation to the groups:

    def func(grp):
        grp['has_critical_page'] = grp['is_critical_page'].sum()>0 # simple operation
        ### Apply operation here to generate:
        ### ?? grp['page_number_of_critical*'] = ... ??  # is a scalar
        ### ?? grp['page_numbers_not_critical'] = ... ?? # is a list
        return grp
    
    dgroups.apply(func)
    
    print(dgroups.describe())
    
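  3. -1 stands for N/A; it could just as well be NaN, None, -99, or any other special value.

    I'm not sure whether to use apply, transform, filter, etc., or whether func should be applied to the dataFrame rows or to the groups.

    I'm trying to avoid loops, of course... Thanks!

    P.S. Bonus points for how to handle multiple hits (more than one critical page) within a group...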

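1 answer:

Answer 0: (score: 4)

One approach is to build dictionaries and map them onto the frame (df here is the question's dataFrame). Convert page_number to strings, join the page numbers per parentid while building each dictionary, and then map the dictionaries onto the parentid column, i.e.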

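# Work with strings so the page numbers can be joined with ','
df['page_number'] = df['page_number'].astype(str)

# Split the rows into critical and non-critical pages
critical_pages = df[df.is_critical_page]
not_critical_pages = df[~df.is_critical_page]

# Build {parentid: 'comma-joined page numbers'} dictionaries
not_critical_pages = not_critical_pages.groupby('parentid')['page_number'].apply(','.join).to_dict()
critical_pages = critical_pages.groupby('parentid')['page_number'].apply(','.join).to_dict()

# Look up each row's parentid; parentids missing from a dictionary become NaN
df['page_number_of_critical*'] = df['parentid'].map(critical_pages)
df['not_page_number_of_critical*'] = df['parentid'].map(not_critical_pages)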

Output:

   id  parentid page_number  is_critical_page page_number_of_critical*  \
0   1         1           1              True                        1   
1   2         1           2             False                        1   
2   3         1           3             False                        1   
3   4         1           4             False                        1   
4   5         1           5             False                        1   
5   6         2           1             False                        2   
6   7         2           2              True                        2   
7   8         2           3             False                        2   
8   9         3           1             False                      NaN   
9  10         4           1              True                        1   

  not_page_number_of_critical*  
0                      2,3,4,5  
1                      2,3,4,5  
2                      2,3,4,5  
3                      2,3,4,5  
4                      2,3,4,5  
5                          1,3  
6                          1,3  
7                          1,3  
8                            1  
9                          NaN  

You can use fillna to replace the NaN entries with whatever value you need.

You can also use apply, i.e.
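
As a rough illustration of the fillna step, the sketch below rebuilds the question's sample data, repeats the dictionary-and-map approach, and fills the gaps. The fill value `"-1"` is an assumption taken from the question's placeholder; it is written as a string here so that both columns stay purely textual.

```python
import pandas as pd

# Sample data reconstructed from the question's table.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "parentid": [1, 1, 1, 1, 1, 2, 2, 2, 3, 4],
    "page_number": [1, 2, 3, 4, 5, 1, 2, 3, 1, 1],
    "is_critical_page": [True, False, False, False, False,
                         False, True, False, False, True],
})
df["page_number"] = df["page_number"].astype(str)

# {parentid: 'comma-joined page numbers'} for each kind of page
critical = (df[df["is_critical_page"]]
            .groupby("parentid")["page_number"].apply(",".join).to_dict())
not_critical = (df[~df["is_critical_page"]]
                .groupby("parentid")["page_number"].apply(",".join).to_dict())

# map() leaves NaN where a parentid has no entry; fillna() replaces it.
df["page_number_of_critical*"] = df["parentid"].map(critical).fillna("-1")
df["not_page_number_of_critical*"] = df["parentid"].map(not_critical).fillna("-1")

print(df)
```

Any other marker (for example `"N/A"`) could be passed to fillna in the same way.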

df['page_number'] = df['page_number'].astype(str)

# Keys are (parentid, is_critical_page) tuples; values are comma-joined page numbers
crn_pages = df.groupby(['parentid','is_critical_page'])['page_number'].apply(','.join).to_dict()

# Fall back to -1 when a parentid has no pages of the requested kind
df['page_number_of_critical*'] = df.apply(lambda x: crn_pages.get((x['parentid'], True), -1), axis=1)
df['not_page_number_of_critical*'] = df.apply(lambda x: crn_pages.get((x['parentid'], False), -1), axis=1)
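
Regarding the question's bonus case, here is a minimal sketch of what this second approach does when a group contains more than one critical page. The frame `demo` and its values are hypothetical and not taken from the question; a list comprehension stands in for the row-wise apply, using the same dictionary lookup.

```python
import pandas as pd

# Hypothetical data: parentid 7 has two critical pages (2 and 4),
# parentid 8 has no critical page at all.
demo = pd.DataFrame({
    "parentid": [7, 7, 7, 7, 8],
    "page_number": [1, 2, 3, 4, 1],
    "is_critical_page": [False, True, False, True, False],
})
demo["page_number"] = demo["page_number"].astype(str)

# Keys are (parentid, is_critical_page) tuples.
crn_pages = (demo.groupby(["parentid", "is_critical_page"])["page_number"]
                 .apply(",".join)
                 .to_dict())

# Same lookup as in the answer, with -1 as the fallback value.
demo["page_number_of_critical*"] = [
    crn_pages.get((pid, True), -1) for pid in demo["parentid"]
]
demo["not_page_number_of_critical*"] = [
    crn_pages.get((pid, False), -1) for pid in demo["parentid"]
]

print(demo)
```

With several critical pages in one group, the page numbers simply end up joined in a single comma-separated string, just as the non-critical ones do.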

Hope this helps.