pandas: get the index of row values

Time: 2015-04-21 06:49:11

Tags: python pandas

I am working with the following DataFrame:

 like            max_interest    min_interest
 basketball       4               2
 football         2               0
 soccer           4               2
 softball         4               2
 volleyball       4               2
 swimming         2               0
 cheerleading     4               2
 baseball         4               2

I want to group it by max_interest / min_interest, like this:

  group         max_interest                                                  min_interest
      4         basketball,soccer,softball,volleyball,cheerleading,baseball   N/A   
      2         football,swimming                                             basketball,soccer,softball,volleyball,cheerleading,baseball
      0         N/A                                                           football,swimming

I tried to make this work with groupby('max_interest'), but could not figure out how to concatenate the like column.

Essentially, this joins the like values of the matching rows into a single string under the max_interest heading, and does the same for min_interest.

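This could be done with hand-written iteration logic that keeps appending the like values, but I would like to know whether it can be written with the pandas / np libraries.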
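For reference, the hand-written iteration mentioned above might look roughly like the sketch below. The `rows` list and the `groups` / `table` names are assumptions made for illustration; the values are copied from the table in the question.

```python
from collections import defaultdict

# Rows copied from the question's table: (like, max_interest, min_interest).
rows = [
    ('basketball', 4, 2), ('football', 2, 0), ('soccer', 4, 2),
    ('softball', 4, 2), ('volleyball', 4, 2), ('swimming', 2, 0),
    ('cheerleading', 4, 2), ('baseball', 4, 2),
]

# For every interest level, collect the likes seen under each column.
groups = defaultdict(lambda: {'max_interest': [], 'min_interest': []})
for like, max_interest, min_interest in rows:
    groups[max_interest]['max_interest'].append(like)
    groups[min_interest]['min_interest'].append(like)

# Join each list into one string; an empty list becomes 'N/A'.
table = {
    level: {col: ','.join(names) or 'N/A' for col, names in cols.items()}
    for level, cols in groups.items()
}

for level in sorted(table, reverse=True):
    print(level, table[level]['max_interest'], table[level]['min_interest'])
```

This reproduces the desired layout, but the grouping and joining are done by hand, which is what the question hopes to avoid.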

Any help is appreciated.

2 answers:

Answer 0: (score: 0)

Here is one option:

In [39]: def groupby(key):
   ....:         result = data.groupby(key).agg({'like': lambda v: ','.join(v)})
   ....:         result.index.name = 'group'
   ....:         result.columns = [key]
   ....:         return result
   ....:

In [40]: pd.concat((groupby(key) for key in ['max_interest', 'min_interest']), axis=1)
Out[40]:
                                            max_interest                                       min_interest
group
0                                                    NaN                                  football,swimming
2                                      football,swimming  basketball,soccer,softball,volleyball,cheerlea...
4      basketball,soccer,softball,volleyball,cheerlea...                                                NaN
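The session above assumes that `data` already holds the question's DataFrame. A self-contained sketch of the same idea follows; the `join_likes` helper name and the final descending sort are assumptions added here, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; `data` matches the name used in the answer.
data = pd.DataFrame({
    'like': ['basketball', 'football', 'soccer', 'softball',
             'volleyball', 'swimming', 'cheerleading', 'baseball'],
    'max_interest': [4, 2, 4, 4, 4, 2, 4, 4],
    'min_interest': [2, 0, 2, 2, 2, 0, 2, 2],
})

def join_likes(key):
    # One comma-joined string of `like` values per distinct value of `key`.
    return data.groupby(key)['like'].agg(','.join).rename(key)

# Align the two Series on the union of their group values (outer join).
result = pd.concat([join_likes(k) for k in ['max_interest', 'min_interest']], axis=1)
result.index.name = 'group'

# Show the highest interest level first, as in the question's desired layout.
result = result.sort_index(ascending=False)
print(result)
```

Groups that exist for only one of the two columns come out as NaN; `result.fillna('N/A')` would give the literal N/A shown in the question.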

Answer 1: (score: 0)

First split the DataFrame and join the corresponding likes for each interest level:

u = ({k: ','.join(n['like'])} for k, n in df.groupby('max_interest'))              
v = ({k: ','.join(n['like'])} for k, n in df.groupby('min_interest'))

Then create a new DataFrame:

df1 = pd.DataFrame(list(u)+list(v), index=['max_interest', 'max_interest', 'min_interest', 'min_interest'])

To get the frame into the shape you want, use groupby().last():

adjustframe = df1.groupby(level=0).last().transpose()

Output:

                            max_interest                          min_interest                                                                      
0                                   NaN                             foot,swim                                                                      
2                             foot,swim  basket,soccer,soft,volley,cheer,base                                                                      
4  basket,soccer,soft,volley,cheer,base                                   NaN                                                                      

Set the index name:

adjustframe.index.name = 'group'
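The steps in this answer hard-code the index list for exactly two groups per column. A variant of the same idea that avoids this, by building a dict of dicts and letting pandas align the group values, is sketched below; the `columns` name is an assumption, and `df` is rebuilt from the question's table.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'like': ['basketball', 'football', 'soccer', 'softball',
             'volleyball', 'swimming', 'cheerleading', 'baseball'],
    'max_interest': [4, 2, 4, 4, 4, 2, 4, 4],
    'min_interest': [2, 0, 2, 2, 2, 0, 2, 2],
})

# For each column, map every interest level to its comma-joined likes.
columns = {
    key: {level: ','.join(group['like']) for level, group in df.groupby(key)}
    for key in ['max_interest', 'min_interest']
}

# Outer keys become columns, inner keys become the index; gaps become NaN.
adjustframe = pd.DataFrame(columns).sort_index()
adjustframe.index.name = 'group'
print(adjustframe)
```

Because the index comes from the groups themselves, this works for any number of distinct interest levels.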