I am working with the following DataFrame:
like max_interest min_interest
basketball 4 2
football 2 0
soccer 4 2
softball 4 2
volleyball 4 2
swimming 2 0
cheerleading 4 2
baseball 4 2
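To make the example reproducible, the table above can be built like this. The variable name `df` is an assumption; the answers below call the frame `data` and `df` respectively.

```python
import pandas as pd

# Sample data from the question (the name `df` is an assumption).
df = pd.DataFrame({
    'like': ['basketball', 'football', 'soccer', 'softball',
             'volleyball', 'swimming', 'cheerleading', 'baseball'],
    'max_interest': [4, 2, 4, 4, 4, 2, 4, 4],
    'min_interest': [2, 0, 2, 2, 2, 0, 2, 2],
})
print(df)
```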
I want to group it by max_interest / min_interest, like this:
group max_interest min_interest
4 basketball,soccer,softball,volleyball,cheerleading,baseball N/A
2 football,swimming basketball,soccer,softball,volleyball,cheerleading,baseball
0 N/A football,swimming
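I tried to make it work with groupby('max_interest'), but couldn't figure out how to concatenate the like column.
Basically, this joins the like values of all rows that share a max_interest value into one comma-separated string under the max_interest heading, and does the same for min_interest.
I could do it by hand-coding a loop that iterates over the rows and keeps appending the likes, but I'd like to know whether I can write it with pandas / NumPy.
Any help is appreciated.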
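For comparison, the hand-written loop the question wants to avoid could look like this minimal sketch (`result`, `buckets` and `table` are illustrative names, not from the question). It collects the likes per level in plain dicts and only uses pandas at the end to align the two columns:

```python
import pandas as pd

df = pd.DataFrame({
    'like': ['basketball', 'football', 'soccer', 'softball',
             'volleyball', 'swimming', 'cheerleading', 'baseball'],
    'max_interest': [4, 2, 4, 4, 4, 2, 4, 4],
    'min_interest': [2, 0, 2, 2, 2, 0, 2, 2],
})

result = {}
for col in ['max_interest', 'min_interest']:
    # Walk the rows and append each `like` to the list for its interest level.
    buckets = {}
    for like, level in zip(df['like'], df[col]):
        buckets.setdefault(level, []).append(like)
    result[col] = {level: ','.join(likes) for level, likes in buckets.items()}

# Outer keys become columns, inner keys become the index; missing cells are NaN.
table = pd.DataFrame(result).sort_index()
table.index.name = 'group'
print(table)
```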
Answer 0 (score: 0)
Here is one option (`data` is the DataFrame from the question):
In [39]: def groupby(key):
....: result = data.groupby(key).agg({'like': lambda v: ','.join(v)})
....: result.index.name = 'group'
....: result.columns = [key]
....: return result
....:
In [40]: pd.concat((groupby(key) for key in ['max_interest', 'min_interest']), axis=1)
Out[40]:
max_interest min_interest
group
0 NaN football,swimming
2 football,swimming basketball,soccer,softball,volleyball,cheerlea...
4 basketball,soccer,softball,volleyball,cheerlea... NaN
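A self-contained sketch of the same approach, with the sample frame built inline. The helper is renamed to a hypothetical `join_likes` so it isn't confused with the pandas `groupby` method, `','.join` is passed directly instead of being wrapped in a lambda, and `.sort_index()` is added as an assumption so the row order doesn't depend on how `concat` aligns the two indexes:

```python
import pandas as pd

data = pd.DataFrame({
    'like': ['basketball', 'football', 'soccer', 'softball',
             'volleyball', 'swimming', 'cheerleading', 'baseball'],
    'max_interest': [4, 2, 4, 4, 4, 2, 4, 4],
    'min_interest': [2, 0, 2, 2, 2, 0, 2, 2],
})

def join_likes(key):
    # Group by one interest column and join the `like` strings of each group.
    result = data.groupby(key).agg({'like': ','.join})
    result.index.name = 'group'
    result.columns = [key]
    return result

# Outer-join the two one-column frames on their group index (missing cells become NaN).
table = pd.concat([join_likes(key) for key in ['max_interest', 'min_interest']],
                  axis=1).sort_index()
print(table)
```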
Answer 1 (score: 0)
First split the DataFrame and, for each interest level, join the corresponding likes:
u = ({k: ','.join(n['like'])} for k, n in df.groupby('max_interest'))
v = ({k: ','.join(n['like'])} for k, n in df.groupby('min_interest'))
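To see what the two generators hold, here is a minimal sketch that materializes them, using the sample `df` from the question. Each generator yields one single-key dict per interest level. A generator can only be consumed once, which is fine here because `list(u) + list(v)` reads each of them exactly once:

```python
import pandas as pd

df = pd.DataFrame({
    'like': ['basketball', 'football', 'soccer', 'softball',
             'volleyball', 'swimming', 'cheerleading', 'baseball'],
    'max_interest': [4, 2, 4, 4, 4, 2, 4, 4],
    'min_interest': [2, 0, 2, 2, 2, 0, 2, 2],
})

# One {level: 'like1,like2,...'} dict per group; groupby yields the keys in sorted order.
u = ({k: ','.join(n['like'])} for k, n in df.groupby('max_interest'))
v = ({k: ','.join(n['like'])} for k, n in df.groupby('min_interest'))

u_list = list(u)
v_list = list(v)
print(u_list)
print(v_list)
```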
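Then create a new DataFrame:
# one index label per dict: two max_interest groups (2, 4) and two min_interest groups (0, 2)
df1 = pd.DataFrame(list(u) + list(v), index=['max_interest', 'max_interest', 'min_interest', 'min_interest'])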
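To get the frame into the shape you want, use groupby(level=0).last(), which keeps the last non-null value of each column within each label and so merges the two rows per label into one, then transpose:
adjustframe = df1.groupby(level=0).last().transpose()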
Output (the like values are abbreviated here):
max_interest min_interest
0 NaN foot,swim
2 foot,swim basket,soccer,soft,volley,cheer,base
4 basket,soccer,soft,volley,cheer,base NaN
Set the index name:
adjustframe.index.name = 'group'
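Putting the steps together, a self-contained sketch with both fixes applied (the closing parenthesis and the `groupby` spelling). As an assumption beyond the original answer, the index labels are collected while the dicts are built instead of being hard-coded, so the code still works if the number of interest levels changes. `.sort_index()` is also added because recent pandas versions may keep the dict keys in first-seen order (2, 4, 0) rather than sorting them:

```python
import pandas as pd

df = pd.DataFrame({
    'like': ['basketball', 'football', 'soccer', 'softball',
             'volleyball', 'swimming', 'cheerleading', 'baseball'],
    'max_interest': [4, 2, 4, 4, 4, 2, 4, 4],
    'min_interest': [2, 0, 2, 2, 2, 0, 2, 2],
})

rows, labels = [], []
for key in ['max_interest', 'min_interest']:
    for level, group in df.groupby(key):
        rows.append({level: ','.join(group['like'])})
        labels.append(key)  # one label per dict, so the index length always matches

df1 = pd.DataFrame(rows, index=labels)
# last() skips NaN, collapsing the rows that share a label into one row per label.
adjustframe = df1.groupby(level=0).last().transpose().sort_index()
adjustframe.index.name = 'group'
print(adjustframe)
```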