Creating a group in a pandas DataFrame

Time: 2017-04-05 09:14:09

Tags: python pandas dataframe

I have a list like

groups = [['Group1', 'A', 'B'], ['Group2', 'C', 'D']]

and a DataFrame like

A 100
B 200
C 300
D 400

Using the list above, I want to get the sum for each group, like this:

Group 1 300
Group 2 700

How can I do this with python pandas? Needless to say, I am new to pandas. Thanks.

3 Answers:

Answer 0: (score: 4)

You need to create a dict from the lists, then groupby with that dict and aggregate with sum.

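A slightly modified solution may be possible if only a single column is aggregated with sum. Finally, reset_index can be used to convert the index into a column.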

import pandas as pd

df = pd.DataFrame({'a': ['A', 'B', 'C', 'D'], 'b': [100, 200, 300, 400]})
print(df)
   a    b
0  A  100
1  B  200
2  C  300
3  D  400

groups = [['Group1', 'A', 'B'], ['Group2', 'C', 'D']]

#http://stackoverflow.com/q/43227103/2901002
d = {k:row[0] for row in groups for k in row[1:]}
print (d)
{'B': 'Group1', 'C': 'Group2', 'D': 'Group2', 'A': 'Group1'}

print (df.set_index('a').groupby(d).sum())
          b
Group1  300
Group2  700
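
The reset_index step mentioned in this answer is not shown in the code above. A minimal sketch of it, assuming the same df and groups, could look like the following. The column name 'group' is a hypothetical choice for illustration, not something from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'B', 'C', 'D'], 'b': [100, 200, 300, 400]})
groups = [['Group1', 'A', 'B'], ['Group2', 'C', 'D']]

# Map each member label to its group name: 'A' -> 'Group1', and so on
d = {k: row[0] for row in groups for k in row[1:]}

out = (
    df.set_index('a')
      .groupby(d)['b']        # group the index labels through the dict
      .sum()
      .rename_axis('group')   # hypothetical name for the group column
      .reset_index()          # turn the group labels into a regular column
)
print(out)
```

Here out is a regular DataFrame with the columns group and b, which is usually easier to merge or export than a result that keeps the group labels in the index.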

Answer 1: (score: 1)

Another option... but @jezrael's approach seems better!

import pandas as pd

groups = [['Group1', 'A', 'B'], ['Group2', 'C', 'D']]

df0 = pd.melt(pd.DataFrame(groups).set_index(0).T)
df1 = pd.read_clipboard(header=None)  # Your example data

df = df1.merge(df0, left_on=0, right_on='value')[['0_y', 1]]
df.columns = ['Group', 'Value']

print(df.groupby('Group').sum())


        Value
Group        
Group1    300
Group2    700
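
Because read_clipboard depends on whatever is currently in the clipboard, the snippet above is hard to rerun. A clipboard-free sketch of the same merge idea is given below, assuming the example data is typed in directly. The names key, Value, and lookup are hypothetical, and the lookup table is built with a comprehension rather than melt.

```python
import pandas as pd

groups = [['Group1', 'A', 'B'], ['Group2', 'C', 'D']]

# Example data entered by hand instead of read from the clipboard
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'],
                    'Value': [100, 200, 300, 400]})

# Long-format lookup table: one row per (group, member) pair
lookup = pd.DataFrame(
    [(name, member) for name, *members in groups for member in members],
    columns=['Group', 'key'],
)

# Attach the group name to each row, then sum within each group
merged = df1.merge(lookup, on='key')
result = merged.groupby('Group')['Value'].sum()
print(result)
```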

Answer 2: (score: 1)

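Use Python 3 unpacking and a comprehension to create a dictionary. Use that dictionary to map the first column. Then group by the mapped values.

Consider the list groups and the DataFrame df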

import pandas as pd

groups = [['Group1', 'A', 'B'], ['Group2', 'C', 'D']]
df = pd.DataFrame(dict(a=list('ABCD'), b=range(100, 401, 100)))

Then:

df.groupby(df.a.map({k: g for g, *c in groups for k in c})).sum()

          b
a          
Group1  300
Group2  700
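
The one-liner above packs three steps together. Spelled out as a sketch, with the hypothetical names member_to_group and labels, it might read as follows. Column b is selected explicitly here so that the result does not depend on how a given pandas version treats the non-numeric column a during sum.

```python
import pandas as pd

groups = [['Group1', 'A', 'B'], ['Group2', 'C', 'D']]
df = pd.DataFrame({'a': list('ABCD'), 'b': [100, 200, 300, 400]})

# Step 1: extended unpacking splits each sublist into a name and its members
member_to_group = {}
for name, *members in groups:
    for member in members:
        member_to_group[member] = name

# Step 2: map column 'a' to group labels, aligned row by row with df
labels = df['a'].map(member_to_group)

# Step 3: group by those labels and sum column 'b'
result = df.groupby(labels)['b'].sum()
print(result)
```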