My data looks like this:
Name,Report_ID,Amount,Flag,Actions
Fizz,123,5,,A
Fizz,123,10,Y,A
Buzz,456,10,,B
Buzz,456,40,,C
Buzz,456,70,,D
Bazz,678,100,Y,F
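From these individual actions, I want to create a new DataFrame that captures various summary statistics for each report: mainly sums, the number of entries, and the number of unique entries. I would like the output DataFrame to look like this: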
Report_ID,Number of Flags,Number of Entries,Total,Unique Actions
123,1,2,15,1
456,0,3,120,3
678,1,1,100,1
I have tried using groupby, but I cannot get the individual groupby results to combine correctly. So far I have tried:
totals = raw_data.groupby('Report_ID')['Amount'].sum()
event_count = raw_data.groupby('Report_ID').size()
num_actions = raw_data.groupby('Report_ID').Actions.nunique()
output = pd.concat([totals,event_count,num_actions])
When I try this, I get TypeError: cannot concatenate a non-NDFrame object. Any help would be greatly appreciated!
Answer 0 (score: 1)
You can call agg on the groupby and pass it a dict that maps each column to the aggregation(s) you want:
f = dict(Flag=['count', 'size'], Amount='sum', Actions='nunique')
df.groupby('Report_ID').agg(f)
           Flag      Amount Actions
          count size    sum nunique
Report_ID
123           1    2     15       1
456           0    3    120       3
678           1    1    100       1
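The result above has two-level column labels. As a minimal sketch, assuming pandas 0.25 or later (where named aggregation is available), the same statistics can be produced with flat column names matching the requested output. The inline CSV and the variable name `summary` are only for illustration.

```python
import io

import pandas as pd

# Sample data copied from the question; empty Flag cells are read as NaN.
csv_text = """Name,Report_ID,Amount,Flag,Actions
Fizz,123,5,,A
Fizz,123,10,Y,A
Buzz,456,10,,B
Buzz,456,40,,C
Buzz,456,70,,D
Bazz,678,100,Y,F
"""
df = pd.read_csv(io.StringIO(csv_text))

# Named aggregation: output column name -> (input column, aggregation).
# The column names contain spaces, so they are passed via ** unpacking.
summary = df.groupby('Report_ID').agg(**{
    'Number of Flags': ('Flag', 'count'),      # count() ignores NaN
    'Number of Entries': ('Amount', 'size'),   # size() counts every row
    'Total': ('Amount', 'sum'),
    'Unique Actions': ('Actions', 'nunique'),
}).reset_index()

print(summary.to_csv(index=False))
```

Here `count` skips the missing flags while `size` counts every row in the group, which is why the two give different numbers for the same report.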
Answer 1 (score: 0)
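Just specify axis=1 when concatenating, so the Series are placed side by side as columns instead of being stacked end to end: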
>>> event_count.name = 'Event Count'  # size() returns an unnamed Series, so name it to label the column.
>>> pd.concat([totals, event_count, num_actions], axis=1)
           Amount  Event Count  Actions
Report_ID
123            15            2        1
456           120            3        3
678           100            1        1
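For completeness, here is a self-contained sketch of the concat approach that also includes the flag count and uses the column names from the requested output. It assumes the question's data is loaded from an inline CSV. The names `grouped` and `output` are illustrative, and `rename` is used here in place of assigning to `.name`.

```python
import io

import pandas as pd

# Sample data copied from the question; empty Flag cells are read as NaN.
csv_text = """Name,Report_ID,Amount,Flag,Actions
Fizz,123,5,,A
Fizz,123,10,Y,A
Buzz,456,10,,B
Buzz,456,40,,C
Buzz,456,70,,D
Bazz,678,100,Y,F
"""
raw_data = pd.read_csv(io.StringIO(csv_text))

# Group once and reuse the groupby object for every statistic.
grouped = raw_data.groupby('Report_ID')

# Each aggregation returns a Series indexed by Report_ID; axis=1 aligns
# them on that index and uses each Series name as the column label.
output = pd.concat(
    [
        grouped['Flag'].count().rename('Number of Flags'),
        grouped.size().rename('Number of Entries'),
        grouped['Amount'].sum().rename('Total'),
        grouped['Actions'].nunique().rename('Unique Actions'),
    ],
    axis=1,
).reset_index()

print(output)
```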