Python - Sorting data across all days of the week

Time: 2015-11-25 13:14:17

Tags: python sorting pandas dataframe

I have a DataFrame with the columns Error Description, Day and Count.

I applied

df.sort_values(['Count'], ascending=False)

to it, which produces the following output:

     Count        Day                       Error Description
261   4846   Thursday          N25846 External EMERGENCY STOP
263   3993  Wednesday          N25846 External EMERGENCY STOP
257   3303     Friday          N25846 External EMERGENCY STOP
504   3227  Wednesday                        N63 Handwheel? C
795   2954   Thursday       P873 ENCLOSURE DOOR CAN BE OPENED
797   2778  Wednesday       P873 ENCLOSURE DOOR CAN BE OPENED
791   2644     Friday       P873 ENCLOSURE DOOR CAN BE OPENED
796   2633    Tuesday       P873 ENCLOSURE DOOR CAN BE OPENED
262   2480    Tuesday          N25846 External EMERGENCY STOP
501   2157     Monday                        N63 Handwheel? C
601   2130   Thursday                P124 Magazine is running
597   2130     Friday                P124 Magazine is running
793   2047   Saturday       P873 ENCLOSURE DOOR CAN BE OPENED
503   1983    Tuesday                        N63 Handwheel? C
599   1961   Saturday                P124 Magazine is running
602   1921    Tuesday                P124 Magazine is running
792   1900     Monday       P873 ENCLOSURE DOOR CAN BE OPENED
603   1865  Wednesday                P124 Magazine is running
502   1705   Saturday                        N63 Handwheel? C
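To make the question reproducible, here is a minimal sketch that rebuilds the sample above as a pandas DataFrame and applies the same descending sort on Count. The names ROWS and by_count are illustrative; the data is copied from the table, with the original row labels kept as the index. As the table shows, this sort interleaves the different errors instead of grouping them.

```python
import pandas as pd

# Rows copied from the question's table: (row label, Count, Day, Error Description)
ROWS = [
    (261, 4846, 'Thursday', 'N25846 External EMERGENCY STOP'),
    (263, 3993, 'Wednesday', 'N25846 External EMERGENCY STOP'),
    (257, 3303, 'Friday', 'N25846 External EMERGENCY STOP'),
    (504, 3227, 'Wednesday', 'N63 Handwheel? C'),
    (795, 2954, 'Thursday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (797, 2778, 'Wednesday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (791, 2644, 'Friday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (796, 2633, 'Tuesday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (262, 2480, 'Tuesday', 'N25846 External EMERGENCY STOP'),
    (501, 2157, 'Monday', 'N63 Handwheel? C'),
    (601, 2130, 'Thursday', 'P124 Magazine is running'),
    (597, 2130, 'Friday', 'P124 Magazine is running'),
    (793, 2047, 'Saturday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (503, 1983, 'Tuesday', 'N63 Handwheel? C'),
    (599, 1961, 'Saturday', 'P124 Magazine is running'),
    (602, 1921, 'Tuesday', 'P124 Magazine is running'),
    (792, 1900, 'Monday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (603, 1865, 'Wednesday', 'P124 Magazine is running'),
    (502, 1705, 'Saturday', 'N63 Handwheel? C'),
]

# Keep the original row labels as the index, as in the printed output
df = pd.DataFrame([r[1:] for r in ROWS],
                  index=[r[0] for r in ROWS],
                  columns=['Count', 'Day', 'Error Description'])

# The asker's sort: descending by Count only, so different errors interleave
by_count = df.sort_values(['Count'], ascending=False)
print(by_count)
```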

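I'd like to know whether there is a way to sort the DataFrame so that each top error is listed together across all days of the week. The expected output is (assuming N25846 External EMERGENCY STOP is the top error, followed by N63 Handwheel? C, and so on):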

     Count        Day                       Error Description
261   4846   Thursday          N25846 External EMERGENCY STOP
263   3993  Wednesday          N25846 External EMERGENCY STOP
257   3303     Friday          N25846 External EMERGENCY STOP
262   2480    Tuesday          N25846 External EMERGENCY STOP
504   3227  Wednesday                        N63 Handwheel? C
501   2157     Monday                        N63 Handwheel? C
503   1983    Tuesday                        N63 Handwheel? C
502   1705   Saturday                        N63 Handwheel? C
795   2954   Thursday       P873 ENCLOSURE DOOR CAN BE OPENED
797   2778  Wednesday       P873 ENCLOSURE DOOR CAN BE OPENED
791   2644     Friday       P873 ENCLOSURE DOOR CAN BE OPENED
796   2633    Tuesday       P873 ENCLOSURE DOOR CAN BE OPENED
793   2047   Saturday       P873 ENCLOSURE DOOR CAN BE OPENED
792   1900     Monday       P873 ENCLOSURE DOOR CAN BE OPENED
601   2130   Thursday                P124 Magazine is running
597   2130     Friday                P124 Magazine is running
599   1961   Saturday                P124 Magazine is running
602   1921    Tuesday                P124 Magazine is running
603   1865  Wednesday                P124 Magazine is running

2 answers:

Answer 0 (score: 1)

You can do this with a groupby, followed by a join, followed by a sort. For example:

# Total Count per error; selecting [['Count']] avoids aggregating the string Day column
totals = df.groupby('Error Description')[['Count']].sum()

joined = df.join(totals, on='Error Description', rsuffix='_total')

result = joined.sort_values(['Count_total', 'Count'], ascending=False)
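A self-contained sketch of this groupby/join/sort approach, using the rows from the question's table (the ROWS list is copied from it). Only the Count column is summed, and because Count exists in both frames, the join's rsuffix renames the totals column to Count_total:

```python
import pandas as pd

# Rows copied from the question's table: (row label, Count, Day, Error Description)
ROWS = [
    (261, 4846, 'Thursday', 'N25846 External EMERGENCY STOP'),
    (263, 3993, 'Wednesday', 'N25846 External EMERGENCY STOP'),
    (257, 3303, 'Friday', 'N25846 External EMERGENCY STOP'),
    (504, 3227, 'Wednesday', 'N63 Handwheel? C'),
    (795, 2954, 'Thursday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (797, 2778, 'Wednesday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (791, 2644, 'Friday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (796, 2633, 'Tuesday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (262, 2480, 'Tuesday', 'N25846 External EMERGENCY STOP'),
    (501, 2157, 'Monday', 'N63 Handwheel? C'),
    (601, 2130, 'Thursday', 'P124 Magazine is running'),
    (597, 2130, 'Friday', 'P124 Magazine is running'),
    (793, 2047, 'Saturday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (503, 1983, 'Tuesday', 'N63 Handwheel? C'),
    (599, 1961, 'Saturday', 'P124 Magazine is running'),
    (602, 1921, 'Tuesday', 'P124 Magazine is running'),
    (792, 1900, 'Monday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (603, 1865, 'Wednesday', 'P124 Magazine is running'),
    (502, 1705, 'Saturday', 'N63 Handwheel? C'),
]
df = pd.DataFrame([r[1:] for r in ROWS],
                  index=[r[0] for r in ROWS],
                  columns=['Count', 'Day', 'Error Description'])

# One row per error, indexed by Error Description, holding the summed Count
totals = df.groupby('Error Description')[['Count']].sum()

# Look up each row's error in `totals`; the overlapping Count column
# coming from `totals` becomes Count_total
joined = df.join(totals, on='Error Description', rsuffix='_total')

# Largest total first; within one error, largest daily Count first
result = joined.sort_values(['Count_total', 'Count'], ascending=False)
print(result)
```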

Output

>>> print(result)

     Count        Day                  Error Description  Count_total
795   2954   Thursday  P873 ENCLOSURE DOOR CAN BE OPENED        14956
797   2778  Wednesday  P873 ENCLOSURE DOOR CAN BE OPENED        14956
791   2644     Friday  P873 ENCLOSURE DOOR CAN BE OPENED        14956
796   2633    Tuesday  P873 ENCLOSURE DOOR CAN BE OPENED        14956
793   2047   Saturday  P873 ENCLOSURE DOOR CAN BE OPENED        14956
792   1900     Monday  P873 ENCLOSURE DOOR CAN BE OPENED        14956
261   4846   Thursday     N25846 External EMERGENCY STOP        14622
263   3993  Wednesday     N25846 External EMERGENCY STOP        14622
257   3303     Friday     N25846 External EMERGENCY STOP        14622
262   2480    Tuesday     N25846 External EMERGENCY STOP        14622
601   2130   Thursday           P124 Magazine is running        10007
597   2130     Friday           P124 Magazine is running        10007
599   1961   Saturday           P124 Magazine is running        10007
602   1921    Tuesday           P124 Magazine is running        10007
603   1865  Wednesday           P124 Magazine is running        10007
504   3227  Wednesday                   N63 Handwheel? C         9072
501   2157     Monday                   N63 Handwheel? C         9072
503   1983    Tuesday                   N63 Handwheel? C         9072
502   1705   Saturday                   N63 Handwheel? C         9072

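Note that this does not match your example output, but it is the correct order when sorting by the total error count. Judging from your example output, perhaps you want to replace sum() above with max(); it's not entirely clear from your question.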
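If ranking each error by its largest single-day Count is what was intended, the same join pattern with max() in place of sum() can be sketched as follows (the Count_max name is illustrative; the rows are copied from the question). On this data the errors come out in the order N25846, N63, P873, P124, which matches the expected output in the question:

```python
import pandas as pd

# Rows copied from the question's table: (row label, Count, Day, Error Description)
ROWS = [
    (261, 4846, 'Thursday', 'N25846 External EMERGENCY STOP'),
    (263, 3993, 'Wednesday', 'N25846 External EMERGENCY STOP'),
    (257, 3303, 'Friday', 'N25846 External EMERGENCY STOP'),
    (504, 3227, 'Wednesday', 'N63 Handwheel? C'),
    (795, 2954, 'Thursday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (797, 2778, 'Wednesday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (791, 2644, 'Friday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (796, 2633, 'Tuesday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (262, 2480, 'Tuesday', 'N25846 External EMERGENCY STOP'),
    (501, 2157, 'Monday', 'N63 Handwheel? C'),
    (601, 2130, 'Thursday', 'P124 Magazine is running'),
    (597, 2130, 'Friday', 'P124 Magazine is running'),
    (793, 2047, 'Saturday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (503, 1983, 'Tuesday', 'N63 Handwheel? C'),
    (599, 1961, 'Saturday', 'P124 Magazine is running'),
    (602, 1921, 'Tuesday', 'P124 Magazine is running'),
    (792, 1900, 'Monday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (603, 1865, 'Wednesday', 'P124 Magazine is running'),
    (502, 1705, 'Saturday', 'N63 Handwheel? C'),
]
df = pd.DataFrame([r[1:] for r in ROWS],
                  index=[r[0] for r in ROWS],
                  columns=['Count', 'Day', 'Error Description'])

# Peak daily Count per error instead of the total
peaks = df.groupby('Error Description')[['Count']].max()
joined = df.join(peaks, on='Error Description', rsuffix='_max')

# Error with the highest peak first; within one error, largest Count first
result = joined.sort_values(['Count_max', 'Count'], ascending=False)
print(result)
```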

Answer 1 (score: 1)

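Similar to @Jakevdp's answer, consider using groupby.apply() to create a new column holding the maximum Count within each Error Description. Then sort by that column and by the error description.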

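def maxcount(group):
    # group holds every row for one Error Description; copy its peak Count onto each row
    group['MaxCount'] = group['Count'].max()
    return group

# Sort by the per-error maximum, then by error name (keeps each error's rows
# together if two errors share a maximum), then by Count within each error
df = df.groupby('Error Description', group_keys=False).apply(maxcount) \
       .sort_values(['MaxCount', 'Error Description', 'Count'],
                    ascending=[False, False, False])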
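An alternative sketch: instead of groupby.apply() with a custom function, groupby(...).transform('max') broadcasts each group's maximum back onto every row of that group. This is a different technique from the one in this answer, shown only as an option; the rows are copied from the question and the MaxCount name follows the answer.

```python
import pandas as pd

# Rows copied from the question's table: (row label, Count, Day, Error Description)
ROWS = [
    (261, 4846, 'Thursday', 'N25846 External EMERGENCY STOP'),
    (263, 3993, 'Wednesday', 'N25846 External EMERGENCY STOP'),
    (257, 3303, 'Friday', 'N25846 External EMERGENCY STOP'),
    (504, 3227, 'Wednesday', 'N63 Handwheel? C'),
    (795, 2954, 'Thursday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (797, 2778, 'Wednesday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (791, 2644, 'Friday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (796, 2633, 'Tuesday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (262, 2480, 'Tuesday', 'N25846 External EMERGENCY STOP'),
    (501, 2157, 'Monday', 'N63 Handwheel? C'),
    (601, 2130, 'Thursday', 'P124 Magazine is running'),
    (597, 2130, 'Friday', 'P124 Magazine is running'),
    (793, 2047, 'Saturday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (503, 1983, 'Tuesday', 'N63 Handwheel? C'),
    (599, 1961, 'Saturday', 'P124 Magazine is running'),
    (602, 1921, 'Tuesday', 'P124 Magazine is running'),
    (792, 1900, 'Monday', 'P873 ENCLOSURE DOOR CAN BE OPENED'),
    (603, 1865, 'Wednesday', 'P124 Magazine is running'),
    (502, 1705, 'Saturday', 'N63 Handwheel? C'),
]
df = pd.DataFrame([r[1:] for r in ROWS],
                  index=[r[0] for r in ROWS],
                  columns=['Count', 'Day', 'Error Description'])

# transform('max') returns a Series aligned with df, one value per original row
df = df.assign(MaxCount=df.groupby('Error Description')['Count'].transform('max'))

# Same sort keys as the apply-based version above
result = df.sort_values(['MaxCount', 'Error Description', 'Count'],
                        ascending=[False, False, False])
print(result)
```

Because transform keeps the original index and never calls Python code per group, it avoids the extra group-key index level and the per-group function call that apply involves.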

Output

Count        Day                  Error Description  MaxCount
 4846   Thursday     N25846 External EMERGENCY STOP      4846
 3993  Wednesday     N25846 External EMERGENCY STOP      4846
 3303     Friday     N25846 External EMERGENCY STOP      4846
 2480    Tuesday     N25846 External EMERGENCY STOP      4846
 3227  Wednesday                   N63 Handwheel? C      3227
 2157     Monday                   N63 Handwheel? C      3227
 1983    Tuesday                   N63 Handwheel? C      3227
 1705   Saturday                   N63 Handwheel? C      3227
 2954   Thursday  P873 ENCLOSURE DOOR CAN BE OPENED      2954
 2778  Wednesday  P873 ENCLOSURE DOOR CAN BE OPENED      2954
 2644     Friday  P873 ENCLOSURE DOOR CAN BE OPENED      2954
 2633    Tuesday  P873 ENCLOSURE DOOR CAN BE OPENED      2954
 2047   Saturday  P873 ENCLOSURE DOOR CAN BE OPENED      2954
 1900     Monday  P873 ENCLOSURE DOOR CAN BE OPENED      2954
 2130   Thursday           P124 Magazine is running      2130
 2130     Friday           P124 Magazine is running      2130
 1961   Saturday           P124 Magazine is running      2130
 1921    Tuesday           P124 Magazine is running      2130
 1865  Wednesday           P124 Magazine is running      2130