如何使用pandas返回前10个频繁列值?

时间:2016-07-11 02:37:50

标签: python python-3.x csv pandas

我正在玩着名的犯罪数据集。它看起来像这样:

Dates,Category,Descript,DayOfWeek,PdDistrict,Resolution,Address,X,Y,Time
2015-05-13,VANDALISM,"MALICIOUS MISCHIEF, VANDALISM OF VEHICLES",Wednesday,TENDERLOIN,NONE,TURK ST / JONES ST,-122.41241426358101,37.7830037964534,22:30:00
2015-05-13,VANDALISM,"MALICIOUS MISCHIEF, VANDALISM",Wednesday,NORTHERN,NONE,1500 Block of FILLMORE ST,-122.432743822617,37.7838424505847,20:45:00
2015-05-13,VANDALISM,"MALICIOUS MISCHIEF, VANDALISM",Wednesday,NORTHERN,NONE,1100 Block of FILLMORE ST,-122.431979576386,37.7800478529923,17:07:00
2015-05-13,VANDALISM,"MALICIOUS MISCHIEF, VANDALISM OF VEHICLES",Wednesday,TENDERLOIN,NONE,LEAVENWORTH ST / EDDY ST,-122.414242955907,37.783724025447796,17:00:00
2015-05-13,VANDALISM,"MALICIOUS MISCHIEF, VANDALISM OF VEHICLES",Wednesday,CENTRAL,NONE,CALIFORNIA ST / STOCKTON ST,-122.40753977435699,37.79224917725779,16:45:00
2015-05-13,VANDALISM,"MALICIOUS MISCHIEF, VANDALISM",Wednesday,BAYVIEW,NONE,100 Block of KISKA RD,-122.375989158092,37.7301576924252,16:00:00
2015-05-13,VANDALISM,"MALICIOUS MISCHIEF, VANDALISM OF VEHICLES",Wednesday,NORTHERN,"ARREST, BOOKED",300 Block of MCALLISTER ST,-122.417777932619,37.7803089893403,14:30:00
2015-05-13,NON-CRIMINAL,LOST PROPERTY,Wednesday,TENDERLOIN,NONE,300 Block of OFARRELL ST,-122.41050925879499,37.786043222299206,21:00:00
2015-05-13,LARCENY/THEFT,GRAND THEFT FROM LOCKED AUTO,Wednesday,NORTHERN,NONE,2000 Block of BUSH ST,-122.43101755702699,37.7873880712241,21:00:00
.....
2015-05-13,LARCENY/THEFT,GRAND THEFT FROM LOCKED AUTO,Wednesday,INGLESIDE,NONE,500 Block of COLLEGE AV,-122.42365634294501,37.7325564882065,21:00:00
2015-05-13,LARCENY/THEFT,ATTEMPTED THEFT FROM LOCKED VEHICLE,Wednesday,TARAVAL,NONE,19TH AV / SANTIAGO 

当我获得Dates列的频率计数时,我得到2011-01-01 650。换句话说,整个数据集中650发生了2011-01-01次犯罪。不过,我想知道如何返回Category中发生的650次犯罪的前10个类别(2011-01-01列)。从documentation我读到有关索引选择和切片的内容。尽管如此,我仍然不知道如何返回这些类别。

1 个答案:

答案 0 :(得分:1)

我认为这样做你想要的,首先用df.Dates == "2011-01-01"构造一个逻辑索引来过滤日期2011-01-01上的行,并在列索引处指定Category以仅选择{{1 }}列,因此您获得Category上的所有类别。使用2011-01-01函数为每个类别创建一个频率表,并按默认按升序排序的频率排序,为了获得最常见的类别,您可以使用列表value_counts()反向索引扭转频率计数并使用[::-1]来获取前10个最常见类别的前10个元素:

[:10]