Question

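I have an .xlsx file containing 9 variables with 898 observations. I read the .xlsx file and parsed it into a pandas DataFrame. I tried to sort the DataFrame's product_id column in ascending order, but the result contained all of the columns.

I followed the suggestion in another link, but I still get an error.

Question: how do I get the 10 highest values in the product_id category, sorted in ascending order?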

import pandas as pd
import xlrd
# Import data
trans = pd.ExcelFile('file.xlsx')
# Parse the 'Orders' sheet into a DataFrame
transdata = trans.parse('Orders')
# View the head of the DataFrame
print(transdata.head())

   site_id  visitor_id  transaction_id transaction_date  product_id  price  \
0        3       10001           20001       2014-10-31       48165    150   
1        3       10002           20002       2014-10-31       48162    128   
2        3       10002           20003       2014-10-30       48165    150   
3        3       10003           20004       2014-10-31       48815     98   
4        3       10003           20005       2014-10-29       48165    150   

   units  sales_tax   total  
0      1      12.38  162.38  
1      1      10.56  138.56  
2      1      12.38  162.38  
3      1       8.09  106.09  
4      1      12.38  162.38  

grouped = transdata.groupby(['product_id']).size()
print(grouped)

product_id
36959          78
44524          12
45956          33
46814          11
48162          50
48165         100
48412          12
48478          23
48500          13
48528          14
48552         101
48587         106
48593         104
48628           4
48810          25
48814          16
48815          33
48823          20
49418          11
49444          12
49882         102
51184           2
51380          15
dtype: int64

Edit: I tried to sort the counts per product_id, but the script printed all of the columns and then failed with an error.

grouped = transdata.groupby(['product_id'])
counts = grouped.size().sort()
result = counts.head(10).index
print(result)

   site_id  visitor_id  transaction_id transaction_date  product_id  price  \
0        3       10001           20001       2014-10-31       48165    150   
1        3       10002           20002       2014-10-31       48162    128   
2        3       10002           20003       2014-10-30       48165    150   
3        3       10003           20004       2014-10-31       48815     98   
4        3       10003           20005       2014-10-29       48165    150   

   units  sales_tax   total  
0      1      12.38  162.38  
1      1      10.56  138.56  
2      1      12.38  162.38  
3      1       8.09  106.09  
4      1      12.38  162.38  
Traceback (most recent call last):
  File "Trending.py", line 14, in <module>
    result = counts.head(10).index
AttributeError: 'NoneType' object has no attribute 'head'

Desired output: a vector of the product_id values that occur most often.

Answer 1

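Starting from this:

grouped = transdata.groupby(by=['product_id'])

you just need:

counts = grouped.size()
counts = counts.sort_values(ascending=False)
counts.head(10)

size() already returns a Series indexed by product_id, so there is no column to select from it. sort_values() returns the sorted Series instead of sorting in place, so its result has to be assigned before calling head(10).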
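
For context on the AttributeError in the question: in older pandas releases `Series.sort()` sorted in place and returned `None`, so chaining `.head(10)` onto it fails. Later releases removed it in favour of `sort_values()`. Below is a minimal sketch of one way to get the most frequent product_ids in ascending order of count. The small `transdata` frame is a hypothetical stand-in for the 'Orders' sheet, and `n=2` is used only because the toy data is tiny; for the question it would be 10.

```python
import pandas as pd

# Hypothetical stand-in for the 'Orders' sheet; only product_id matters here.
transdata = pd.DataFrame({
    "product_id": [48165, 48162, 48165, 48815, 48165, 48162, 36959],
})

# Number of rows per product_id, as a Series indexed by product_id.
counts = transdata.groupby("product_id").size()

# Keep the n most frequent products, then order them ascending by count.
top = counts.nlargest(2).sort_values()

# The product_ids themselves, as a plain list.
top_ids = top.index.tolist()
print(top)
print(top_ids)
```

`nlargest` avoids sorting the whole Series when only a few entries are needed, and the trailing `sort_values()` gives the ascending order asked for in the question.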
