Python pandas dataframe sort_values does not work for the second sort key

Time: 2017-05-29 13:40:28

Tags: python-2.7 sorting pandas dataframe

I have a dataframe containing sales data:

Order ID Order Date Order Priority  Order Quantity     Sales  
  928.0   1/1/2009           High            32.0    180.36 
10369.0   1/2/2009            Low            43.0  4,083.19
10144.0   1/2/2009       Critical            16.0    137.63             
32323.0   1/1/2009  Not Specified             9.0    872.48          
48353.0   1/2/2009       Critical             3.0    124.81         
51008.0   1/3/2009       Critical            15.0     85.56       
26756.0   1/2/2009       Critical            43.0     614.8         
18144.0   1/2/2009            Low             4.0  1,239.06         
22912.0   1/2/2009            Low            32.0  4,902.38
... 

I want to sort the values by date (oldest to newest) and by sales (largest to smallest). I wrote this code in PyCharm Edu 3.5.1 (Python 2.7):

import pandas as pd

df = pd.read_csv('sales.csv', header=0)
df['Order Date'] = pd.to_datetime(df['Order Date'])
df = df.sort_values(by=['Order Date', 'Sales'], ascending=[True, False])
print df.head(10)

Output:

    Order ID Order Date Order Priority  Order Quantity      Sales  
    32323.0 2009-01-01  Not Specified             9.0     872.48          
    928.0   2009-01-01           High            32.0     180.36          
    26756.0 2009-01-02       Critical            43.0      614.8         
    22912.0 2009-01-02            Low            32.0   4,902.38         
    10369.0 2009-01-02            Low            43.0   4,083.19          
    10144.0 2009-01-02       Critical            16.0     137.63          
    48353.0 2009-01-02       Critical             3.0     124.81          
    18144.0 2009-01-02            Low             4.0   1,239.06         
    29376.0 2009-01-03  Not Specified             4.0     896.49
...

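The 'Order Date' column is sorted correctly, but the 'Sales' column is not sorted as expected. It seems that PyCharm ignores the values that contain a thousands separator. Am I missing something here?

1 answer:

Answer 0: (score: 1)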

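The values in the Sales column are read as strings because of the , thousands separator, so they are sorted as text rather than as numbers. Use read_csv with the thousands parameter to remove the , from the floats, and parse_dates to convert the Order Date column to datetime. Another solution is to use read_csv + replace + astype. With the thousands and parse_dates parameters: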

df = pd.read_csv('sales.csv', thousands=',', parse_dates=['Order Date'])
print (df)
   Order ID Order Date Order Priority  Order Quantity    Sales
0     928.0 2009-01-01           High            32.0   180.36
1   10369.0 2009-01-02            Low            43.0  4083.19
2   10144.0 2009-01-02       Critical            16.0   137.63
3   32323.0 2009-01-01  Not Specified             9.0   872.48
4   48353.0 2009-01-02       Critical             3.0   124.81
5   51008.0 2009-01-03       Critical            15.0    85.56
6   26756.0 2009-01-02       Critical            43.0   614.80
7   18144.0 2009-01-02            Low             4.0  1239.06
8   22912.0 2009-01-02            Low            32.0  4902.38
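
Before sorting, it can help to confirm that Sales really became numeric. The following is a minimal sketch of such a check, not part of the original answer: the inline CSV text (with quoted Sales values) is an assumed stand-in for sales.csv, and the names csv_text, raw and parsed are made up for illustration.

```python
import io
import pandas as pd

# Assumed stand-in for sales.csv: two rows, Sales quoted where it contains a comma.
csv_text = u'Order Date,Sales\n1/2/2009,"4,083.19"\n1/2/2009,614.8\n'

# Without thousands=',', the value "4,083.19" cannot be parsed as a number,
# so the whole Sales column is kept as text.
raw = pd.read_csv(io.StringIO(csv_text))

# With thousands=',' the separator is dropped and the column is parsed as float.
parsed = pd.read_csv(io.StringIO(csv_text), thousands=',', parse_dates=['Order Date'])

print(pd.api.types.is_numeric_dtype(raw['Sales']))     # False
print(pd.api.types.is_numeric_dtype(parsed['Sales']))  # True
```

If the first check also printed True, the sorting problem would have a different cause than the thousands separator.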

df = df.sort_values(by=['Order Date', 'Sales'], ascending=[True, False])
print (df)
   Order ID Order Date Order Priority  Order Quantity    Sales
3   32323.0 2009-01-01  Not Specified             9.0   872.48
0     928.0 2009-01-01           High            32.0   180.36
8   22912.0 2009-01-02            Low            32.0  4902.38
1   10369.0 2009-01-02            Low            43.0  4083.19
7   18144.0 2009-01-02            Low             4.0  1239.06
6   26756.0 2009-01-02       Critical            43.0   614.80
2   10144.0 2009-01-02       Critical            16.0   137.63
4   48353.0 2009-01-02       Critical             3.0   124.81
5   51008.0 2009-01-03       Critical            15.0    85.56
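
The replace + astype alternative is mentioned above without code. A minimal sketch of what it might look like follows, assuming Sales is read as text and that the dates are month-first (as the parsed output above suggests). The inline CSV is a hypothetical stand-in for sales.csv built from a few rows of the question, and the use of Series.str.replace and an explicit date format is a choice made here, not necessarily what the answer's author had in mind.

```python
import io
import pandas as pd

# Hypothetical stand-in for sales.csv, using a few rows from the question.
csv_text = u'''Order ID,Order Date,Sales
928.0,1/1/2009,180.36
10369.0,1/2/2009,"4,083.19"
26756.0,1/2/2009,614.8
18144.0,1/2/2009,"1,239.06"
32323.0,1/1/2009,872.48
'''

df = pd.read_csv(io.StringIO(csv_text))

# Sales was read as text: strip the thousands separator, then cast to float.
df['Sales'] = df['Sales'].str.replace(',', '', regex=False).astype(float)

# Assumed month/day/year format, matching 1/2/2009 -> 2009-01-02 in the answer.
df['Order Date'] = pd.to_datetime(df['Order Date'], format='%m/%d/%Y')

# Oldest date first, and within each date the largest sale first.
df = df.sort_values(by=['Order Date', 'Sales'], ascending=[True, False])
print(df)
```

This does the conversion after loading, which can be useful when the file has already been read or when only some columns should have commas removed.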