Sort a pandas DataFrame with NaT values on top

Time: 2017-01-05 06:20:13

Tags: python sorting pandas numpy

I am trying to sort a pandas DataFrame so that the NaT values end up at the top. I am using the df.sort_values function:

df=df.sort_values(by='date_of_last_hoorah_given')

It works fine, and I get a sorted DataFrame with the NaT values at the bottom:
    date_of_last_hoorah_given                              email   first_name  \
16 2016-12-19 07:36:08.000000              mindy.lyndi@hoorah.io        Mindy   
29 2016-12-19 07:36:08.000000              judi.seward@hoorah.io         Judi   
7  2016-12-19 07:36:08.000000                  chao.an@hoorah.io         Chao   
21 2016-12-19 07:36:08.000000              bala.harish@hoorah.io         Bala   
12 2016-12-19 07:36:08.000000            pushpa.swaran@hoorah.io       Pushpa   
30 2016-12-22 07:36:08.000000       sparrow.freespirit@hoorah.io      Sparrow   
28 2016-12-22 07:36:08.000000         sanjeev.prasanna@hoorah.io      Sanjeev   
27 2016-12-22 07:36:08.000000     twinklenose.snowleaf@hoorah.io  Twinklenose   
25 2016-12-22 07:36:08.000000       sweetgaze.sugarshy@hoorah.io    Sweetgaze   
23 2016-12-22 07:36:08.000000            shreya.sarika@hoorah.io       Shreya   
19 2016-12-22 07:36:08.000000              jiahao.dong@hoorah.io       Jiahao   
15 2016-12-22 07:36:08.000000            jannine.tyson@hoorah.io       Janine   
14 2016-12-22 07:36:08.000000                arlo.reed@hoorah.io         Arlo   
0  2016-12-22 07:36:08.000000         aditya.hariharan@hoorah.io       Aditya   
11 2016-12-22 07:36:08.000000        shirley.madalitso@hoorah.io      Shirley   
2  2016-12-22 07:36:08.000000             minerva.jena@hoorah.io     Minerva    
3  2016-12-22 07:36:08.000000             colby.brandi@hoorah.io        Colby   
13 2016-12-22 07:36:08.000000            beverly.cohen@hoorah.io      Beverly   
6  2016-12-22 07:36:08.000000             guanting.jun@hoorah.io     Guanting   
5  2016-12-22 07:36:08.000000                  chen.tu@hoorah.io         Chen   
18 2016-12-22 10:55:03.474683                  fen.lin@hoorah.io          Fen   
9  2016-12-23 07:36:08.000000             kourtney.pam@hoorah.io     Kourtney   
10 2016-12-23 14:30:55.206581             kailee.alfie@hoorah.io       Kailee   
4  2016-12-24 07:36:08.000000                jing.chao@hoorah.io        Jing    
31 2016-12-24 16:02:28.945809               rich.oswin@hoorah.io         Rich   
24 2016-12-25 07:36:08.000000           ganesh.vasanta@hoorah.io       Ganesh   
8  2016-12-26 07:36:08.000000               xia.yaling@hoorah.io          Xia   
20 2016-12-27 07:36:08.000000              kinley.joan@hoorah.io       Kinley   
22 2016-12-28 07:36:08.000000   honeygleam.dazzlesmile@hoorah.io   Honeygleam   
26 2016-12-28 15:29:48.629929             indira.padma@hoorah.io       Indira   
17 2016-12-29 02:27:11.125078             ileen.gaynor@hoorah.io        Ileen   
32 2016-12-29 15:38:02.335296            ragnar.lestat@hoorah.io       Ragnar   
1                         NaT  flitterbeam.clovergaze@hoorah.com  Flitterbeam   

But when I try to put them at the top with the following code:

df=df.sort_values(by='date_of_last_hoorah_given',ascending=[1,0])

I get a ValueError: Length of ascending (2) != length of by (1). The full stack trace is as follows:

ValueError                                Traceback (most recent call last)
<ipython-input-107-948a8354aeeb> in <module>()
      1 cd = ClientData(1)
----> 2 cd.get_inactive_users()

<ipython-input-106-ed230054ea86> in get_inactive_users(self)
    346             inactive_users_result.append(user_dict)
    347         df=pd.DataFrame(inactive_users_result)
--> 348         df=df.sort_values(by='date_of_last_hoorah_given',ascending=[1,0])
    349         print(df)

C:\Users\aditya\Anaconda3\lib\site-packages\pandas\core\frame.py in sort_values(self, by, axis, ascending, inplace, kind, na_position)
   3126         if com.is_sequence(ascending) and len(by) != len(ascending):
   3127             raise ValueError('Length of ascending (%d) != length of by (%d)' %
-> 3128                              (len(ascending), len(by)))
   3129         if len(by) > 1:
   3130             from pandas.core.groupby import _lexsort_indexer

ValueError: Length of ascending (2) != length of by (1)

2 answers:

Answer 0: (score: 4)

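The problem is that NaT is treated as the largest value when sorting, so it always comes last. To sort the dates in ascending order while placing NaT at the front (top), you need to sort on two conditions.

np.lexsort sorts an array by any number of conditions and returns a sorting slice, similar to np.argsort.

Also note that I place the notnull condition last in the array of conditions passed to np.lexsort. np.lexsort treats the last key as the primary sort key... I don't know why, but that's the way it is.

So we should sort first by df.date_of_last_hoorah_given.notnull(), because rows that are not null have a value of True, which is greater than False in a sorting context. Then we can sort by the rest of the dates.

dates = df.date_of_last_hoorah_given
sort_slice = np.lexsort([dates.values, dates.notnull().values])
df.iloc[sort_slice]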

     date_of_last_hoorah_given                              email   first_name
1                          NaT  flitterbeam.clovergaze@hoorah.com  Flitterbeam
16  2016-12-19 07:36:08.000000              mindy.lyndi@hoorah.io        Mindy
29  2016-12-19 07:36:08.000000              judi.seward@hoorah.io         Judi
7   2016-12-19 07:36:08.000000                  chao.an@hoorah.io         Chao
21  2016-12-19 07:36:08.000000              bala.harish@hoorah.io         Bala
12  2016-12-19 07:36:08.000000            pushpa.swaran@hoorah.io       Pushpa
30  2016-12-22 07:36:08.000000       sparrow.freespirit@hoorah.io      Sparrow
28  2016-12-22 07:36:08.000000         sanjeev.prasanna@hoorah.io      Sanjeev
27  2016-12-22 07:36:08.000000     twinklenose.snowleaf@hoorah.io  Twinklenose
25  2016-12-22 07:36:08.000000       sweetgaze.sugarshy@hoorah.io    Sweetgaze
23  2016-12-22 07:36:08.000000            shreya.sarika@hoorah.io       Shreya
19  2016-12-22 07:36:08.000000              jiahao.dong@hoorah.io       Jiahao
15  2016-12-22 07:36:08.000000            jannine.tyson@hoorah.io       Janine
14  2016-12-22 07:36:08.000000                arlo.reed@hoorah.io         Arlo
0   2016-12-22 07:36:08.000000         aditya.hariharan@hoorah.io       Aditya
11  2016-12-22 07:36:08.000000        shirley.madalitso@hoorah.io      Shirley
2   2016-12-22 07:36:08.000000             minerva.jena@hoorah.io      Minerva
3   2016-12-22 07:36:08.000000             colby.brandi@hoorah.io        Colby
13  2016-12-22 07:36:08.000000            beverly.cohen@hoorah.io      Beverly
6   2016-12-22 07:36:08.000000             guanting.jun@hoorah.io     Guanting
5   2016-12-22 07:36:08.000000                  chen.tu@hoorah.io         Chen
18  2016-12-22 10:55:03.474683                  fen.lin@hoorah.io          Fen
9   2016-12-23 07:36:08.000000             kourtney.pam@hoorah.io     Kourtney
10  2016-12-23 14:30:55.206581             kailee.alfie@hoorah.io       Kailee
4   2016-12-24 07:36:08.000000                jing.chao@hoorah.io         Jing
31  2016-12-24 16:02:28.945809               rich.oswin@hoorah.io         Rich
24  2016-12-25 07:36:08.000000           ganesh.vasanta@hoorah.io       Ganesh
8   2016-12-26 07:36:08.000000               xia.yaling@hoorah.io          Xia
20  2016-12-27 07:36:08.000000              kinley.joan@hoorah.io       Kinley
22  2016-12-28 07:36:08.000000   honeygleam.dazzlesmile@hoorah.io   Honeygleam
26  2016-12-28 15:29:48.629929             indira.padma@hoorah.io       Indira
17  2016-12-29 02:27:11.125078             ileen.gaynor@hoorah.io        Ileen
32  2016-12-29 15:38:02.335296            ragnar.lestat@hoorah.io       Ragnar
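
The two-key sort described in this answer can be tried on a small frame. This is only a sketch: the column names `name` and `when` and the four rows are made up for illustration and are not the OP's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the OP's frame); None becomes NaT.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "when": pd.to_datetime(["2016-12-22", None, "2016-12-19", "2016-12-24"]),
})

dates = df["when"]

# np.lexsort uses the LAST key as the primary key.
# notnull() is False for NaT and True otherwise, and False sorts before True,
# so NaT rows come first; ties are then broken by the dates themselves.
order = np.lexsort([dates.values, dates.notnull().values])

result = df.iloc[order]
print(result)
```

Because the null flag is the primary key, the relative position of NaT inside the date key itself does not matter here.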

OR! As the OP stated in the comments, this gives the same result and is much more straightforward:

df.sort_values('date_of_last_hoorah_given', na_position='first')

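The `na_position` parameter appears in the `sort_values` signature shown in the question's traceback. A minimal sketch of using it follows, on a hypothetical frame whose column names (`name`, `when`) are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical sample data (not the OP's frame); None becomes NaT.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "when": pd.to_datetime(["2016-12-22", None, "2016-12-19", "2016-12-24"]),
})

# Dates ascending, with missing values (NaT) placed at the top.
result = df.sort_values("when", na_position="first")
print(result)
```
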
Answer 1: (score: 3)

You cannot pass 2 values in ascending=[1,0], because you are sorting by only one column:

If you need descending order, use False; the default is True:

df=df.sort_values(by='date_of_last_hoorah_given',ascending=False)
print (df)
     date_of_last_hoorah_given                              email   first_name
1                          NaT  flitterbeam.clovergaze@hoorah.com  Flitterbeam
32  2016-12-29 15:38:02.335296            ragnar.lestat@hoorah.io       Ragnar
17  2016-12-29 02:27:11.125078             ileen.gaynor@hoorah.io        Ileen
26  2016-12-28 15:29:48.629929             indira.padma@hoorah.io       Indira
22  2016-12-28 07:36:08.000000   honeygleam.dazzlesmile@hoorah.io   Honeygleam
20  2016-12-27 07:36:08.000000              kinley.joan@hoorah.io       Kinley
8   2016-12-26 07:36:08.000000               xia.yaling@hoorah.io          Xia
24  2016-12-25 07:36:08.000000           ganesh.vasanta@hoorah.io       Ganesh
31  2016-12-24 16:02:28.945809               rich.oswin@hoorah.io         Rich
4   2016-12-24 07:36:08.000000                jing.chao@hoorah.io         Jing
10  2016-12-23 14:30:55.206581             kailee.alfie@hoorah.io       Kailee
9   2016-12-23 07:36:08.000000             kourtney.pam@hoorah.io     Kourtney
18  2016-12-22 10:55:03.474683                  fen.lin@hoorah.io          Fen
3   2016-12-22 07:36:08.000000             colby.brandi@hoorah.io        Colby
5   2016-12-22 07:36:08.000000                  chen.tu@hoorah.io         Chen
6   2016-12-22 07:36:08.000000             guanting.jun@hoorah.io     Guanting
13  2016-12-22 07:36:08.000000            beverly.cohen@hoorah.io      Beverly
2   2016-12-22 07:36:08.000000             minerva.jena@hoorah.io      Minerva
11  2016-12-22 07:36:08.000000        shirley.madalitso@hoorah.io      Shirley
0   2016-12-22 07:36:08.000000         aditya.hariharan@hoorah.io       Aditya
14  2016-12-22 07:36:08.000000                arlo.reed@hoorah.io         Arlo
15  2016-12-22 07:36:08.000000            jannine.tyson@hoorah.io       Janine
...
...
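
The length rule can be checked on a small frame. This is a sketch on made-up data (the column name `when` is an assumption): a list for `ascending` must have one entry per column in `by`, while a single boolean applies to the whole sort.

```python
import pandas as pd

# Hypothetical sample data (not the OP's frame).
df = pd.DataFrame({
    "when": pd.to_datetime(["2016-12-22", None, "2016-12-19"]),
})

# One column in `by`, two flags in `ascending`: pandas rejects this.
try:
    df.sort_values(by="when", ascending=[True, False])
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# A single boolean is enough when sorting by one column.
descending = df.sort_values(by="when", ascending=False)
print(descending)
```

If the placement of the NaT row matters, passing `na_position` explicitly avoids relying on the default.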

If you need to sort by 2 columns, the first ascending and the second descending:

df=df.sort_values(by=['date_of_last_hoorah_given', 'email'],ascending=[True, False])
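
As a rough illustration of the two-column case, here is a sketch on made-up data; the column names `when` and `email` and the example.com addresses are assumptions:

```python
import pandas as pd

# Hypothetical sample data with a tie on the date column.
df = pd.DataFrame({
    "when": pd.to_datetime(["2016-12-22", "2016-12-19", "2016-12-22"]),
    "email": ["a@example.com", "b@example.com", "c@example.com"],
})

# Dates ascending; within equal dates, emails descending.
result = df.sort_values(by=["when", "email"], ascending=[True, False])
print(result)
```

Here `ascending` has two entries because `by` names two columns, which is the case the length check in the traceback is guarding.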

If you need to sort ascending with the NaN (NaT) values first, one possible solution is to concat the split DataFrame:

df.date_of_last_hoorah_given = pd.to_datetime(df.date_of_last_hoorah_given)
df=df.sort_values(by='date_of_last_hoorah_given')
mask = df.date_of_last_hoorah_given.isnull()
print (pd.concat([df[mask], df[~mask]]))
    date_of_last_hoorah_given                              email   first_name
1                         NaT  flitterbeam.clovergaze@hoorah.com  Flitterbeam
16 2016-12-19 07:36:08.000000              mindy.lyndi@hoorah.io        Mindy
29 2016-12-19 07:36:08.000000              judi.seward@hoorah.io         Judi
7  2016-12-19 07:36:08.000000                  chao.an@hoorah.io         Chao
21 2016-12-19 07:36:08.000000              bala.harish@hoorah.io         Bala
12 2016-12-19 07:36:08.000000            pushpa.swaran@hoorah.io       Pushpa
5  2016-12-22 07:36:08.000000                  chen.tu@hoorah.io         Chen
6  2016-12-22 07:36:08.000000             guanting.jun@hoorah.io     Guanting
13 2016-12-22 07:36:08.000000            beverly.cohen@hoorah.io      Beverly
3  2016-12-22 07:36:08.000000             colby.brandi@hoorah.io        Colby
11 2016-12-22 07:36:08.000000        shirley.madalitso@hoorah.io      Shirley
0  2016-12-22 07:36:08.000000         aditya.hariharan@hoorah.io       Aditya
14 2016-12-22 07:36:08.000000                arlo.reed@hoorah.io         Arlo
2  2016-12-22 07:36:08.000000             minerva.jena@hoorah.io      Minerva
19 2016-12-22 07:36:08.000000              jiahao.dong@hoorah.io       Jiahao
23 2016-12-22 07:36:08.000000            shreya.sarika@hoorah.io       Shreya
15 2016-12-22 07:36:08.000000            jannine.tyson@hoorah.io       Janine
25 2016-12-22 07:36:08.000000       sweetgaze.sugarshy@hoorah.io    Sweetgaze
27 2016-12-22 07:36:08.000000     twinklenose.snowleaf@hoorah.io  Twinklenose
28 2016-12-22 07:36:08.000000         sanjeev.prasanna@hoorah.io      Sanjeev
30 2016-12-22 07:36:08.000000       sparrow.freespirit@hoorah.io      Sparrow
18 2016-12-22 10:55:03.474683                  fen.lin@hoorah.io          Fen
9  2016-12-23 07:36:08.000000             kourtney.pam@hoorah.io     Kourtney
10 2016-12-23 14:30:55.206581             kailee.alfie@hoorah.io       Kailee
4  2016-12-24 07:36:08.000000                jing.chao@hoorah.io         Jing
...
...
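
The split-and-concat idea can be sketched end to end on a small frame. The data and the column names `name` and `when` are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data; dates start out as strings, as they might
# after building a frame from a list of dicts.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "when": ["2016-12-22", None, "2016-12-19", "2016-12-24"],
})

# Convert to datetime so missing values become NaT and sort as dates.
df["when"] = pd.to_datetime(df["when"])
df = df.sort_values(by="when")

# Split on the null mask and put the NaT rows in front.
mask = df["when"].isnull()
result = pd.concat([df[mask], df[~mask]])
print(result)
```

The non-null part keeps the order produced by `sort_values`, so only the NaT rows move.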