Sort by a specific field value first in PySpark

Time: 2021-05-15 10:58:32

Tags: python apache-spark pyspark

id | name | priority
--------------------
 1 | core  |   10   
 2 | core  |   9    
 3 | other |   8    
 4 | board |   7    
 5 | board |   6    
 6 | core  |   4    

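I want to sort the result set by priority, but the rows with name=core should come first, even when their priority value would otherwise place them later. The result should look like this: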

id | name | priority
--------------------
 6 | core  |   4    
 2 | core  |   9    
 1 | core  |   10   
 5 | board |   6    
 4 | board |   7    
 3 | other |   8    

1 answer:

Answer 0 (score: 3)

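You can sort first by a boolean expression that checks whether name differs from core, then by priority. In ascending order false sorts before true, so the core rows (for which the expression is false) come first: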

import pyspark.sql.functions as F

df.orderBy(F.col('name') != 'core', 'priority').show()
+---+-----+--------+
| id| name|priority|
+---+-----+--------+
|  6| core|       4|
|  2| core|       9|
|  1| core|      10|
|  5|board|       6|
|  4|board|       7|
|  3|other|       8|
+---+-----+--------+
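
To see why this ordering works, here is a minimal plain-Python sketch of the same idea. It is not Spark code; it only mimics the two sort keys with a tuple, on the assumption that the sample rows from the question are held as (id, name, priority) tuples. The names rows and sort_key are made up for illustration.

```python
# Plain-Python analogue of ordering by (name != 'core', priority).
# The rows are the sample data from the question: (id, name, priority).
rows = [
    (1, "core", 10),
    (2, "core", 9),
    (3, "other", 8),
    (4, "board", 7),
    (5, "board", 6),
    (6, "core", 4),
]


def sort_key(row):
    """Return (is_not_core, priority); False sorts before True."""
    _id, name, priority = row
    return (name != "core", priority)


# Core rows get False as their first key, so they land at the front;
# within each group the rows are ordered by ascending priority.
ordered = sorted(rows, key=sort_key)

for row in ordered:
    print(row)
```

The ids come out in the order 6, 2, 1, 5, 4, 3, matching the table above. Note that board lands ahead of other only because its priorities happen to be smaller; nothing in the sort key ranks the non-core names against each other.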