Assign the same number to rows in the same group

Time: 2019-06-10 12:52:59

Tags: python pyspark aws-glue

I have a DataFrame like this, with a name and a status value:

+----+------+                      
|name|value |                                                  
+----+------+                                   
|  x |  down|                                             
|  y |normal|                               
|  z |  down|                                                
|  x |normal|                                  
|  y |  down|                       
+----+------+ 

Rows with the same name should get the same number (1, 2, 3, ...), so the new column must look like this:

+----+------+------+   
|name|value |newCol|   
+----+------+------+   
|   x|down  |     1|   
|   y|normal|     2|   
|   z|down  |     3|   
|   x|normal|     1|    
|   y|down  |     2|   
+----+------+------+

This is what I tried:

from pyspark.sql import Window
from pyspark.sql.functions import count

win = Window.partitionBy("name").orderBy("name")
print("value")
dp_df_classification_agg_join = dp_df_classification_agg_join.withColumn("newCol",count("name").over(win))

1 answer:

Answer 0 (score: 0):

First, replace the count("name") function with the dense_rank() function.
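To see why the two functions give different numbers, here is a plain-Python sketch (no Spark involved) of what each one computes for the names in the question. The variable names are made up for illustration, and the dense rank is taken over all rows ordered by name.

```python
from collections import Counter

# The "name" column from the question, in its original row order.
names = ["x", "y", "z", "x", "y"]

# count("name") over a window partitioned by name:
# every row receives the size of its own group.
group_sizes = Counter(names)
count_col = [group_sizes[n] for n in names]

# dense_rank() over all rows ordered by name:
# every row receives the position of its name among the
# distinct, sorted names, starting at 1 and without gaps.
rank_of = {n: i for i, n in enumerate(sorted(set(names)), start=1)}
dense_rank_col = [rank_of[n] for n in names]

print(count_col)
print(dense_rank_col)
```

The second list is the newCol the question asks for; the first is what the original count("name") attempt produces.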

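Then replace win = Window.partitionBy("name").orderBy("name") with win = Window.partitionBy().orderBy("name"). With partitionBy("name"), every name sits in its own partition, so a rank computed inside it is always 1; with an empty partitionBy(), all rows are ranked together and each distinct name gets its own number.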