I have a DataFrame like this, with a name column and a status column:
+----+------+
|name|value |
+----+------+
| x | down|
| y |normal|
| z | down|
| x |normal|
| y | down|
+----+------+
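I want to number the names 1, 2, 3, ... so that rows with the same name get the same number. The new column should look like this: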
+----+------+------+
|name|value |newCol|
+----+------+------+
| x|down | 1|
| y|normal| 2|
| z|down | 3|
| x|normal| 1|
| y|down | 2|
+----+------+------+
This is what I tried:
from pyspark.sql import Window
from pyspark.sql.functions import count
win = Window.partitionBy("name").orderBy("name")
print("value")
dp_df_classification_agg_join = dp_df_classification_agg_join.withColumn("newCol", count("name").over(win))
Answer 0 (score: 0)
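First, replace the count("name") function with the dense_rank() function.
Then replace win = Window.partitionBy("name").orderBy("name") with win = Window.partitionBy().orderBy("name").
With partitionBy("name"), every window holds only one name, so any rank computed inside it restarts for each name. Without a partition key, dense_rank() ranks all rows in name order, so rows with the same name get the same number and the numbers have no gaps.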
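Put together, the two changes could look like the sketch below. The function names add_name_index and expected_name_index are hypothetical, introduced only for illustration. Window.orderBy("name") is used here as a window with no partition key, which should behave the same as Window.partitionBy().orderBy("name"). The plain-Python helper is only a reference for the numbers dense_rank() is expected to produce, so the logic can be checked without a Spark session.

```python
def add_name_index(df, name_col="name", new_col="newCol"):
    """Give each distinct name a consecutive number (1, 2, 3, ...)."""
    # Imported lazily so the plain-Python helper below runs without Spark installed.
    from pyspark.sql import Window
    from pyspark.sql.functions import dense_rank

    # No partition key: one global window ordered by name. Rows with equal
    # names tie, and dense_rank() assigns consecutive ranks without gaps.
    # Caveat: Spark moves all rows into a single partition for such a window.
    win = Window.orderBy(name_col)
    return df.withColumn(new_col, dense_rank().over(win))


def expected_name_index(names):
    """Plain-Python reference: the dense rank of each name among the sorted distinct names."""
    rank_of = {n: i for i, n in enumerate(sorted(set(names)), start=1)}
    return [rank_of[n] for n in names]


# Names from the question, in their original row order.
names = ["x", "y", "z", "x", "y"]
print(expected_name_index(names))  # [1, 2, 3, 1, 2]
```

With the DataFrame from the question, the call would be add_name_index(dp_df_classification_agg_join).show(). The rows will likely come back sorted by name rather than in their original order, since a DataFrame does not guarantee row order after a window operation.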