Adding a column of random numbers in PySpark

Time: 2020-05-08 18:48:07

Tags: python pyspark

I want to generate a column of random numbers, like this:

df=df.withColumn("random_col",random.randint(100000, 1000000))

The line above gives me an error:

AssertionError: col should be Column

1 answer:

Answer 0: (score: 0)

First, I would make sure you have imported the right things...

Try importing: from pyspark.sql.functions import rand

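Then build the new column from a Column expression (withColumn rejects the plain Python int that random.randint returns), for example:

df1 = df.withColumn("random_col", (rand() * 900001 + 100000).cast("int"))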
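For the range in the question, a uniform value in [0, 1), such as the one rand() produces, has to be scaled and truncated to land on an integer between 100000 and 1000000. Below is a minimal pure-Python sketch of that arithmetic; the helper name scale_to_int_range is hypothetical and not part of PySpark, and random.random() only stands in for a single rand() value.

```python
import math
import random


def scale_to_int_range(u, low, high):
    """Map a uniform value u in [0, 1) to an integer in [low, high], both ends inclusive."""
    # There are (high - low + 1) candidate integers; flooring the scaled value picks one.
    return low + math.floor(u * (high - low + 1))


# random.random() plays the role of one rand() value here (an assumption for illustration).
samples = [scale_to_int_range(random.random(), 100000, 1000000) for _ in range(5)]
print(samples)
```

On a Spark column the same idea would be to multiply rand() by (high - low + 1), add low, and cast to an integer type.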

You could also check out this resource. It looks like it may be helpful for what you are doing.

Hope this helps!