Question

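I am applying AES encryption to a column of a PySpark DataFrame.

I iterate over the column data and use df.withColumn to replace each column value with its encrypted value, but this is too slow.

I have been looking for an alternative approach but have not found one.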

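'''
from Crypto.Cipher import AES
import pyspark.sql.functions as F

# Body of a function; key, v (the IV), col and column_data are defined elsewhere.
for i in column_data:
    obj = AES.new(key, AES.MODE_CBC, v)
    ciphertext = obj.encrypt(i)
    # One withColumn call per distinct value: this is the slow part.
    df = df.withColumn(col, F.when(df[col] == i, str(ciphertext)).otherwise(df[col]))
return df
'''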

But it takes a very long time.

Could you suggest an alternative?

Answer 1

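Your code is slow because of the for loop, which forces Spark to run on only a single thread.

Please provide a sample of the input and the expected output; then someone may be able to help you rewrite the code.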
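In the meantime, here is a minimal sketch of one common way to avoid the loop: wrap the encryption in a UDF so that Spark applies it to every row in a single withColumn call. It makes several assumptions not stated in the question: the column holds strings, key and iv are bytes of valid AES lengths, pycryptodome is installed on the executors, and base64 text is an acceptable output (the original stored str(ciphertext) instead). The names pkcs7_pad, encrypt_value and encrypt_column are hypothetical helpers, not part of any library.

```python
import base64


def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Pad data to a multiple of block_size, as AES-CBC requires."""
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


def encrypt_value(value, key: bytes, iv: bytes):
    """Encrypt one cell; returns base64 text, or None for a null cell."""
    if value is None:
        return None
    # pycryptodome, imported lazily so the import happens on the executor.
    from Crypto.Cipher import AES
    cipher = AES.new(key, AES.MODE_CBC, iv)  # fresh cipher object per value
    ciphertext = cipher.encrypt(pkcs7_pad(str(value).encode("utf-8")))
    return base64.b64encode(ciphertext).decode("ascii")


def encrypt_column(df, col: str, key: bytes, iv: bytes):
    """Replace column `col` with its encrypted form in one pass over the data."""
    # Imported here so the helpers above can be used without a Spark install.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    encrypt_udf = F.udf(lambda v: encrypt_value(v, key, iv), StringType())
    return df.withColumn(col, encrypt_udf(F.col(col)))
```

Usage would be something like df = encrypt_column(df, col, key, v). The query plan then contains one projection instead of one per distinct value, and the rows are encrypted in parallel across partitions. Like the code in the question, this reuses the same IV for every value.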
