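In the DataFrame "data" I have two columns, "time_stamp" and "hour". Wherever the "time_stamp" value is missing, I want to fill it in with the value from the "hour" column. I don't want to create a new column, only fill the missing values in "time_stamp".
What I want to do is replace this pandas code with PySpark code: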
data['time_stamp'] = data.apply(lambda x: x['hour'] if pd.isna(x['time_stamp']) else x['time_stamp'], axis=1)
Answer 0 (score: 1)
Something like this should work:
from pyspark.sql import functions as f
df = (df.withColumn('time_stamp',
    f.expr('case when time_stamp is null then hour else time_stamp end')))  # CASE needs a closing END; the column is time_stamp
Or, if you don't like SQL:
df = df.withColumn('time_stamp', f.when(f.col('time_stamp').isNull(), f.col('hour')).otherwise(f.col('time_stamp')))  # otherwise() chains on when(), inside withColumn