I have a DataFrame with 90 billion transaction records. It looks like this:
id marital_status age new_class_desc is_child
1 Married 35 kids_sec 0
2 Single 28 Other 1
3 Married 32 Other 1
5 Married 42 kids_sec 0
2 Single 28 Other 1
7 Single 27 kids_sec 0
I want the DataFrame to look like this:
id marital_status age is_child new_class_desc new_is_child
1 Married 35 0 kids_sec 1
2 Single 28 0 Other 0
3 Married 32 1 Other 1
5 Married 42 0 kids_sec 1
2 Single 28 1 Other 1
7 Single 27 0 kids_sec 0
I have already done this in Python with pandas, as follows:
# Rows that keep their original is_child value: everything except
# married people aged 33 or older in the kids_sec class.
condition = ~((df['marital_status'] == 'Married') &
              (df['new_class_desc'] == 'kids_sec') &
              (df['age'] >= 33))
# Create the new column from is_child: keep the value where the condition
# holds, and set it to 1 everywhere else.
df['new_col'] = df['is_child'].where(condition, 1)
print(df)
How can I do the same thing in PySpark?