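I want to add columns named "joint_pred_x" (x = 0, 1, 2) based on the two values in "nb_pred_x" and "svm_pred_x": 0 if nb = 1 and svm = 1; 1 if nb = 1 and svm = 0; 2 if nb = 0 and svm = 1; 3 if nb = 0 and svm = 0. I think withColumn can do the job, but I am confused about the conditional logic. Thanks in advance. The solution only needs to work in PySpark.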
Answer 0 (score: 0)
You can use a SQL CASE statement via expr.
+---------+---------+---------+----------+----------+----------+
|nb_pred_0|nb_pred_1|nb_pred_2|svm_pred_0|svm_pred_1|svm_pred_2|
+---------+---------+---------+----------+----------+----------+
|0.0      |1.0      |0.0      |0.0       |1.0       |0.0       |
+---------+---------+---------+----------+----------+----------+
from pyspark.sql.functions import expr

# Column-name prefixes of the two classifiers (assumed from the column names above)
p1, p2 = 'nb', 'svm'

for i in range(0, 3):
    index = str(i)
    # No ELSE branch: any other combination of values yields NULL
    df = df.withColumn('joint_pred_' + index, expr(f'''
        CASE
            WHEN {p1}_pred_{index} == 1 and {p2}_pred_{index} == 1 THEN 0
            WHEN {p1}_pred_{index} == 1 and {p2}_pred_{index} == 0 THEN 1
            WHEN {p1}_pred_{index} == 0 and {p2}_pred_{index} == 1 THEN 2
            WHEN {p1}_pred_{index} == 0 and {p2}_pred_{index} == 0 THEN 3
        END
    '''))

df.show(10, False)
+---------+---------+---------+----------+----------+----------+------------+------------+------------+
|nb_pred_0|nb_pred_1|nb_pred_2|svm_pred_0|svm_pred_1|svm_pred_2|joint_pred_0|joint_pred_1|joint_pred_2|
+---------+---------+---------+----------+----------+----------+------------+------------+------------+
|0.0      |1.0      |0.0      |0.0       |1.0       |0.0       |3           |0           |3           |
+---------+---------+---------+----------+----------+----------+------------+------------+------------+
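If you would rather avoid a SQL string, the same rule can probably be written with when from pyspark.sql.functions. The following is only a sketch: the helper names joint_label and add_joint_columns are made up for illustration, and the prefixes 'nb' and 'svm' are assumed from the column names in the question. The plain-Python joint_label function is included just to sanity-check the mapping without a Spark session.

```python
# Mapping from (nb, svm) to the joint label, as stated in the question.
JOINT_LABEL = {(1, 1): 0, (1, 0): 1, (0, 1): 2, (0, 0): 3}


def joint_label(nb, svm):
    """Plain-Python reference for the rule (hypothetical helper).

    Returns None for any pair not listed in the question, which mirrors
    the NULL produced by a CASE without an ELSE branch.
    """
    return JOINT_LABEL.get((nb, svm))


def add_joint_columns(df, p1="nb", p2="svm", n=3):
    """Add joint_pred_0 .. joint_pred_{n-1} using when() chains (hypothetical helper)."""
    # Imported here so joint_label above can be used without Spark installed.
    from pyspark.sql import functions as F

    for i in range(n):
        a = F.col(f"{p1}_pred_{i}")
        b = F.col(f"{p2}_pred_{i}")
        df = df.withColumn(
            f"joint_pred_{i}",
            # No otherwise(): unmatched rows become NULL, as in the CASE version.
            F.when((a == 1) & (b == 1), 0)
            .when((a == 1) & (b == 0), 1)
            .when((a == 0) & (b == 1), 2)
            .when((a == 0) & (b == 0), 3),
        )
    return df
```

Usage would be df = add_joint_columns(df) followed by df.show(10, False). For inputs that are strictly 0 or 1, the four cases also reduce to 3 - 2 * nb - svm, which may be handy if you prefer arithmetic over branching.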