My DataFrame is:
import pandas as pd
ndf = pd.DataFrame({'a':[False, False,True,True,False], 'b':[False, False,False,False, True]})
ndf_s = sqlContext.createDataFrame(ndf)
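I want to get a new column named "action". If ndf['a'] is True, "action" should be "I am a"; if ndf['b'] is True, it should be "I am b"; otherwise it should be None. If both columns are True, the value should be "I am a and b". In other words, I want to get a DataFrame like this: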
ndf_result = sqlContext.createDataFrame(pd.DataFrame({'a':[False, False,True,True,False], 'b':[False, False,False,False, True], 'action':[None, None, 'I am a', 'I am a', 'I am b']}))
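The rules amount to a lookup over the four possible combinations of a and b (the sample data above never has both True). As a minimal plain-Python sketch, using a hypothetical ACTION table that is not part of the question, the mapping reproduces the expected action column without Spark:

```python
# Hypothetical lookup table: one entry per combination of (a, b).
ACTION = {
    (True, True): "I am a and b",
    (True, False): "I am a",
    (False, True): "I am b",
    (False, False): None,
}

# Same sample columns as the question's pandas DataFrame.
a = [False, False, True, True, False]
b = [False, False, False, False, True]

action = [ACTION[(x, y)] for x, y in zip(a, b)]
print(action)  # [None, None, 'I am a', 'I am a', 'I am b']
```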
Answer 0 (score: 3)
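You can use when/otherwise (the output below was produced with the first row changed to a=True, b=True, so that the combined case shows up):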
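import pyspark.sql.functions as F

# Check the combined case first; otherwise the "a" branch would also match it.
ndf_s.withColumn("action", F.when(
    ndf_s["a"] & ndf_s["b"], "I am a and b"
).otherwise(
    F.when(
        ndf_s["a"], "I am a"
    ).otherwise(
        # No final otherwise(): rows where neither column is True get null.
        F.when(ndf_s["b"], "I am b")
    )
)
).show()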
+-----+-----+------------+
|    a|    b|      action|
+-----+-----+------------+
| true| true|I am a and b|
|false|false|        null|
| true|false|      I am a|
| true|false|      I am a|
|false| true|      I am b|
+-----+-----+------------+
Another option is to use a udf:
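import pyspark.sql.functions as F

# With no returnType given, @F.udf defaults to StringType.
@F.udf
def action(col_a, col_b):
    if col_a and col_b:
        return "I am a and b"
    elif col_a:
        return "I am a"
    elif col_b:
        return "I am b"
    # Falling through returns None, which Spark shows as null.

ndf_s.withColumn("action", action(ndf_s["a"], ndf_s["b"])).show()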
+-----+-----+------------+
|    a|    b|      action|
+-----+-----+------------+
| true| true|I am a and b|
|false|false|        null|
| true|false|      I am a|
| true|false|      I am a|
|false| true|      I am b|
+-----+-----+------------+
Answer 1 (score: 0)
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
import pandas as pd

ndf = pd.DataFrame({'a':[False, False,True,True,False], 'b':[False, False,False,False, True]})
ndf_s = sqlContext.createDataFrame(ndf)

def get_expected_string(a, b):
    if a and b:
        return "I am a and b"
    elif a:
        return "I am a"
    elif b:
        return "I am b"
    else:
        return None

# defining udf function for get_expected_string
get_expected_string_udf = udf(get_expected_string, StringType())
ndf_s = ndf_s.withColumn("action", get_expected_string_udf("a", "b"))
ndf_s.show()
+-----+-----+------------+
|    a|    b|      action|
+-----+-----+------------+
| true| true|I am a and b|
|false|false|        null|
| true|false|      I am a|
| true|false|      I am a|
|false| true|      I am b|
+-----+-----+------------+