I have a pyspark.sql.DataFrame, df:
id col1
1 abc
2 bcd
3 lal
4 bac
I want to add another column, flag, to df, so that flag is 'odd' when id is odd and 'even' when id is even.
The final output should be:
id col1 flag
1 abc odd
2 bcd even
3 lal odd
4 bac even
I tried:
def myfunc(num):
    if num % 2 == 0:
        flag = 'EVEN'
    else:
        flag = 'ODD'
    return flag
df['new_col'] = df['id'].map(lambda x: myfunc(x))
df['new_col'] = df['id'].apply(lambda x: myfunc(x))
Both attempts give me the error: TypeError: 'Column' object is not callable
How can I use .apply in pyspark, the way I do with a pandas DataFrame?
Answer 0 (score: 1)
pyspark does not provide apply on a DataFrame column. An alternative is the withColumn function, which adds a column computed from a Column expression:
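from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# sqlContext is predefined only in older shells; a SparkSession works in current PySpark.
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [
        [1, "abc"],
        [2, "bcd"],
        [3, "lal"],
        [4, "bac"],
    ],
    ["id", "col1"],
)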
df.show()
+---+----+
| id|col1|
+---+----+
|  1| abc|
|  2| bcd|
|  3| lal|
|  4| bac|
+---+----+
df.withColumn(
    "flag",
    # when/otherwise is evaluated by Spark itself, so no Python function runs per row
    F.when(F.col("id") % 2 == 0, F.lit("even")).otherwise(F.lit("odd")),
).show()
+---+----+----+
| id|col1|flag|
+---+----+----+
|  1| abc| odd|
|  2| bcd|even|
|  3| lal| odd|
|  4| bac|even|
+---+----+----+