I am trying to do this, and I get a very long error:
df=df.withColumn('NewColumnName', someother_df['Time'])
It doesn't work. Doing this instead:
df=df.withColumn('NewColumnName', someother_df.select('Time'))
gives me this error: AssertionError: col should be Column
Answer 0 (score: 2)
It seems you are merging two dataframes without any common key, so the code below should work for you.
import pyspark.sql.functions as func
df1 = sc.parallelize([('1234','13'),('6789','68')]).toDF(['col1','col2'])
df2 = sc.parallelize([('7777','66'),('8888','22')]).toDF(['col3','col4'])
# since there is no common column between these two dataframes, add a row_index so they can be joined
# (note: monotonically_increasing_id only guarantees increasing IDs, not consecutive ones,
#  so this alignment is only reliable when both dataframes have the same partitioning)
df1=df1.withColumn('row_index', func.monotonically_increasing_id())
df2=df2.withColumn('row_index', func.monotonically_increasing_id())
# 'col3' from second dataframe (i.e. df2) is added to first dataframe (i.e. df1)
df1 = df1.join(df2.select("row_index", "col3"), on=["row_index"]).drop("row_index")
df1.show()
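The idea behind the row_index trick can be shown without Spark at all. The sketch below (plain Python, no PySpark required; the variable names mirror the example above but are otherwise illustrative) pairs rows by position and appends `col3` from the second table to the first, which is exactly what the row_index join does:

```python
# Rows of the two example "dataframes" as plain tuples
df1_rows = [('1234', '13'), ('6789', '68')]  # (col1, col2)
df2_rows = [('7777', '66'), ('8888', '22')]  # (col3, col4)

# zip pairs rows by position, like joining on a shared row_index;
# we then keep all of df1's columns plus col3 from df2
joined = [r1 + (r2[0],) for r1, r2 in zip(df1_rows, df2_rows)]

print(joined)  # [('1234', '13', '7777'), ('6789', '68', '8888')]
```

The Spark version exists because a distributed DataFrame has no inherent row order, so an explicit index column is needed before such a positional pairing is meaningful.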
Do let us know if it solves your problem. :)