I need to add new columns with constant values while joining two tables in PySpark; using lit isn't solving the issue.
**** Table A ****
There are two tables, A and B. Table A is as follows:
ID Day Name Description
1 2016-09-01 Sam Retail
2 2016-01-28 Chris Retail
3 2016-02-06 ChrisTY Retail
4 2016-02-26 Christa Retail
3 2016-12-06 ChrisTu Retail
4 2016-12-31 Christi Retail
**** Table B ****
ID SkEY
1 1.1
2 1.2
3 1.3
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import lit
from pyspark.sql import HiveContext

hiveContext = HiveContext(sc)
ABC2 = spark.sql(
"select * From A where day ='{0}'".format(i[0])
)
Join = ABC2.join(
Tab2,
(
ABC2.ID == Tab2.ID
)
).select(
Tab2.skey,
ABC2.Day,
ABC2.Name,
ABC2.withColumn('newcol1', lit('')),
ABC2.withColumn('newcol2', lit('A')),
ABC2.Description
)
Join.select(
"skey",
"Day",
"Name",
"newcol1",
"newcol2",
"Description"
).write.mode("append").format("parquet").insertInto("Table")
ABC=spark.sql(
"select distinct day from A where day= '2016-01-01' "
)
Even when the new columns are defined as above, the code causes problems with the constant values: newcol1 needs to hold an empty value, while newcol2 should be 'A'. The new table should be loaded with the columns in the same order as shown, including the new constant-value columns.
Answer 0 (score: 2):
Rewrite the Join DF as:
Join = ABC2.join(Tab2, ABC2.ID == Tab2.ID) \
    .select(Tab2.skey, ABC2.Day, ABC2.Name) \
    .withColumn('newcol1', lit("")) \
    .withColumn('newcol2', lit("A"))
Answer 1 (score: 0):
You can join and then .select in the order you like, so your code would look like this:
Join = ABC2.join(Tab2, ABC2.ID == Tab2.ID) \
    .select(Tab2.skey, ABC2.Day, ABC2.Name, ABC2.Description) \
    .withColumn('newcol1', lit("")) \
    .withColumn('newcol2', lit("A"))
Join.select(
    "skey",
    "Day",
    "Name",
    "newcol1",
    "newcol2",
    "Description"
).write.mode("append").format("parquet").insertInto("Table")