Adding a column between tables when joining

Time: 2019-06-04 12:19:44

Tags: pyspark

 I need to add new columns with constant values while joining two tables
 using PySpark. Using lit isn't solving the issue.

**** Table A ****

There are two tables, A and B. Table A is as follows:

    ID  Day         Name     Description
    1   2016-09-01  Sam      Retail
    2   2016-01-28  Chris    Retail
    3   2016-02-06  ChrisTY  Retail
    4   2016-02-26  Christa  Retail
    3   2016-12-06  ChrisTu  Retail
    4   2016-12-31  Christi  Retail

**** Table B ****

    ID  SkEY
    1   1.1
    2   1.2
    3   1.3

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.functions import lit
    from pyspark.sql import HiveContext
    hiveContext = HiveContext(sc)


    ABC2 = spark.sql(
        "select * From A where day ='{0}'".format(i[0])
    )
    Join = ABC2.join(
        Tab2,
        (ABC2.ID == Tab2.ID)
    ).select(
        Tab2.skey,
        ABC2.Day,
        ABC2.Name,
        ABC2.withColumn('newcol1, lit('')),
        ABC2.withColumn('newcol2, lit('A')),
        ABC2.Description
    )
    Join.select(
        "skey",
        "Day",
        "Name",
        "newcol1",
        "newcol2",
        "Description"
    ).write.mode("append").format("parquet").insertinto("Table")

    ABC = spark.sql(
        "select distinct day from A where day = '2016-01-01' "
    )

The code above causes problems even when the new columns are defined with
constant values: newcol1 needs to hold an empty value, while newcol2 should be 'A'.

The new table should be loaded with the columns in the same order as presented,
along with the new columns holding constant values.

2 answers:

Answer 0: (score: 2)

Rewrite the Join DF as:

    Join = ABC2.join(Tab2, (ABC2.ID == Tab2.ID)) \
        .select(Tab2.skey, ABC2.Day, ABC2.Name) \
        .withColumn('newcol1', lit('')) \
        .withColumn('newcol2', lit('A'))

Answer 1: (score: 0)

You can join and then select in the order you prefer, so your code would look like this:

    Join = ABC2.join(Tab2, (ABC2.ID == Tab2.ID)) \
        .select(Tab2.skey, ABC2.Day, ABC2.Name, ABC2.Description) \
        .withColumn('newcol1', lit('')) \
        .withColumn('newcol2', lit('A'))

    Join.select(
        "skey",
        "Day",
        "Name",
        "newcol1",
        "newcol2",
        "Description"
    ).write.mode("append").format("parquet").insertInto("Table")