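I have two DataFrames, df1 and df2. df1 has 70 rows and 7 columns, while df2 has 80 rows and 7 columns.
How can I get only the records from df2 that contain a new value in any column relative to df1, i.e. records that do not exist in df1, in pyspark 2.2.0?
I tried the left-join query below, but I could not get it to run in sqlContext.sql().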
sqlContext.sql("""
    select df2.*, df1.* from df2
    left join (select * from df1)
    on (df2.col1 = df1.col1
        AND df2.col2 = df1.col2
        AND df2.col3 = df1.col3
        AND df2.col4 = df1.col4
        AND df2.col5 = df1.col5
        AND df2.col6 = df1.col6
        AND df2.col7 = df1.col7)
    where df1.col1 is null
        AND df1.col2 is null
        AND df1.col3 is null
        AND df1.col4 is null
        AND df1.col5 is null
        AND df1.col6 is null
        AND df1.col7 is null
""").show()
Answer 0 (score: 0)
Use the DataFrame method subtract [1]. Calling df2.subtract(df1) returns the rows of df2 that have no identical row in df1, comparing all columns.