AnalysisException: u"Except can only be performed on tables with the compatible column types"

Time: 2019-03-17 12:55:24

Tags: apache-spark dataframe pyspark apache-spark-sql pyspark-sql

I am removing the actual column names since I shouldn't share those, but here is a glimpse of the error:

AnalysisException: u"Except can only be performed on tables with the compatible column types.
string <> boolean at the 28th column of the second table;;
'Except false
:- Filter (cast(inactive_date#111 as string) = '3001-01-01')
:  +- Project [... 33 more fields]
:     +- Project [... 33 more fields]
:        +- SubqueryAlias
:           +- Relation[... 33 more fields] parquet
+- Project [... 33 more fields]
   +- Join Inner, (Key#275 = entry#26)
      :- Filter (cast(inactive_date#283 as string) = '3001-01-01')
      :  +- Project [... 33 more fields]
      :     +- Project [... 33 more fields]
      :        +- SubqueryAlias
      :           +- Relation[, ... 33 more fields] parquet
      +- Deduplicate [entry#26]
         +- Project [entry#26]
            +- Project [... 13 more fields]
               +- Project [... 13 more fields]
                  +- SubqueryAlias
                     +- Relation[] parquet"

My code is as follows:

#old dataframe   (consider it as History )
#daily dataframe ( Consider it as daily  )

#Filtering the Active records based on condition

Active_old_filtered_records = old_history_dataframe.filter(old_history_dataframe["inactive_date"] == '3001-01-01')
Inactive_old_filtered_records = old_history_dataframe.filter(old_history_dataframe["inactive_date"] != '3001-01-01')

#Joining active old records with the matching active records in daily dataframe based on KeyColumnA 

left = Active_old_filtered_records
right = Active_new_daily_dataframe.select("keyColumnA").distinct()

Matching_Active_daily_old_dataframe = left.join(right, ["keyColumnA"])
Non_matching_active_daily_old_dateframe = Active_old_filtered_records.subtract(Matching_Active_daily_old_dataframe)

Note: The daily dataframe and the old (history) dataframe have exactly the same schema here, yet I am still getting the AnalysisException. Could someone help with this? Thanks.
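One thing worth checking here: subtract (EXCEPT in SQL) pairs columns by position, not by name, so two dataframes can share the same schema by name and still mismatch positionally after a join reorders the columns. A small diagnostic sketch (not part of the original post), using the dataframe names above, prints the first position where the types of the two subtract operands diverge:

# Diagnostic sketch: subtract()/EXCEPT pairs columns by position, so walk both
# dtype lists in parallel and report where the types stop matching
# (the "28th column" mentioned in the error above).
for pos, (left_col, right_col) in enumerate(
        zip(Active_old_filtered_records.dtypes,
            Matching_Active_daily_old_dataframe.dtypes), start=1):
    if left_col[1] != right_col[1]:
        print(pos, left_col, right_col)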

1 Answer:

Answer 0 (score: 0):

In the end, I was able to resolve this issue with the following code:

#old dataframe   (consider it as History )
#daily dataframe ( Consider it as daily  )

#Filtering the Active records based on condition

Active_old_filtered_records = old_history_dataframe.filter(old_history_dataframe["inactive_date"] == '3001-01-01')
Inactive_old_filtered_records = old_history_dataframe.filter(old_history_dataframe["inactive_date"] != '3001-01-01')

#Capture the original column order so it can be re-selected after the join
cols = Active_old_filtered_records.columns

#Joining active old records with the matching active records in daily dataframe based on KeyColumnA 

left = Active_old_filtered_records
right = Active_new_daily_dataframe.select("keyColumnA").distinct()

Matching_Active_daily_old_dataframe = left.join(right, ["keyColumnA"]).select(cols)

Non_matching_active_daily_old_dateframe = Active_old_filtered_records.subtract(Matching_Active_daily_old_dataframe)

Joining two dataframes on a column that is not in the first position changes the order of the columns in the resulting dataframe (the join key gets moved to the front). So keep a cols variable holding the original column order and select those same columns in that order afterwards, so that the subsequent steps work correctly :D
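The reordering is easy to see on a toy example. A minimal sketch (hypothetical column names, not the original data): the join key is moved to the front of the joined result, and re-selecting the original column order makes subtract line up by position again.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical history table; note the join key is NOT the first column.
hist = spark.createDataFrame([(True, "k1", "a1")],
                             ["active_flag", "keyColumnA", "colA"])
daily_keys = spark.createDataFrame([("k1",)], ["keyColumnA"])

joined = hist.join(daily_keys, ["keyColumnA"])
print(hist.columns)    # ['active_flag', 'keyColumnA', 'colA']
print(joined.columns)  # ['keyColumnA', 'active_flag', 'colA'] -- key moved to the front

# hist.subtract(joined) would fail here: position 1 pairs active_flag (boolean)
# with keyColumnA (string), which is exactly the kind of mismatch reported above.
aligned = joined.select(hist.columns)  # restore the original column order
diff = hist.subtract(aligned)          # columns now line up by position

Selecting by hist.columns plays the same role as the cols variable in the answer code above.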

That is how I was finally able to resolve the issue.