I have removed the actual column names because I am not supposed to share them, but here is a glimpse of the error:
AnalysisException: u"Except can only be performed on tables with the compatible column types.
string <> boolean at the 28th column of the second table;
;\n'Except false\n:- Filter (cast(inactive_date#111 as string) = '3001-01-01')\n:
+- Project [... 33 more fields]\n:+- Project [ ... 33 more fields]\n:+- SubqueryAlias \n:+-Relation[... 33 more fields] parquet\n
+- Project [... 33 more fields]\n +- Join Inner, (Key#275 = entry#26)\n:- Filter (cast(inactive_date#283 as string) = '3001-01-01')\n:
+- Project [... 33 more fields]\n:
+- Project [... 33 more fields]\n : +- SubqueryAlias +- Relation[,... 33 more fields] parquet\n
+- Deduplicate [entry#26]\n +- Project [entry#26]\n+- Project [... 13 more fields]\n
+- Project [... 13 more fields]\n +- SubqueryAlias +- Relation[] parquet\n"
My code is as follows:
#old dataframe (consider it as History )
#daily dataframe ( Consider it as daily )
#Filtering the Active records based on condition
Active_old_filtered_records= old_history_dataframe.filter(old_history_dataframe["inactive_date"] == '3001-01-01')
Inactive_old_filtered_records= old_history_dataframe.filter(old_history_dataframe["inactive_date"] != '3001-01-01')
#Joining active old records with the matching active records in daily dataframe based on KeyColumnA
left = Active_old_filtered_records
right = Active_new_daily_dataframe.select("keyColumnA").distinct()
Matching_Active_daily_old_dataframe = left.join(right, ["keyColumnA"])
Non_matching_active_daily_old_dateframe = Active_old_filtered_records.subtract(Matching_Active_daily_old_dataframe)
Note: the daily dataframe and the old dataframe have exactly the same schema, yet I still get the AnalysisException. Can anyone help with this? Thanks.
Answer 0 (score: 0)
In the end I was able to solve this with the following code:
#old dataframe (consider it as History )
#daily dataframe ( Consider it as daily )
#Filtering the Active records based on condition
Active_old_filtered_records= old_history_dataframe.filter(old_history_dataframe["inactive_date"] == '3001-01-01')
#Remember the original column order so it can be restored after the join
cols = Active_old_filtered_records.columns
Inactive_old_filtered_records= old_history_dataframe.filter(old_history_dataframe["inactive_date"] != '3001-01-01')
#Joining active old records with the matching active records in daily dataframe based on KeyColumnA
left = Active_old_filtered_records
right = Active_new_daily_dataframe.select("keyColumnA").distinct()
Matching_Active_daily_old_dataframe = left.join(right, ["keyColumnA"]).select(cols)
Non_matching_active_daily_old_dateframe = Active_old_filtered_records.subtract(Matching_Active_daily_old_dataframe)
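Joining two dataframes on a column that sits anywhere other than the first position changes the column order of the resulting dataframe, because the join column is moved to the front. subtract compares columns by position, so after the join the types no longer line up and the analysis fails. So keep a cols variable and select the same columns in the original order after the join, which lets the subtract step work as expected :D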
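To make the reordering visible, here is a minimal plain-Python sketch (no Spark required) that models the column order after a join on a list of key names and then looks for the first positional type mismatch, which is roughly what the Except analysis reports. The helper names `columns_after_key_join` and `first_positional_mismatch` and the three-column schema are hypothetical, made up for illustration; they are not part of the PySpark API or of the original code.

```python
def columns_after_key_join(left_cols, right_cols, keys):
    """Model the column order of left.join(right, keys):
    key columns first, then the remaining left columns, then the remaining right ones."""
    left_rest = [c for c in left_cols if c not in keys]
    right_rest = [c for c in right_cols if c not in keys]
    return list(keys) + left_rest + right_rest


def first_positional_mismatch(left_dtypes, right_dtypes):
    """Compare two lists of (name, type) pairs position by position.
    Returns (1-based position, left type, right type) for the first mismatch, or None.
    Assumes both lists have the same length, as subtract requires."""
    pairs = zip(left_dtypes, right_dtypes)
    for pos, ((_, left_type), (_, right_type)) in enumerate(pairs, start=1):
        if left_type != right_type:
            return pos, left_type, right_type
    return None


# Hypothetical schema; the key column is deliberately NOT in the first position.
schema = [("is_active", "boolean"), ("keyColumnA", "string"), ("inactive_date", "string")]
types = dict(schema)
cols = [name for name, _ in schema]

# Column order after joining with a right side that only carries the key column.
joined_cols = columns_after_key_join(cols, ["keyColumnA"], ["keyColumnA"])
joined = [(c, types[c]) for c in joined_cols]

# Without reordering, position 1 is boolean on the left but string on the right.
mismatch_before = first_positional_mismatch(schema, joined)

# Selecting the saved column order (the .select(cols) fix) lines the types up again.
restored = [(c, types[c]) for c in cols]
mismatch_after = first_positional_mismatch(schema, restored)

print(joined_cols, mismatch_before, mismatch_after)
```

With real dataframes, the same check could be fed `Active_old_filtered_records.dtypes` and `Matching_Active_daily_old_dataframe.dtypes`, since `dtypes` returns a list of (name, type) pairs. Another option that may be worth trying is `left.join(right, ["keyColumnA"], "left_anti")`, which keeps the left rows that have no match on the right and does not depend on column order; note that, unlike subtract, it does not remove duplicate rows.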