Applying error collection

I need to create a final DataFrame after applying mapping conditions. For example, suppose the final DataFrame looks like this:

df.show()
+---+---+
|a  |b  |
+---+---+
|c  |2  |
+---+---+
df.printSchema()
root
 |-- a: string (nullable = true)
 |-- b: integer (nullable = true)

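I have to load this final DataFrame into a Hive table that has the same columns but a different schema, in which some columns do not accept null values. For example, if column "a" in the DataFrame above contains a null value, that particular row should not be written to the Hive table. I am using the following command to write to the table: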

df.write.mode("append").format("parquet").saveAsTable(table_name)

So, should I change the schema before appending to the table?

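from pyspark.sql.types import StructType, StructField, StringType, IntegerType
# "a" is declared non-nullable; "b" stays nullable
schema = StructType([StructField("a", StringType(), False), StructField("b", IntegerType(), True)])
df_updated = spark.createDataFrame(df.rdd, schema)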