I am trying to write a DataFrame to Hive:
df_block_identity.printSchema()
root
|-- HUB_ID: long (nullable = false)
|-- ClientId: string (nullable = true)
|-- publicID: string (nullable = true)
|-- CreationAppSource: string (nullable = true)
|-- LastUpdateAppSource: string (nullable = true)
|-- FirstName: string (nullable = true)
|-- LastName: string (nullable = true)
|-- Email: string (nullable = true)
|-- publicID_address: string (nullable = true)
|-- CreationAppSource_address: string (nullable = true)
|-- LastUpdateAppSource_address: string (nullable = true)
|-- AddressNameDesc: string (nullable = true)
|-- AddressObjective: string (nullable = true)
|-- AddressQuality: string (nullable = true)
|-- City: string (nullable = true)
|-- Country: string (nullable = true)
|-- ExtraData: string (nullable = true)
|-- Region: string (nullable = true)
|-- Street1: string (nullable = true)
|-- Street2: string (nullable = true)
|-- Street3: string (nullable = true)
|-- Street4: string (nullable = true)
|-- ZipCode: string (nullable = true)
|-- IsPrimaryAddress: string (nullable = true)
|-- ExternalAddressID: string (nullable = true)
|-- publicID_MOBILE: string (nullable = true)
|-- CreationAppSource_MOBILE: string (nullable = true)
|-- LastUpdateAppSource_MOBILE: string (nullable = true)
|-- MOBILE: string (nullable = true)
|-- publicID_FIXE: string (nullable = true)
|-- CreationAppSource_FIXE: string (nullable = true)
|-- LastUpdateAppSource_FIXE: string (nullable = true)
|-- FIXE: string (nullable = true)
|-- service: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- publicID_Services: string (nullable = true)
| | |-- CreationAppSource_Services: string (nullable = true)
| | |-- LastUpdateAppSource_Services: string (nullable = true)
| | |-- ServiceTypeId: string (nullable = true)
| | |-- ServiceId: string (nullable = true)
| | |-- ServiceStatus: boolean (nullable = true)
| | |-- ActivationDate: timestamp (nullable = true)
| | |-- DeactivationDate: timestamp (nullable = true)
|-- publicID_Title: string (nullable = true)
|-- CreationAppSource_Title: string (nullable = true)
|-- LastUpdateAppSource_Title: string (nullable = true)
|-- Title: string (nullable = true)
|-- publicID_Civility: string (nullable = true)
|-- CreationAppSource_Civility: string (nullable = true)
|-- LastUpdateAppSource_Civility: string (nullable = true)
|-- Civility: string (nullable = true)
|-- publicID_Gender: string (nullable = true)
|-- CreationAppSource_Gender: string (nullable = true)
|-- LastUpdateAppSource_Gender: string (nullable = true)
|-- Gender: string (nullable = true)
|-- publicID_MaritalStatus: string (nullable = true)
|-- CreationAppSource_MaritalStatus: string (nullable = true)
|-- LastUpdateAppSource_MaritalStatus: string (nullable = true)
|-- MaritalStatus: string (nullable = true)
|-- publicID_BirthDate: string (nullable = true)
|-- CreationAppSource_BirthDate: string (nullable = true)
|-- LastUpdateAppSource_BirthDate: string (nullable = true)
|-- BirthDate: date (nullable = true)
|-- publicID_CSP: string (nullable = true)
|-- CreationAppSource_CSP: string (nullable = true)
|-- LastUpdateAppSource_CSP: string (nullable = true)
|-- CSP: string (nullable = true)
|-- publicID_NbChildren: string (nullable = true)
|-- CreationAppSource_NbChildren: string (nullable = true)
|-- LastUpdateAppSource_NbChildren: string (nullable = true)
|-- NbChildren: string (nullable = true)
|-- publicID_PMR: string (nullable = true)
|-- CreationAppSource_PMR: string (nullable = true)
|-- LastUpdateAppSource_PMR: string (nullable = true)
|-- PMR: string (nullable = true)
|-- publicID_DegreeDisability: string (nullable = true)
|-- CreationAppSource_DegreeDisability: string (nullable = true)
|-- LastUpdateAppSource_DegreeDisability: string (nullable = true)
|-- DegreeDisability: string (nullable = true)
|-- publicID_CompanyName: string (nullable = true)
|-- CreationAppSource_CompanyName: string (nullable = true)
|-- LastUpdateAppSource_CompanyName: string (nullable = true)
|-- CompanyName: string (nullable = true)
|-- publicID_LanguageId: string (nullable = true)
|-- CreationAppSource_LanguageId: string (nullable = true)
|-- LastUpdateAppSource_LanguageId: string (nullable = true)
|-- LanguageId: string (nullable = true)
|-- publicID_NationalityId: string (nullable = true)
|-- CreationAppSource_NationalityId: string (nullable = true)
|-- LastUpdateAppSource_NationalityId: string (nullable = true)
|-- NationalityId: string (nullable = true)
Sample data following this schema:
AHA d4cd8d01-6a4f-446c-838e-ded98c1e8d53 TOTO TOTO NULL . xxx@gmail.com NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL [{"publicID_Services":"d4cd8d01-6a4f-446c-838e-ded98c1e8d53","CreationAppSource_Services":"TOTO","LastUpdateAppSource_Services":"TOTO","ServiceTypeId":"OPTINS","ServiceId":"PARTENAIRES","ServiceStatus":true,"ActivationDate":"2015-09-18 00:00:00","DeactivationDate":"9999-12-31 23:59:59.999"},{"publicID_Services":"d4cd8d01-6a4f-446c-838e-ded98c1e8d53","CreationAppSource_Services":"TOTO","LastUpdateAppSource_Services":"TOTO","ServiceTypeId":"OPTINS","ServiceId":"NEWSLETTER","ServiceStatus":true,"ActivationDate":"2015-09-18 00:00:00","DeactivationDate":"9999-12-31 23:59:59.999"}] NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
I use the command: df_block_identity.write.saveAsTable('sb_party_hub_dev.golden', mode='overwrite', format="parquet")
This command completes without errors, and I can see the table in the Hive Metastore.
But when I query it from Hive with select * from sb_party_hub_dev.golden, I get the error:
java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file ADL://home/hive/warehouse/sb_party_hub_dev.db/golden/part-r-00000-e3dcac27-021e-43e8-8687-01ae305d5b5d.snappy.parquet
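When I remove the field service, which is of array type, the select returns the contents of the table.
What should I change in my PySpark code so that the table is written to Hive and can be queried without errors?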
Edit:
I tried another format:
df_block_identity.write.saveAsTable('sb_party_hub_dev.golden', mode='overwrite', format="orc")
With this format, I can access my data through Hive. So why is there a problem with Parquet?
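One possible direction, offered as an untested sketch rather than a confirmed fix: Spark has a SQL option, spark.sql.parquet.writeLegacyFormat, that makes it write nested types (such as the array of structs in service) in the older Parquet layout. Whether that is what Hive's reader trips over here is an assumption. The helper name write_hive_compatible is made up for illustration, and the sketch assumes a Spark 2.x SparkSession named spark (on Spark 1.6 the equivalent call would be sqlContext.setConf).

```python
# Spark SQL option that switches the Parquet writer to the older layout
# used for nested types (arrays, maps) and decimals.
LEGACY_FLAG = "spark.sql.parquet.writeLegacyFormat"


def write_hive_compatible(spark, df, table):
    """Hypothetical helper: enable the legacy Parquet layout, then overwrite `table`.

    Assumes `spark` is a SparkSession (Spark 2.x) and `df` is a DataFrame.
    """
    # The option must be set before the write is triggered.
    spark.conf.set(LEGACY_FLAG, "true")
    # Same write call as in the question, unchanged.
    df.write.saveAsTable(table, mode="overwrite", format="parquet")


# Usage with the names from the question (needs a live Spark session):
# write_hive_compatible(spark, df_block_identity, "sb_party_hub_dev.golden")
```

If the table still cannot be read after rewriting it this way, the cause is presumably something else, and the ORC format remains a working fallback.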