这里我试图将数据帧保存到分区的配置单元表中并获得这个愚蠢的异常。我多次看过它但却找不到错误。
org.apache.spark.sql.AnalysisException:指定的分区列 (timestamp value)与表的分区列不匹配。 请使用()作为分区列。;
以下是使用
创建外部表的脚本CREATE EXTERNAL TABLEIF NOT EXISTS events2 (
action string
,device_os_ver string
,device_type string
,event_name string
,item_name string
,lat DOUBLE
,lon DOUBLE
,memberid BIGINT
,productupccd BIGINT
,tenantid BIGINT
) partitioned BY (timestamp_val DATE)
row format serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
stored AS inputformat 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
location 'maprfs:///location/of/events2'
tblproperties ('serialization.null.format' = '');
以下是格式化表格“ events2 ”
的结果hive> describe formatted events2;
OK
# col_name data_type comment
action string
device_os_ver string
device_type string
event_name string
item_name string
lat double
lon double
memberid bigint
productupccd bigint
tenantid bigint
# Partition Information
# col_name data_type comment
timestamp_val date
# Detailed Table Information
Database: default
CreateTime: Wed Jan 11 16:58:55 IST 2017
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: maprfs:/location/of/events2
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
serialization.null.format
transient_lastDdlTime 1484134135
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.078 seconds, Fetched: 42 row(s)
以下是将数据分区并存储到表中的代码行,
val tablepath = Map("path" -> "maprfs:///location/of/events2")
AppendDF.write.format("parquet").partitionBy("Timestamp_val").options(tablepath).mode(org.apache.spark.sql.SaveMode.Append).saveAsTable("events2")
在运行应用程序时,我得到以下
指定的分区列(timestamp_val)与分区不匹配 table的列。请使用()作为分区列。
我可能会犯一个明显的错误,任何帮助都非常赞赏upvote:)
答案 0 :(得分:0)
请打印df的图式:
AppendDF.printSchema()
确保它不是类型不匹配??