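When I save a table without an explicit path, the Hive metastore ends up with a bogus "path" property pointing to "/user/hive/warehouse" instead of "/hive/warehouse". If I set the path explicitly with .option("path", "/hive/warehouse"), everything works, but Hive creates an external table. Is there a way to save a managed table to the Hive metastore without that bogus path property, which does not match the file location in Hive?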
from pyspark.sql import SparkSession
spark = SparkSession.builder.master(master_url).enableHiveSupport().getOrCreate()
df = spark.range(100)
df.write.saveAsTable("test1")
df.write.option("path", "/hive/warehouse").saveAsTable("test2")
hive> describe formatted test1;
OK
# col_name data_type comment
id bigint
# Detailed Table Information
Database: default
Owner: root
CreateTime: Fri Mar 10 18:53:07 UTC 2017
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: file:/hive/warehouse/test1
Table Type: MANAGED_TABLE
Table Parameters:
spark.sql.sources.provider parquet
spark.sql.sources.schema.numParts 1
spark.sql.sources.schema.part.0 {\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}
transient_lastDdlTime 1489171987
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
path file:/user/hive/warehouse/test1
serialization.format 1
Time taken: 0.423 seconds, Fetched: 30 row(s)
hive> describe formatted test2;
OK
# col_name data_type comment
id bigint
# Detailed Table Information
Database: default
Owner: root
CreateTime: Fri Mar 10 16:02:07 UTC 2017
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: file:/hive/warehouse/test2
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE false
EXTERNAL TRUE
numFiles 2
numRows -1
rawDataSize -1
spark.sql.sources.provider parquet
spark.sql.sources.schema.numParts 1
spark.sql.sources.schema.part.0 {\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}
totalSize 4755
transient_lastDdlTime 1489161727
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
path file:/hive/warehouse/test2
serialization.format 1
Time taken: 0.402 seconds, Fetched: 36 row(s)
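Answer 0 (score: 1)
Fixed the problem. I am posting my fix for anyone who runs into something similar.
The incorrect "path" parameter only showed up when saving a table to the default Hive database (as shown below). That made me think the "old" database may still be using the old configuration value (hive.metastore.warehouse.dir), while new databases use the new value.
So the fix was to delete the default database and recreate it; all databases now created in the Hive metastore use the correct hive.metastore.warehouse.dir value.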
spark.sql("create database testdb")
spark.sql("use testdb")
df.write.saveAsTable("test3")
hive> describe formatted test.test3;
OK
# col_name data_type comment
id bigint
# Detailed Table Information
Database: testdb
Owner: root
CreateTime: Fri Mar 10 22:10:10 UTC 2017
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: file:/hive/warehouse/test.db/test3
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE false
numFiles 1
numRows -1
rawDataSize -1
spark.sql.sources.provider parquet
spark.sql.sources.schema.numParts 1
spark.sql.sources.schema.part.0 {\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}
totalSize 409
transient_lastDdlTime 1489183810
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
path file:/hive/warehouse/test.db/test3
serialization.format 1
Time taken: 0.243 seconds, Fetched: 35 row(s)
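The check done by eye above (comparing "Location" with the "path" storage parameter) can be sketched as a small helper. This is only an illustration, assuming the `describe formatted` output has been captured as plain text; the function names `location_and_path` and `path_matches_location` are hypothetical and not part of Hive or Spark.

```python
def location_and_path(describe_output):
    """Return (Location, path) parsed from `describe formatted` text.

    Either element is None when the corresponding line is missing.
    """
    location = None
    path = None
    for line in describe_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        if parts[0] == "Location:":
            # e.g. "Location: file:/hive/warehouse/test1"
            location = parts[1]
        elif parts[0] == "path":
            # e.g. "path file:/user/hive/warehouse/test1"
            path = parts[1]
    return location, path


def path_matches_location(describe_output):
    """True only when both values are present and identical."""
    location, path = location_and_path(describe_output)
    return location is not None and location == path
```

Fed the test1 output from the question, the helper reports a mismatch; fed the test3 output, it reports a match.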
Answer 1 (score: 0)
hivemetastore.warehouse.dir
- Default Value: /user/hive/warehouse
- Added In: Hive 0.2.0
Location of default database for the warehouse.
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties
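To see which warehouse location a running session actually uses, something like the following may help. It is a sketch under assumptions: `warehouse_settings` is a hypothetical helper name, and `spark.sql.warehouse.dir` is not mentioned in this thread (it is the setting Spark 2.0 and later use for the warehouse location). The helper only reads values from an existing `SparkSession` through `spark.conf.get` and `spark.catalog.listDatabases()`.

```python
def warehouse_settings(spark):
    """Collect warehouse-related settings and database locations.

    `spark` is expected to be an existing SparkSession created with
    enableHiveSupport(). Keys missing from the runtime configuration
    are reported as "<unset>".
    """
    keys = ("spark.sql.warehouse.dir", "hive.metastore.warehouse.dir")
    settings = {key: spark.conf.get(key, "<unset>") for key in keys}
    # Each catalog entry carries the location recorded for that database.
    locations = {db.name: db.locationUri for db in spark.catalog.listDatabases()}
    return settings, locations
```

If the location recorded for the default database differs from the configured warehouse directory, that would be consistent with the explanation in the first answer, though it does not prove it.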