Hive JSON SerDe MetaStore issue

Time: 2015-11-25 17:00:52

Tags: hive hiveql

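I have an external table backed by JSON data, and I use JsonSerde to read the data into the table. The data loads fine, and when I query the table I see the correct results.

However, when I run the desc command on that table, every column comment shows the text "from deserializer" instead of the comment I defined.

Below is the table creation DDL.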

    CREATE EXTERNAL TABLE IF NOT EXISTS my_table (
      field1 string COMMENT 'This is a field1',
      field2 int COMMENT 'This is a field2',
      field3 string COMMENT 'This is a field3',
      field4 double COMMENT 'This is a field4'
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
    LOCATION '/user/uszszb6/json_test/data';

Entries in the data file:

{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}
{"field1":"data2","field2":200,"field3":"more data2","field4":123.002}
{"field1":"data3","field2":300,"field3":"more data3","field4":123.003}
{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}

When I run the command desc my_table, I get the following output.

    +-----------+------------+--------------------+--+
    | col_name  | data_type  |      comment       |
    +-----------+------------+--------------------+--+
    | field1    | string     | from deserializer  |
    | field2    | int        | from deserializer  |
    | field3    | string     | from deserializer  |
    | field4    | double     | from deserializer  |
    +-----------+------------+--------------------+--+

The JsonSerde does not pick up the column comments correctly. I have also tried other JSON SerDes, such as:
 org.openx.data.jsonserde.JsonSerDe
 org.apache.hive.hcatalog.data.JsonSerDe
 com.amazon.elasticmapreduce.JsonSerde

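But the desc output is the same with all of them. There is a JIRA ticket for this bug: https://issues.apache.org/jira/browse/HIVE-6681

According to the ticket, it was resolved in version 0.13. I am using Hive 1.2.1, but I still face this issue.

Can anyone share their thoughts on how to resolve this?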

1 answer:

Answer 0: (score: 0)

Yes, it looks like a Hive bug that affects all JSON SerDes, but have you tried using DESCRIBE EXTENDED?

DESCRIBE EXTENDED my_table;

hive> describe extended  json_serde_test;
OK
browser                 string                  from deserializer   
device_uuid             string                  from deserializer   
custom                  struct<customer_id:string>  from deserializer   

Detailed Table Information  
Table(tableName:json_serde_test,dbName:default, owner:rcongiu,
createTime:1448477902, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:browser, type:string, 
comment:hello), FieldSchema(name:device_uuid, type:string, comment:my 
name is elder price), FieldSchema(name:custom,   
type:struct<customer_id:string>, comment:null)], 
location:hdfs://localhost:9000/user/hive/warehouse/json_serde_test, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat,  
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.openx.data.jsonserde.JsonSerDe, parameters:
{serialization.format=1, mapping.customer_id=Customer ID}), 
bucketCols:[], sortCols:[], parameters:{}, 
skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
skewedColValueLocationMaps:{}), storedAsSubDirectories:false), 
partitionKeys:[], parameters:{numFiles=1, 
transient_lastDdlTime=1448477903, COLUMN_STATS_ACCURATE=true, 
totalSize=128, numRows=0, rawDataSize=0}, viewOriginalText:null,     
viewExpandedText:null, tableType:MANAGED_TABLE) 
Time taken: 0.073 seconds, Fetched: 5 row(s)

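This prints a JSON-ish detailed description that includes the comments. It is hard to read, but it does show my comments, and it may be enough for your purposes... or not.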
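If the raw DESCRIBE EXTENDED text is too hard to read, one option is to post-process it. The following is only a rough sketch, not part of Hive: it assumes the output has the `FieldSchema(name:..., type:..., comment:...)` shape shown above, that comments contain no `)` character, and that you have captured the output as a string. The function name `extract_column_comments` and the `sample` variable are made up for illustration.

```python
import re

# Matches one FieldSchema(...) entry in the "Detailed Table Information" text.
# Assumption: a comment never contains ')' and a type never contains ', comment:'.
FIELD_RE = re.compile(r"FieldSchema\(name:(\w+), type:(.*?), comment:(.*?)\)")


def extract_column_comments(describe_output):
    """Return {column_name: comment or None} parsed from DESCRIBE EXTENDED text."""
    # The CLI wraps long lines, so collapse all whitespace runs to single spaces.
    flat = " ".join(describe_output.split())
    comments = {}
    for name, _col_type, comment in FIELD_RE.findall(flat):
        # Hive prints the literal text "null" when a column has no comment.
        comments[name] = None if comment == "null" else comment
    return comments


# Excerpt of the output shown above, used here as sample input.
sample = """
sd:StorageDescriptor(cols:[FieldSchema(name:browser, type:string,
comment:hello), FieldSchema(name:device_uuid, type:string, comment:my
name is elder price), FieldSchema(name:custom,
type:struct<customer_id:string>, comment:null)],
location:hdfs://localhost:9000/user/hive/warehouse/json_serde_test,
"""

if __name__ == "__main__":
    for column, comment in extract_column_comments(sample).items():
        print(column, "->", comment)
```

Note that for a partitioned table the `partitionKeys` list also contains `FieldSchema` entries, so those columns would end up in the same dictionary.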