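I have an external table over JSON data, and I use JsonSerde to read the data into the table. The data loads fine, and when I query the table I see the correct results.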
However, when I run the desc command on the table, every column's comment shows up as the text "from deserializer".
Below is the table creation DDL.
CREATE EXTERNAL TABLE IF NOT EXISTS my_table (
field1 string COMMENT 'This is a field1',
field2 int COMMENT 'This is a field2',
field3 string COMMENT 'This is a field3',
field4 double COMMENT 'This is a field4'
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
Location '/user/uszszb6/json_test/data';
Entries in the data file:
{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}
{"field1":"data2","field2":200,"field3":"more data2","field4":123.002}
{"field1":"data3","field2":300,"field3":"more data3","field4":123.003}
{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}
When I run desc my_table, I get the following output:
+-----------+------------+--------------------+--+
| col_name | data_type | comment |
+-----------+------------+--------------------+--+
| field1 | string | from deserializer |
| field2 | int | from deserializer |
| field3 | string | from deserializer |
| field4 | double | from deserializer |
+-----------+------------+--------------------+--+
JsonSerde does not pick up the comments correctly. I have also tried other JSON SerDes, such as
org.openx.data.jsonserde.JsonSerDe
org.apache.hive.hcatalog.data.JsonSerDe
com.amazon.elasticmapreduce.JsonSerde
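but the desc output is the same. This looks like bug HIVE-6681 (https://issues.apache.org/jira/browse/HIVE-6681).
According to the ticket, it was fixed in version 0.13. I am using Hive 1.2.1, but I still see the problem.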
Can anyone share ideas on how to fix this?
Answer (score: 0)
Yes, it looks like a Hive bug that affects all JSON SerDes. Have you tried DESCRIBE EXTENDED, though?
DESCRIBE EXTENDED my_table;
hive> describe extended json_serde_test;
OK
browser string from deserializer
device_uuid string from deserializer
custom struct<customer_id:string> from deserializer
Detailed Table Information
Table(tableName:json_serde_test,dbName:default, owner:rcongiu,
createTime:1448477902, lastAccessTime:0, retention:0,
sd:StorageDescriptor(cols:[FieldSchema(name:browser, type:string,
comment:hello), FieldSchema(name:device_uuid, type:string, comment:my
name is elder price), FieldSchema(name:custom,
type:struct<customer_id:string>, comment:null)],
location:hdfs://localhost:9000/user/hive/warehouse/json_serde_test,
inputFormat:org.apache.hadoop.mapred.TextInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
serializationLib:org.openx.data.jsonserde.JsonSerDe, parameters:
{serialization.format=1, mapping.customer_id=Customer ID}),
bucketCols:[], sortCols:[], parameters:{},
skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[],
skewedColValueLocationMaps:{}), storedAsSubDirectories:false),
partitionKeys:[], parameters:{numFiles=1,
transient_lastDdlTime=1448477903, COLUMN_STATS_ACCURATE=true,
totalSize=128, numRows=0, rawDataSize=0}, viewOriginalText:null,
viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.073 seconds, Fetched: 5 row(s)
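It prints a detailed, JSON-ish description that includes the comments. The output is hard to read, but it does show my comments. That may be enough for your purposes, or it may not.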
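You may need the comments in a script rather than on screen. In that case, you could extract them from the "Detailed Table Information" dump shown above with a minimal Python sketch like the one below. It assumes the FieldSchema(name:..., type:..., comment:...) layout that appears in that output, and it assumes no comment itself contains a ")" character. The function name column_comments is hypothetical and is not part of any Hive API.

```python
import re

# Matches one FieldSchema(...) entry in the DESCRIBE EXTENDED dump.
# Assumption: column comments do not contain ")" characters.
FIELD_RE = re.compile(
    r"FieldSchema\(name:(?P<name>[^,]+), type:(?P<type>.+?), comment:(?P<comment>.*?)\)"
)


def column_comments(describe_extended_output):
    """Return (name, type, comment) tuples; comment is None when Hive prints 'null'."""
    # Collapse all whitespace (including line wraps) so an entry split
    # across several lines is matched as a single piece of text.
    text = " ".join(describe_extended_output.split())
    result = []
    for m in FIELD_RE.finditer(text):
        comment = m.group("comment")
        result.append((m.group("name"), m.group("type"),
                       None if comment == "null" else comment))
    return result


if __name__ == "__main__":
    sample = """sd:StorageDescriptor(cols:[FieldSchema(name:browser, type:string,
comment:hello), FieldSchema(name:device_uuid, type:string, comment:my
name is elder price), FieldSchema(name:custom,
type:struct<customer_id:string>, comment:null)],"""
    for name, col_type, comment in column_comments(sample):
        print(name, col_type, comment)
```

The regex uses lazy matches for the type and the comment. This keeps struct types such as struct<customer_id:string> intact and stops each comment at the closing parenthesis of its own FieldSchema entry.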