CREATE EXTERNAL TABLE invoiceitems (
InvoiceNo INT,
StockCode INT,
Description STRING,
Quantity INT,
InvoiceDate BIGINT,
UnitPrice DOUBLE,
CustomerID INT,
Country STRING,
LineNo INT,
InvoiceTime STRING,
StoreID INT,
TransactionID STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3a://streamingdata/data/*';
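Not shown above, but a quick way to confirm exactly what Hive has recorded for the table, including the literal LOCATION value, is the standard DESCRIBE FORMATTED command:
hive> DESCRIBE FORMATTED invoiceitems;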
The data files were created by a Spark Structured Streaming job:
...
data/part-00000-006fc42a-c6a1-42a2-af03-ae0c326b40bd-c000.json 7.1 KB 29/08/2018 10:27:32 PM
data/part-00000-0075634b-8513-47b3-b5f8-19df8269cf9d-c000.json 1.3 KB 30/08/2018 10:47:32 AM
data/part-00000-00b6b230-8bb3-49d1-a42e-ad768c1f9a94-c000.json 2.3 KB 30/08/2018 1:25:02 AM
...
Here are the first few lines of the first file:
{"InvoiceNo":5421462,"StockCode":22426,"Description":"ENAMEL WASH BOWL CREAM","Quantity":8,"InvoiceDate":1535578020000,"UnitPrice":3.75,"CustomerID":13405,"Country":"United Kingdom","LineNo":6,"InvoiceTime":"21:27:00","StoreID":0,"TransactionID":"542146260180829"}
{"InvoiceNo":5501932,"StockCode":22170,"Description":"PICTURE FRAME WOOD TRIPLE PORTRAIT","Quantity":4,"InvoiceDate":1535578020000,"UnitPrice":6.75,"CustomerID":13952,"Country":"United Kingdom","LineNo":26,"InvoiceTime":"21:27:00","StoreID":0,"TransactionID":"5501932260180829"}
However, when I run a query, no data is returned:
hive> select * from invoiceitems limit 5;
OK
Time taken: 24.127 seconds
The Hive log files are empty:
$ ls /var/log/hive*
/var/log/hive:
/var/log/hive-hcatalog:
/var/log/hive2:
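The only extra check I can think of from the Hive side is listing the S3 path with the CLI's built-in dfs command, to confirm the S3A connector can see the files at all, for example:
hive> dfs -ls s3a://streamingdata/data/;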
How can I debug this further?
Answer 0 (score: 0)
I got more of a hint about the error when running:
select count(*) from invoiceitems;
This returned the following error:
...
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
Vertex failed, vertexName=Map 1, vertexId=vertex_1535521291031_0011_1_00,
diagnostics=[Vertex vertex_1535521291031_0011_1_00 [Map 1] killed/failed due to:
ROOT_INPUT_INIT_FAILURE, Vertex Input: invoiceitems initializer failed,
vertex=vertex_1535521291031_0011_1_00 [Map 1], java.io.IOException: cannot find dir =
s3a://streamingdata/data/part-00000-006fc42a-c6a1-42a2-af03-ae0c326b40bd-c000.json
in pathToPartitionInfo: [s3a://streamingdata/data/*]
I decided to change the table's LOCATION in the CREATE TABLE definition from:
LOCATION 's3a://streamingdata/data/*';
to
LOCATION 's3a://streamingdata/data/';
This fixed the problem.
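As far as I can tell, Hive treats LOCATION as a literal directory path rather than a glob, so the trailing /* was stored verbatim; when Tez initialized the input it could not map the discovered JSON files back to the registered path in pathToPartitionInfo, hence the ROOT_INPUT_INIT_FAILURE above (and the silent empty result for the plain SELECT). An equivalent fix that avoids dropping and recreating the table would be to repoint it with standard Hive DDL:
hive> ALTER TABLE invoiceitems SET LOCATION 's3a://streamingdata/data/';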