Creating a table in Hive with RegexSerDe returns an error

Time: 2019-04-22 11:03:06

Tags: regex hive hiveql hive-serde

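I created a table in Hive using RegexSerDe. Hue reports that the table was created successfully. However, when I run SELECT * FROM pricefile_edited or try to view the table in Hue, it does not work and I get an error.

Each row of data is 130 characters long, with no delimiters.

Does anyone know what the problem is and can help? Thanks.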

CREATE EXTERNAL TABLE pricefile_edited(
field1 STRING,
field2 STRING,
field3 STRING,
field4 STRING,
field5 STRING,
field6 STRING, 
field7 STRING,
field8 STRING,
field9 STRING,
field10 STRING,
field11 STRING,
field12 STRING,
field13 STRING,
field14 STRING,
field15 STRING,
field16 STRING,
field17 STRING,
field18 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
 WITH SERDEPROPERTIES ("input.regex" = 
"(\\.{12})(\\.{1})(\\.{1})(\\.{24})(\\.{6})(\\.{6})(\\.{13})(\\.{6})(\\.{1})(\\.{4})(\\.{1})(\\.{3})(\\.{17})(\\.{9})(\\.{12})(\\.{1})(\\.{1})(\\.
{12})")
LOCATION '/user/hive/warehouse';

I get this error:

  

Request TFetchResultsReq(fetchType=0, operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\xc3\xd7\x97\xd3coB\xa1\x90P\x9e\xab\x82\xa4\xf4A', guid='\x80\xa1\x93\xe2\x10\xefJ\xd9\xa3\xa3\xdb\x1f\x95\x85\x88\xb3')), orientation=4, maxRows=100): TFetchResultsResp(status=TStatus(errorCode=0, errorMessage='java.io.IOException: java.io.IOException: Not a file: hdfs://quickstart.cloudera:8020/user/hive/warehouse/categories', sqlState=None, infoMessages=['*org.apache.hive.service.cli.HiveSQLException:java.io.IOException: java.io.IOException: Not a file: hdfs://quickstart.cloudera:8020/user/hive/warehouse/categories:25:24', 'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:463', 'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:294', 'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:769', 'sun.reflect.GeneratedMethodAccessor20:invoke::-1', 'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43', 'java.lang.reflect.Method:invoke:Method.java:498', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78', 'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36', 'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63', 'java.security.AccessController:doPrivileged:AccessController.java:-2', 'javax.security.auth.Subject:doAs:Subject.java:422', 'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1917', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59', 'com.sun.proxy.$Proxy21:fetchResults::-1', 'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:462', 'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:694', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1553', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1538', 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286', 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149', 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624', 'java.lang.Thread:run:Thread.java:748', '*java.io.IOException:java.io.IOException: Not a file: hdfs://quickstart.cloudera:8020/user/hive/warehouse/categories:29:4', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:508', 'org.apache.hadoop.hive.ql.exec.FetchOperator:pushRow:FetchOperator.java:415', 'org.apache.hadoop.hive.ql.exec.FetchTask:fetch:FetchTask.java:140', 'org.apache.hadoop.hive.ql.Driver:getResults:Driver.java:2069', 'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:458', '*java.io.IOException:Not a file: hdfs://quickstart.cloudera:8020/user/hive/warehouse/categories:32:3', 'org.apache.hadoop.mapred.FileInputFormat:getSplits:FileInputFormat.java:322', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextSplits:FetchOperator.java:363', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getRecordReader:FetchOperator.java:295', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:446'], statusCode=3), results=None, hasMoreRows=None)

1 answer:

Answer 0 (score: 0)

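The table location looks wrong: /user/hive/warehouse appears to be the default warehouse directory, and it contains other directories. The query fails on /user/hive/warehouse/categories, reporting that it is not a file. That looks like the directory of the categories table.

Create a folder in the /user/hive/warehouse directory and put your file in it, like this: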

/user/hive/warehouse/pricefiles/pricefile_edited.txt

Change the table location in the DDL:

LOCATION '/user/hive/warehouse/pricefiles'

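The regular expression is also incorrect. Each column should have a corresponding group (in parentheses) in the regex. For example, your regex for the first column matches 12 literal dots, because \\. means a literal dot character. To match any 12 characters it should be (.{12}), without the two backslashes. Also add delimiters (a space, a tab, or whatever separates the fields) between groups where the data has them. For example, (.{12})(.{1}) applied to 140219078921B0 consumes the first 12 characters (140219078921) as the first column and B as the second column. Fix your regex accordingly, adding spaces (delimiters) between groups if necessary. Also remove the extra line break from the regex and write it on a single line.

You can test your regex in a simple way using regexp_extract(string, regexp, group_number):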
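Putting the location fix and the regex fix together, the DDL might look like the sketch below. It assumes the rows really are 130 characters with no delimiters, as the question states (the group widths copied from the question's regex do add up to 130), and that the data file has been moved into the pricefiles directory suggested above. Dropping an external table removes only its metadata, not the files.

```sql
-- Sketch only: widths are copied from the question's regex; no delimiters assumed.
DROP TABLE IF EXISTS pricefile_edited;

CREATE EXTERNAL TABLE pricefile_edited (
  field1 STRING,  field2 STRING,  field3 STRING,  field4 STRING,
  field5 STRING,  field6 STRING,  field7 STRING,  field8 STRING,
  field9 STRING,  field10 STRING, field11 STRING, field12 STRING,
  field13 STRING, field14 STRING, field15 STRING, field16 STRING,
  field17 STRING, field18 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- One capturing group per column, written on a single line, with an unescaped dot.
  "input.regex" = "(.{12})(.{1})(.{1})(.{24})(.{6})(.{6})(.{13})(.{6})(.{1})(.{4})(.{1})(.{3})(.{17})(.{9})(.{12})(.{1})(.{1})(.{12})"
)
-- A directory that contains only this table's data files.
LOCATION '/user/hive/warehouse/pricefiles';
```

If the real data does contain separators between some fields, those would need to be written between the corresponding groups, as described above.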

hive> select regexp_extract('140219078921B0 A1DU1M 1223105DDB','(.{12})',1); --extract group number 1 (group 0 is the whole regexp)
OK
140219078921
Time taken: 1.057 seconds, Fetched: 1 row(s)

hive> select regexp_extract('140219078921B0 A1DU1M 1223105DDB','(.{12})(.{1})',2); --extract group number 2
OK
B
Time taken: 0.441 seconds, Fetched: 1 row(s)

And so on. Add more groups and test carefully.
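As a sketch of the next step, the same sample string can be tested with a third group. The second query is an assumption-laden sanity check to run after the table has been recreated with a corrected regex and location: as far as I know, RegexSerDe returns NULL for every column of a row that does not match the whole regex, so counting rows where field1 is NULL should reveal lines the regex fails on.

```sql
-- Third group: the 14th character of the sample string
select regexp_extract('140219078921B0 A1DU1M 1223105DDB','(.{12})(.{1})(.{1})',3);
-- expected: 0

-- Assumes the table was recreated with a corrected regex and location.
-- A non-zero count suggests some rows do not match the regex.
select count(*) from pricefile_edited where field1 is null;
```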