Sqoop import --as-parquetfile from Oracle produces the wrong column types

Asked: 2016-07-28 18:16:54

Tags: database oracle sqoop cloudera-cdh parquet

I am using CDH 5.5 with Sqoop 1.4.6 and Hive 1.2.1 (which I downloaded manually to get Parquet support for additional data types).

This is the import command I am using:

sqoop import --connect jdbc:oracle:thin:@database:port:name --username username --password password --table SHARE_PROPERTY -m 1 --null-string '\\N' --null-non-string '\\N'  --hive-import --hive-table SHARE_PROPERTY --as-parquetfile --compression-codec snappy --map-column-hive "CREATE_DATE=String"

The command completes successfully, and when I describe the table in Hive I see:

hive> describe share_property;
OK
share_property_id       string                                      
customer_id             string                                      
website_user_id         string                                      
address_id              string                                      
listing_num             string                                      
source                  string                                      
is_facebook             string                                      
is_twitter              string                                      
is_linkedin             string                                      
is_email                string                                      
create_date             string                                      
Time taken: 1.09 seconds, Fetched: 11 row(s)

Most of these fields are actually Oracle NUMBER types; if I stop trying to import as Parquet and just use the default text files, they import as double. The create_date field is a DATE in Oracle, and it imports sometimes as bigint and sometimes as string, depending on which command I use.
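One way to stop the types from drifting between double, bigint, and string is to pin the problematic columns explicitly at import time. The sketch below reuses the question's own connection placeholders; the extra mappings beyond CREATE_DATE are assumptions for illustration, not taken from the real schema:

```shell
# Hypothetical sketch: pin column types on both the Java and Hive side.
# Connection string, credentials, and extra column mappings are placeholders.
sqoop import \
  --connect jdbc:oracle:thin:@database:port:name \
  --username username --password password \
  --table SHARE_PROPERTY -m 1 \
  --null-string '\\N' --null-non-string '\\N' \
  --hive-import --hive-table SHARE_PROPERTY \
  --as-parquetfile --compression-codec snappy \
  --map-column-java CREATE_DATE=String \
  --map-column-hive "CREATE_DATE=string"
```

Note that --map-column-java controls the type Sqoop uses when writing the Parquet files, while --map-column-hive only affects the Hive table definition, so the two can disagree if only one is set.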

Regardless, when I try to query this data in Hive, I get this error:

hive> select * from share_property limit 20;
OK
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/parquet-format-2.1.0-cdh5.5.1.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/parquet-pig-bundle-1.5.0-cdh5.5.1.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/parquet-hadoop-bundle-1.5.0-cdh5.5.1.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [shaded.parquet.org.slf4j.helpers.NOPLoggerFactory]
Exception in thread "main" java.lang.NoSuchMethodError: parquet.schema.Types$MessageTypeBuilder.addFields([Lparquet/schema/Type;)Lparquet/schema/Types$GroupBuilder;
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getSchemaByName(DataWritableReadSupport.java:159)
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:222)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:256)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:99)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:85)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:673)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:323)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1670)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Clearly the data wasn't imported correctly, or something is wrong with Parquet. If I import as a text file, I can query the data successfully (as long as I set --map-column-hive CREATE_DATE=String, otherwise it imports as long), but if I try to insert that data into a Parquet table to convert the format, that errors out as well. So maybe the problem lies there? I tried setting all the column types manually with --map-column-hive and --map-column-java, but that didn't seem to help at all.
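For reference, the text-to-Parquet conversion attempted above is usually written along these lines. This is a hypothetical sketch: share_property_text stands in for the table produced by the plain-text import, and the double/string types simply mirror what the question reports for the text-file import, not the verified Oracle schema:

```sql
-- Hypothetical conversion of the text-backed table to a Parquet table in Hive.
-- share_property_text is an assumed name for the text-file import target;
-- column types are guesses based on the question's description.
CREATE TABLE share_property_parquet (
  share_property_id double,
  customer_id       double,
  website_user_id   double,
  address_id        double,
  listing_num       double,
  source            string,
  is_facebook       double,
  is_twitter        double,
  is_linkedin       double,
  is_email          double,
  create_date       string
)
STORED AS PARQUET;

-- Copy the rows; per the question, this INSERT step also fails.
INSERT OVERWRITE TABLE share_property_parquet
SELECT * FROM share_property_text;
```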

Also, I found documentation saying that Sqoop supports importing the Oracle DATE type, and that Parquet supports dates/timestamps, but I haven't been able to import it successfully as either one (using the --map-column-hive or --map-column-java options, or -Doraoop.timestamp.string=false).

I feel like there is some documentation I'm missing or can't find. Has anyone seen this before?

0 Answers:

No answers yet.