NoSuchElementException when using sqoop to export lzo-compressed data from hdfs to mysql

Date: 2014-02-17 07:36:34

Tags: cloudera sqoop hadoop-lzo

I am trying to export some data from hdfs to mysql using sqoop. The problem is that an uncompressed file exports correctly, but when I try to export the same file compressed with lzo, the sqoop job fails. I am trying this in a standard cloudera CDH4 VM environment. The columns in the file are tab-separated, and null is represented as '\N'.

File contents:

[cloudera@localhost ~]$ cat dipayan-test.txt
dipayan koramangala 29
raju    marathahalli    32
raju    marathahalli    32
raju    \N  32
raju    marathahalli    32
raju    \N  32
raju    marathahalli    32
raju    marathahalli    \N
raju    marathahalli    \N

Description of the mysql table:

mysql> describe sqooptest;
+---------+--------------+------+-----+---------+-------+
| Field   | Type         | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| name    | varchar(100) | YES  |     | NULL    |       |
| address | varchar(100) | YES  |     | NULL    |       |
| age     | int(11)      | YES  |     | NULL    |       |
+---------+--------------+------+-----+---------+-------+
3 rows in set (0.01 sec)

File in hdfs:

[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera/dipayan-test
Found 1 items
-rw-r--r--   3 cloudera cloudera        138 2014-02-16 23:18 /user/cloudera/dipayan-test/dipayan-test.txt.lzo

sqoop command:

sqoop export --connect "jdbc:mysql://localhost/bigdata" --username "root" --password "XXXXXX" --driver "com.mysql.jdbc.Driver" --table sqooptest --export-dir /user/cloudera/dipayan-test/ --input-fields-terminated-by '\t' -m 1 --input-null-string '\\N' --input-null-non-string '\\N'

Error:

[cloudera@localhost ~]$ sqoop export --connect "jdbc:mysql://localhost/bigdata" --username "root" --password "mysql" --driver "com.mysql.jdbc.Driver" --table sqooptest --export-dir /user/cloudera/dipayan-test/ --input-fields-terminated-by '\t' -m 1 --input-null-string '\\N' --input-null-non-string '\\N'
14/02/16 23:19:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/02/16 23:19:26 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
14/02/16 23:19:26 INFO manager.SqlManager: Using default fetchSize of 1000
14/02/16 23:19:26 INFO tool.CodeGenTool: Beginning code generation
14/02/16 23:19:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqooptest AS t WHERE 1=0
14/02/16 23:19:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqooptest AS t WHERE 1=0
14/02/16 23:19:27 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-0.20-mapreduce
14/02/16 23:19:27 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-0.20-mapreduce/hadoop-core.jar
Note: /tmp/sqoop-cloudera/compile/676bc185f1efffa3b0de0a924df4a02d/sqooptest.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/02/16 23:19:29 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/676bc185f1efffa3b0de0a924df4a02d/sqooptest.jar
14/02/16 23:19:29 INFO mapreduce.ExportJobBase: Beginning export of sqooptest
14/02/16 23:19:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqooptest AS t WHERE 1=0
14/02/16 23:19:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/02/16 23:19:31 INFO input.FileInputFormat: Total input paths to process : 1
14/02/16 23:19:31 INFO input.FileInputFormat: Total input paths to process : 1
14/02/16 23:19:31 INFO mapred.JobClient: Running job: job_201402162201_0013
14/02/16 23:19:32 INFO mapred.JobClient:  map 0% reduce 0%
14/02/16 23:19:41 INFO mapred.JobClient: Task Id : attempt_201402162201_0013_m_000000_0, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.util.NoSuchElementException
    at java.util.AbstractList$Itr.next(AbstractList.java:350)
    at sqooptest.__loadFromFields(sqooptest.java:225)
    at sqooptest.parse(sqooptest.java:174)
    at org.apach
14/02/16 23:19:48 INFO mapred.JobClient: Task Id : attempt_201402162201_0013_m_000000_1, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.util.NoSuchElementException
    at java.util.AbstractList$Itr.next(AbstractList.java:350)
    at sqooptest.__loadFromFields(sqooptest.java:225)
    at sqooptest.parse(sqooptest.java:174)
    at org.apach
14/02/16 23:19:55 INFO mapred.JobClient: Task Id : attempt_201402162201_0013_m_000000_2, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.util.NoSuchElementException
    at java.util.AbstractList$Itr.next(AbstractList.java:350)
    at sqooptest.__loadFromFields(sqooptest.java:225)
    at sqooptest.parse(sqooptest.java:174)
    at org.apach
14/02/16 23:20:04 INFO mapred.JobClient: Job complete: job_201402162201_0013
14/02/16 23:20:04 INFO mapred.JobClient: Counters: 7
14/02/16 23:20:04 INFO mapred.JobClient:   Job Counters 
14/02/16 23:20:04 INFO mapred.JobClient:     Failed map tasks=1
14/02/16 23:20:04 INFO mapred.JobClient:     Launched map tasks=4
14/02/16 23:20:04 INFO mapred.JobClient:     Data-local map tasks=4
14/02/16 23:20:04 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=29679
14/02/16 23:20:04 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/02/16 23:20:04 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/02/16 23:20:04 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/02/16 23:20:04 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
14/02/16 23:20:04 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 33.5335 seconds (0 bytes/sec)
14/02/16 23:20:04 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
14/02/16 23:20:04 INFO mapreduce.ExportJobBase: Exported 0 records.
14/02/16 23:20:04 ERROR tool.ExportTool: Error during export: Export job failed!

This works perfectly if the file is uncompressed and I use the dipayan-test.txt file directly.

I need help resolving this, and I would also like to know whether I am missing something when working with lzo files.
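One thing I have not been able to rule out: whether this CDH4 VM has an LZO codec registered at all. If it does not, the export mapper would read the raw compressed bytes as if they were plain text, and the generated record parser would then fail on the garbage fields. Assuming hadoop-lzo (the GPL extras package) is installed, the codec registration in core-site.xml would look roughly like this — this is my guess at the relevant configuration, not something the log above confirms:

```xml
<!-- core-site.xml: register the LZO codecs so TextInputFormat can
     decompress .lzo files by extension (assumes hadoop-lzo is installed) -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```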

2 Answers:

Answer 0 (score: 2)

Exports may fail for a number of reasons:

* Loss of connectivity from the Hadoop cluster to the database (either due to hardware fault, or server software crashes)
* Attempting to INSERT a row which violates a consistency constraint (for example, inserting a duplicate primary key value)
* Attempting to parse an incomplete or malformed record from the HDFS source data
* Attempting to parse records using incorrect delimiters
* Capacity issues (such as insufficient RAM or disk space) 

Taken from here.

In my case I got the same NoSuchElementException, and setting the correct field terminator with --fields-terminated-by '\t' solved the problem.

If the terminators are not specified, Sqoop assumes the mysql defaults: ',' as the field terminator and '\n' as the line terminator.
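To see why a wrong delimiter surfaces as NoSuchElementException rather than a clearer error: the generated __loadFromFields walks an iterator over the parsed fields and calls next() once per table column, so a record that splits into fewer fields than columns exhausts the iterator. A minimal sketch of that mechanism (my own illustration in Python, not Sqoop's actual code — Python's StopIteration plays the role of Java's NoSuchElementException):

```python
def load_from_fields(fields, num_columns=3):
    """Mimic the generated __loadFromFields: consume one parsed
    field per table column from an iterator."""
    it = iter(fields)
    # Java's it.next() throws NoSuchElementException when exhausted;
    # Python's next() raises StopIteration analogously.
    return [next(it) for _ in range(num_columns)]

record = "raju\tmarathahalli\t32"

# Correct terminator: three fields, parse succeeds.
print(load_from_fields(record.split("\t")))  # ['raju', 'marathahalli', '32']

# Default ',' terminator: the whole line is one field, so the
# iterator is exhausted while loading the second column.
try:
    load_from_fields(record.split(","))
except StopIteration:
    print("parse failed: too few fields")
```

The same exhausted-iterator failure happens when compressed bytes are fed to the text parser, which is why the symptom looks identical for the lzo case.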

Answer 1 (score: 0)

Your table may not have the right columns. You can always go into the .java file that sqoop generates for you and debug from there: sqooptest.java:225