Sqoop import into HBase fails with a large dataset

Date: 2013-10-24 10:21:57

Tags: hadoop, hbase, sqoop

I am having trouble importing a large dataset, roughly 5 million records, into HBase with Sqoop. The MapReduce job starts, but it stalls at around 30% and then returns the error message below.

I looked around, found this link, and tried adjusting my command by adding `-D mapred.task.timeout=0` just to give it a try, but the end result was the same, although it now stalls at 90% instead.

The sqoop import command is shown below. Am I missing any parameters, or do I need to add something to the hbase-site or zoo.cfg configuration files?

> ./sqoop import -D mapred.task.timeout=0 --connect 'jdbc:sqlserver://192.168.4.1:1433;database=dbname;user=sa;password=password' --table user --hbase-table newtable --column-family cf1 --hbase-row-key id --hbase-create-table --split-by id -m 14
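To be clearer about the hbase-site part of my question: the only check I know how to do is the one below (the path assumes a default CDH layout, so it may differ on other setups), which is why I am asking whether something else needs to be set there.

    # Hypothetical sanity check, assuming the default CDH config path:
    # does the client hbase-site.xml set hbase.zookeeper.quorum explicitly,
    # or is everything defaulting to localhost?
    grep -A1 "hbase.zookeeper.quorum" /etc/hbase/conf/hbase-site.xml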

13/10/24 15:06:29 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
13/10/24 15:06:29 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 3388@cloudera
13/10/24 15:06:29 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x141e977a64e0004, negotiated timeout = 40000
13/10/24 15:06:29 INFO zookeeper.ClientCnxn: EventThread shut down
13/10/24 15:06:29 INFO zookeeper.ZooKeeper: Session: 0x141e977a64e0004 closed
13/10/24 15:06:29 INFO mapreduce.HBaseImportJob: Creating missing HBase table ai
13/10/24 15:06:30 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=catalogtracker-on-org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@11d1284a
13/10/24 15:06:30 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
13/10/24 15:06:30 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
13/10/24 15:06:30 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 3388@cloudera
13/10/24 15:06:30 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x141e977a64e0005, negotiated timeout = 40000
13/10/24 15:06:30 INFO zookeeper.ZooKeeper: Session: 0x141e977a64e0005 closed
13/10/24 15:06:30 INFO zookeeper.ClientCnxn: EventThread shut down
13/10/24 15:06:31 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN([AIIDX]), MAX([AIIDX]) FROM [ai_view]
13/10/24 15:06:32 INFO mapred.JobClient: Running job: job_201310241455_0001
13/10/24 15:06:33 INFO mapred.JobClient:  map 0% reduce 0%
13/10/24 15:08:24 INFO mapred.JobClient:  map 7% reduce 0%
13/10/24 15:08:50 INFO mapred.JobClient:  map 14% reduce 0%
13/10/24 15:10:11 INFO mapred.JobClient:  map 21% reduce 0%
13/10/24 15:10:51 INFO mapred.JobClient:  map 28% reduce 0%
13/10/24 15:12:16 INFO mapred.JobClient:  map 35% reduce 0%
13/10/24 15:12:57 INFO mapred.JobClient:  map 42% reduce 0%
13/10/24 15:14:12 INFO mapred.JobClient:  map 50% reduce 0%
13/10/24 15:14:55 INFO mapred.JobClient:  map 57% reduce 0%
13/10/24 15:16:35 INFO mapred.JobClient:  map 64% reduce 0%
13/10/24 15:17:28 INFO mapred.JobClient:  map 71% reduce 0%
13/10/24 15:18:42 INFO mapred.JobClient:  map 78% reduce 0%
13/10/24 15:19:24 INFO mapred.JobClient:  map 85% reduce 0%
13/10/24 15:20:44 INFO mapred.JobClient:  map 92% reduce 0%
13/10/24 16:28:28 INFO mapred.JobClient: Task Id : attempt_201310241455_0001_m_000013_0, Status : FAILED
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:390)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:436)
    at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1133)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:980)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
    at com.sun.proxy.$Proxy7.getClosestRowBefore(Unknown Source)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1137)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1000)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:975)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1214)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:961)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1678)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1563)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:990)
    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:846)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:822)
    at org.apache.sqoop.hbase.HBasePutProcessor.accept(HBasePutProcessor.java:150)
    at org.apache.sqoop.mapreduce.DelegatingOutputFormat$DelegatingRecordWriter.write(DelegatingOutputFormat.java:128)
    at org.apache.sqoop.mapreduce.DelegatingOutputFormat$DelegatingRecordWriter.write(DelegatingOutputFormat.java:92)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.sqoop.mapreduce.HBaseImportMapper.map(HBaseImportMapper.java:38)
    at org.apache.sqoop.mapreduce.HBaseImportMapper.map(HBaseImportMapper.java:31)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/10/24 16:33:12 INFO mapred.JobClient: Task Id : attempt_201310241455_0001_m_000013_1, Status : FAILED
java.lang.RuntimeException: Could not access HBase table ai
    at org.apache.sqoop.hbase.HBasePutProcessor.setConf(HBasePutProcessor.java:122)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.sqoop.mapreduce.DelegatingOutputFormat$DelegatingRecordWriter.<init>(DelegatingOutputFormat.java:107)
    at org.apache.sqoop.mapreduce.DelegatingOutputFormat.getRecordWriter(DelegatingOutputFormat.java:82)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:628)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:753)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for ai,,99999999999999 after 14 tries.
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1095)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1000)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1102)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:961)
    at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:251)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:155)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:129)
    at org.apache.sqoop.hbase.HBasePutProcessor.setConf(HBasePutProcessor.java:120)
    ... 12 more

13/10/24 16:37:58 INFO mapred.JobClient: Task Id : attempt_201310241455_0001_m_000013_2, Status : FAILED
java.lang.RuntimeException: Could not access HBase table ai
    at org.apache.sqoop.hbase.HBasePutProcessor.setConf(HBasePutProcessor.java:122)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.sqoop.mapreduce.DelegatingOutputFormat$DelegatingRecordWriter.<init>(DelegatingOutputFormat.java:107)
    at org.apache.sqoop.mapreduce.DelegatingOutputFormat.getRecordWriter(DelegatingOutputFormat.java:82)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:628)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:753)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for ai,,99999999999999 after 14 tries.
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1095)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1000)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1102)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:961)
    at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:251)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:155)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:129)
    at org.apache.sqoop.hbase.HBasePutProcessor.setConf(HBasePutProcessor.java:120)
    ... 12 more

13/10/24 16:42:44 INFO mapred.JobClient: Job complete: job_201310241455_0001
13/10/24 16:42:44 INFO mapred.JobClient: Counters: 18
13/10/24 16:42:44 INFO mapred.JobClient:   Job Counters 
13/10/24 16:42:44 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6610795
13/10/24 16:42:44 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/10/24 16:42:44 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/10/24 16:42:44 INFO mapred.JobClient:     Launched map tasks=17
13/10/24 16:42:44 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/10/24 16:42:44 INFO mapred.JobClient:     Failed map tasks=1
13/10/24 16:42:44 INFO mapred.JobClient:   File Output Format Counters 
13/10/24 16:42:44 INFO mapred.JobClient:     Bytes Written=0
13/10/24 16:42:44 INFO mapred.JobClient:   FileSystemCounters
13/10/24 16:42:44 INFO mapred.JobClient:     HDFS_BYTES_READ=1498
13/10/24 16:42:44 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1089897
13/10/24 16:42:44 INFO mapred.JobClient:   File Input Format Counters 
13/10/24 16:42:44 INFO mapred.JobClient:     Bytes Read=0
13/10/24 16:42:44 INFO mapred.JobClient:   Map-Reduce Framework
13/10/24 16:42:44 INFO mapred.JobClient:     Map input records=4782546
13/10/24 16:42:44 INFO mapred.JobClient:     Physical memory (bytes) snapshot=2150453248
13/10/24 16:42:44 INFO mapred.JobClient:     Spilled Records=0
13/10/24 16:42:44 INFO mapred.JobClient:     CPU time spent (ms)=313010
13/10/24 16:42:44 INFO mapred.JobClient:     Total committed heap usage (bytes)=1125842944
13/10/24 16:42:44 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=13256167424
13/10/24 16:42:44 INFO mapred.JobClient:     Map output records=4782546
13/10/24 16:42:44 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1498
13/10/24 16:42:44 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 5,773.4138 seconds (0 bytes/sec)
13/10/24 16:42:44 INFO mapreduce.ImportJobBase: Retrieved 4782546 records.
13/10/24 16:42:44 ERROR tool.ImportTool: Error during import: Import job failed!

1 Answer:

Answer 0 (score: -1)

If you believe the username, password, and port are correct, you may need to install the JDBC driver for SQL Server. Sqoop does not ship with third-party JDBC drivers.
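For example, a rough sketch of what that involves; the jar name and Sqoop lib directory below are typical defaults, not something verified against your installation:

    # Copy the Microsoft SQL Server JDBC driver into Sqoop's lib directory
    # so it ends up on the classpath of the import job.
    # sqljdbc4.jar and /usr/lib/sqoop/lib are common defaults; adjust as needed.
    cp sqljdbc4.jar /usr/lib/sqoop/lib/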

You appear to be using Cloudera; take a look at this: https://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_13_7.html
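Once the driver is in place, one quick way to confirm the driver and connection string work before re-running the full import is a list-tables call (connection values copied from the question):

    # Should print the tables in 'dbname' if the JDBC side is healthy.
    ./sqoop list-tables --connect 'jdbc:sqlserver://192.168.4.1:1433;database=dbname;user=sa;password=password'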