This is what I want to build:
I tried the following. I wrote a test program and used the Maven assembly plugin to collect the jars in a folder:
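import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.{ConnectionFactory, HBaseAdmin, Scan}
import scala.collection.JavaConverters._

// Start from an empty configuration so only the ZooKeeper quorum is set
val hbaseConfig = new Configuration()
hbaseConfig.clear()
hbaseConfig.set("hbase.zookeeper.quorum", "ip-10-0-xxx-xxx.ec2.internal")
dumpConfiguration(hbaseConfig)
// Throws if the HBase master cannot be reached
HBaseAdmin.checkHBaseAvailable(hbaseConfig)
val connection = ConnectionFactory.createConnection(hbaseConfig)
val table = connection.getTable(TableName.valueOf("test-messages-01"))
// Scan only the msgs:msg column
val scan = new Scan()
scan.addColumn("msgs".getBytes(), "msg".getBytes)
val scanner = table.getScanner(scan)
println(s"scanner: $scanner")
// Print the first 20 results
scanner.iterator().asScala.toStream.take(20).foreach(x => println(s"--> $x"))
scanner.close()
table.close()
connection.close()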
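I launched this small program from the same EMR machine and it worked.
I also launched it from a pod (after modifying the security groups, of course) and it worked there too.
Now I have started a new EMR cluster in which HBase is a read replica.
I can SSH into the EMR machine, run hbase shell, and then run count 'test-messages-01', and that works as well.
I made one small change to the program, adding the cluster ID as the meta table suffix: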
val hbaseConfig = new Configuration()
hbaseConfig.clear()
hbaseConfig.set("hbase.zookeeper.quorum", "ip-10-0-xxx-xxx.ec2.internal")
// The only change: the read-replica cluster ID as the meta table suffix
hbaseConfig.set("hbase.meta.table.suffix", "j-xxxxxxxx")
dumpConfiguration(hbaseConfig)
HBaseAdmin.checkHBaseAvailable(hbaseConfig)
val connection = ConnectionFactory.createConnection(hbaseConfig)
val table = connection.getTable(TableName.valueOf("test-messages-01"))
val scan = new Scan()
scan.addColumn("msgs".getBytes(), "msg".getBytes)
val scanner = table.getScanner(scan)
println(s"scanner: $scanner")
import scala.collection.JavaConverters._
scanner.iterator().asScala.toStream.take(20).foreach(x => println(s"--> $x"))
scanner.close()
table.close()
connection.close()
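Then I tried running it on the EMR master machine, but it no longer works.
From the logs I can see that it connects to ZooKeeper successfully, establishes the HBase connection, and creates the scanner. However, it hangs while iterating over the scanner, and during that time I get many org.apache.hadoop.hbase.NotServingRegionException exceptions: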
[main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=ip-10-0-104-68.ec2.internal:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@625732
[main-SendThread(ip-10-0-104-68.ec2.internal:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server ip-10-0-104-68.ec2.internal/10.0.104.68:2181. Will not attempt to authenticate using SASL (unknown error)
[main-SendThread(ip-10-0-104-68.ec2.internal:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to ip-10-0-104-68.ec2.internal/10.0.104.68:2181, initiating session
[main-SendThread(ip-10-0-104-68.ec2.internal:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server ip-10-0-104-68.ec2.internal/10.0.104.68:2181, sessionid = 0x1000001ffb70011, negotiated timeout = 40000
[main] INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation - Closing master protocol: MasterService
[main] INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation - Closing zookeeper sessionid=0x1000001ffb70011
[main] INFO org.apache.zookeeper.ZooKeeper - Session: 0x1000001ffb70011 closed
[main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x1000001ffb70011
[main] INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - Process identifier=hconnection-0x52d239ba connecting to ZooKeeper ensemble=ip-10-0-104-68.ec2.internal:2181
[main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=ip-10-0-104-68.ec2.internal:2181 sessionTimeout=180000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@315f43d5
[main-SendThread(ip-10-0-104-68.ec2.internal:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server ip-10-0-104-68.ec2.internal/10.0.104.68:2181. Will not attempt to authenticate using SASL (unknown error)
[main-SendThread(ip-10-0-104-68.ec2.internal:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to ip-10-0-104-68.ec2.internal/10.0.104.68:2181, initiating session
[main-SendThread(ip-10-0-104-68.ec2.internal:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server ip-10-0-104-68.ec2.internal/10.0.104.68:2181, sessionid = 0x1000001ffb70012, negotiated timeout = 40000
scanner: org.apache.hadoop.hbase.client.ClientSimpleScanner@696f0212
[hconnection-0x52d239ba-metaLookup-shared--pool4-t1] INFO org.apache.hadoop.hbase.client.RpcRetryingCaller - Call exception, tries=10, retries=31, started=38396 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 is not online on ip-10-0-122-64.ec2.internal,16020,1552648181048
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3086)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1275)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2678)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3012)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36613)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2380)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
row 'test-messages-01,,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-10-0-122-64.ec2.internal,16020,1552648181048, seqNum=0
[hconnection-0x52d239ba-metaLookup-shared--pool4-t1] INFO org.apache.hadoop.hbase.client.RpcRetryingCaller - Call exception, tries=11, retries=31, started=48452 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 is not online on ip-10-0-122-64.ec2.internal,16020,1552648181048
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3086)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1275)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2678)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3012)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36613)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2380)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
row 'test-messages-01,,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-10-0-122-64.ec2.internal,16020,1552648181048, seqNum=0
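To make the retry lines above easier to compare, here is a minimal sketch that pulls the attempt counters, the region, and the region server out of one such line. RetryLogLine and its Summary fields are hypothetical names introduced for illustration and are not part of HBase; the pattern assumes the exact message layout shown in the log above.

```scala
// Hypothetical helper (not an HBase API): summarizes one RpcRetryingCaller log line.
object RetryLogLine {
  // Fields extracted from a "Call exception, tries=..., retries=..." line.
  final case class Summary(tries: Int, retries: Int, startedMs: Long, region: String, server: String)

  // Unanchored so the pattern can match anywhere inside the full log line.
  private val Pattern =
    """tries=(\d+), retries=(\d+), started=(\d+) ms ago.*Region (\S+) is not online on (\S+)""".r.unanchored

  // Returns None for lines that do not have the assumed layout.
  def parse(line: String): Option[Summary] = line match {
    case Pattern(t, r, s, region, server) =>
      Some(Summary(t.toInt, r.toInt, s.toLong, region, server))
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    // Read log lines from stdin and print a summary for each retry line.
    scala.io.Source.stdin.getLines().flatMap(parse).foreach { s =>
      println(s"tries=${s.tries} retries=${s.retries} region=${s.region} server=${s.server}")
    }
  }
}
```

Applied to the two retry lines shown, both summaries name the same region, hbase:meta,,1, on the same region server, ip-10-0-122-64.ec2.internal,16020,1552648181048.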
I even tried using the HBase libraries from the EMR machine (/usr/lib/hbase/*), but it still does not work.
I am at a loss and could not find a solution by googling.