I want to fetch data from HBase by connecting to it. Before running the code, I authenticated with Kerberos in the terminal.
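Since everything depends on the ticket obtained in the terminal, one quick sanity check is to confirm from Python that a valid ticket is visible to the session that launches the job. This is a minimal sketch, assuming a `klist` binary is on the PATH and supports the `-s` flag (silent mode, result reported through the exit status); `has_valid_ticket` is a hypothetical helper name, not part of PySpark.

```python
import subprocess


def has_valid_ticket(klist_cmd=("klist", "-s")):
    """Return True if the ticket check command exits with status 0.

    klist_cmd is an assumption: `klist -s` prints nothing and exits 0
    only when the credentials cache is readable and not expired.
    """
    try:
        return subprocess.call(list(klist_cmd)) == 0
    except OSError:
        # The command is missing or cannot be executed.
        return False
```

Calling `has_valid_ticket()` in the same shell session before `sc.newAPIHadoopRDD(...)` shows whether the driver process can see the ticket at all. It does not prove that HBase accepts it.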
When I run the following code:
conf2 = {
    "hbase.zookeeper.quorum": "AAAAA:2181,BBBBB:2181,CCCCC:2181,DDDDD:2181",
    "hbase.mapreduce.inputtable": "doaat_nsi_dev:b_logs_alimbrute",
}
hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter="org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter",
    valueConverter="org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter",
    conf=conf2)
output = hbase_rdd.collect()
I get this log:
17/05/15 16:23:53 INFO MemoryStore: Block broadcast_16 stored as values in memory (estimated size 356.1 KB, free 356.1 KB)
17/05/15 16:23:53 INFO MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 29.6 KB, free 385.7 KB)
17/05/15 16:23:53 INFO BlockManagerInfo: Added broadcast_16_piece0 in memory on localhost:41904 (size: 29.6 KB, free: 511.1 MB)
17/05/15 16:23:53 INFO SparkContext: Created broadcast 16 from newAPIHadoopRDD at PythonRDD.scala:546
17/05/15 16:23:53 INFO MemoryStore: Block broadcast_17 stored as values in memory (estimated size 340.9 KB, free 726.6 KB)
17/05/15 16:23:53 INFO MemoryStore: Block broadcast_17_piece0 stored as bytes in memory (estimated size 29.6 KB, free 756.2 KB)
17/05/15 16:23:53 INFO BlockManagerInfo: Added broadcast_17_piece0 in memory on localhost:41904 (size: 29.6 KB, free: 511.1 MB)
17/05/15 16:23:53 INFO SparkContext: Created broadcast 17 from broadcast at PythonRDD.scala:527
17/05/15 16:23:53 INFO Converter: Loaded converter: org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter
17/05/15 16:23:53 INFO Converter: Loaded converter: org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter
17/05/15 16:23:53 INFO RecoverableZooKeeper: Process identifier=hconnection-0x62f431f8 connecting to ZooKeeper ensemble=AAAAA:2181,BBBBB:2181,CCCCC:2181,DDDDD:2181
17/05/15 16:23:53 INFO ZooKeeper: Initiating client connection, connectString=EEEEE:2181,FFFFF:2181,GGGGG:2181,CCCCC:2181,DDDDD:2181 sessionTimeout=180000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@70
17/05/15 16:23:53 INFO ClientCnxn: Opening socket connection to server BBBBB/10.100.100.00:2181. Will not attempt to authenticate using SASL (unknown error)
17/05/15 16:23:53 INFO ClientCnxn: Socket connection established to BBBBB/10.100.100.00:2181, initiating session
17/05/15 16:23:53 INFO ClientCnxn: Session establishment complete on server BBBBB/10.100.100.00:2181, sessionid = 0x35b14e49 negotiated timeout = 40000
17/05/15 16:23:53 INFO RegionSizeCalculator: Calculating region sizes for table "doaat_nsi_dev:b_logs_alimbrute".
17/05/15 16:25:01 INFO RpcRetryingCaller: Call exception, tries=10, retries=31, started=68415 ms ago, cancelled=false, msg=row 'doaat_nsi_gs_alimbrute,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427,
17/05/15 16:25:22 INFO RpcRetryingCaller: Call exception, tries=11, retries=31, started=88580 ms ago, cancelled=false, msg=row 'doaat_nsi_gs_alimbrute,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427,
17/05/15 16:25:42 INFO RpcRetryingCaller: Call exception, tries=12, retries=31, started=108631 ms ago, cancelled=false, msg=row 'doaat_nsiogs_alimbrute,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427,0
17/05/15 16:26:02 INFO RpcRetryingCaller: Call exception, tries=13, retries=31, started=128692 ms ago, cancelled=false, msg=row 'doaat_nsi000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
17/05/15 16:26:22 INFO RpcRetryingCaller: Call exception, tries=14, retries=31, started=148730 ms ago, cancelled=false, msg=row 'doaat_nsi000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
17/05/15 16:26:42 INFO RpcRetryingCaller: Call exception, tries=15, retries=31, started=168772 ms ago, cancelled=false, msg=row 'doaat_nsi000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
17/05/15 16:27:02 INFO RpcRetryingCaller: Call exception, tries=16, retries=31, started=188867 ms ago, cancelled=false, msg=row 'doaat_nsi000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/context.py", line 644, in newAPIHadoopRDD
    jconf, batchSize)
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 811, in __call__
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 626, in send_command
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 740, in send_command
  File "/usr/lib64/python2.7/socket.py", line 430, in readline
    data = recv(1)
  File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/context.py", line 225, in signal_handler
    raise KeyboardInterrupt()
KeyboardInterrupt
>>> 17/05/15 16:27:22 INFO RpcRetryingCaller: Call exception, tries=17, retries=31, started=209041 ms ago, cancelled=false, msg=row 'doaat,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
17/05/15 16:27:42 INFO RpcRetryingCaller: Call exception, tries=18, retries=31, started=229056 ms ago, cancelled=false, msg=row 'doaat_nsi000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
17/05/15 16:28:02 INFO RpcRetryingCaller: Call exception, tries=19, retries=31, started=249149 ms ago, cancelled=false, msg=row 'doaat_nsi000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
17/05/15 16:28:22 INFO RpcRetryingCaller: Call exception, tries=20, retries=31, started=269250 ms ago, cancelled=false, msg=row 'doaat_nsi000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
17/05/15 16:28:42 INFO RpcRetryingCaller: Call exception, tries=21, retries=31, started=289429 ms ago, cancelled=false, msg=row 'doaat_nsi000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
17/05/15 16:29:03 INFO RpcRetryingCaller: Call exception, tries=22, retries=31, started=309477 ms ago, cancelled=false, msg=row 'doaat_nsi000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
17/05/15 16:29:23 INFO RpcRetryingCaller: Call exception, tries=23, retries=31, started=329683 ms ago, cancelled=false, msg=row 'doaat_nsi000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
17/05/15 16:29:43 INFO RpcRetryingCaller: Call exception, tries=24, retries=31, started=349811 ms ago, cancelled=false, msg=row 'doaat_nsi000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
17/05/15 16:30:03 INFO RpcRetryingCaller: Call exception, tries=25, retries=31, started=369907 ms ago, cancelled=false, msg=row 'doaat_nsi000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
17/05/15 16:30:23 INFO RpcRetryingCaller: Call exception, tries=26, retries=31, started=389927 ms ago, cancelled=false, msg=row 'doaat_nsi000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
17/05/15 16:30:43 INFO RpcRetryingCaller: Call exception, tries=27, retries=31, started=410051 ms ago, cancelled=false, msg=row 'doaat_nsi000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
17/05/15 16:31:03 INFO RpcRetryingCaller: Call exception, tries=28, retries=31, started=430260 ms ago, cancelled=false, msg=row 'doaat_nsi000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
17/05/15 16:31:23 INFO RpcRetryingCaller: Call exception, tries=29, retries=31, started=450432 ms ago, cancelled=false, msg=row 'doaat_nsi000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
17/05/15 16:31:44 INFO RpcRetryingCaller: Call exception, tries=30, retries=31, started=470448 ms ago, cancelled=false, msg=row 'doaat_nsi000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=MMMMMMM,60020,1493374572427, seqNum=0
17/05/15 16:31:44 INFO ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x35b14e49ff3b9d8
17/05/15 16:31:44 INFO ZooKeeper: Session: 0x35b14e49ff3b9d8 closed
17/05/15 16:31:44 INFO ClientCnxn: EventThread shut down
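I increased the ZooKeeper timeout, but no RDD is returned. I don't understand why I get no result.
Should I define the principal and the path to the keytab in conf? If so, how do I do that in PySpark?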
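To make the question concrete, here is a minimal sketch of what a Kerberos-aware conf might look like. The property names are standard HBase and Hadoop client settings, but the values are assumptions: the realm `EXAMPLE.COM`, the `hbase/_HOST` principal pattern, and the `/hbase-secure` znode parent are placeholders that would have to match the cluster's `hbase-site.xml`. `build_secure_hbase_conf` is a hypothetical helper, and whether these settings alone fix the retries shown in the log is not confirmed.

```python
def build_secure_hbase_conf(quorum, table, realm, znode_parent="/hbase-secure"):
    """Build a conf dict for newAPIHadoopRDD with Kerberos-related settings.

    realm and znode_parent are placeholders (assumptions); the real values
    should be taken from the cluster's hbase-site.xml.
    """
    principal = "hbase/_HOST@" + realm
    return {
        "hbase.zookeeper.quorum": quorum,
        "hbase.mapreduce.inputtable": table,
        "zookeeper.znode.parent": znode_parent,
        "hadoop.security.authentication": "kerberos",
        "hbase.security.authentication": "kerberos",
        "hbase.master.kerberos.principal": principal,
        "hbase.regionserver.kerberos.principal": principal,
    }


conf2 = build_secure_hbase_conf(
    "AAAAA:2181,BBBBB:2181,CCCCC:2181,DDDDD:2181",
    "doaat_nsi_dev:b_logs_alimbrute",
    "EXAMPLE.COM",  # hypothetical realm, not taken from the cluster
)
```

The resulting `conf2` would be passed as `conf=conf2` in the `sc.newAPIHadoopRDD` call above. Note that this dict carries only the server principals; it has no entry for a keytab path, which is the part I am unsure how to supply.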
Thanks for your advice,