I am having problems because of a bad table design in HBase: millions of records end up under the same row key (one column family). Up to about 2.5M records per row I could run mapReduces with Spark by scanning a single row, but now some rows reach 5 or 6 million, and whenever I perform a scan or get, all my region servers go down within minutes. I am using HDP 2.2 and HBase 0.98.4.2.2.
So far I have tried several things. Below is a region server log excerpt from one of these attempts:
2015-08-25 15:07:19,722 DEBUG [RS_OPEN_REGION-ip-XXX-XX-XX-XXX:60020-0] handler.OpenRegionHandler: Opened my-hbase-table,20150807.33,1439222912086.e731d603bb5d1f0d593736eab922069c. on ip-XXX-XX-XX-XXX.eu-west-1.compute.internal,60020,1440528949321
2015-08-25 15:07:19,724 INFO [RS_OPEN_REGION-ip-XXX-XX-XX-XXX:60020-1] regionserver.HRegion: Replaying edits from hdfs://ip-XXX-XX-XX-XX2.eu-west-1.compute.internal:8020/apps/hbase/data/data/default/my-hbase-table/3bc481ff534f0907e6b99d5eff1793f5/recovered.edits/0000000000011099011
2015-08-25 15:07:19,725 DEBUG [RS_OPEN_REGION-ip-XXX-XX-XX-XXX:60020-2] zookeeper.ZKAssign: regionserver:60020-0x24f65d7e5df025c, quorum=ip-XXX-XX-XX-XX2.eu-west-1.compute.internal:2181,ip-XXX-XX-XX-XXX.eu-west-1.compute.internal:2181,ip-XXX-XX-XX-XX3.eu-west-1.compute.internal:2181, baseZNode=/hbase-unsecure Transitioned node 4945982779c1cba7b1726e77a45d405a from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2015-08-25 15:07:19,725 DEBUG [RS_OPEN_REGION-ip-XXX-XX-XX-XXX:60020-2] handler.OpenRegionHandler: Transitioned 4945982779c1cba7b1726e77a45d405a to OPENED in zk on ip-XXX-XX-XX-XXX.eu-west-1.compute.internal,60020,1440528949321
2015-08-25 15:07:19,726 DEBUG [RS_OPEN_REGION-ip-XXX-XX-XX-XXX:60020-2] handler.OpenRegionHandler: Opened my-hbase-table,20150727.33,1438203991635.4945982779c1cba7b1726e77a45d405a. on ip-XXX-XX-XX-XXX.eu-west-1.compute.internal,60020,1440528949321
2015-08-25 15:07:19,733 DEBUG [RS_OPEN_REGION-ip-XXX-XX-XX-XXX:60020-1] zookeeper.ZKAssign: regionserver:60020-0x24f65d7e5df025c, quorum=ip-XXX-XX-XX-XX2.eu-west-1.compute.internal:2181,ip-XXX-XX-XX-XXX.eu-west-1.compute.internal:2181,ip-XXX-XX-XX-XX3.eu-west-1.compute.internal:2181, baseZNode=/hbase-unsecure Attempting to retransition opening state of node 3bc481ff534f0907e6b99d5eff1793f5
2015-08-25 15:07:19,734 DEBUG [RS_OPEN_REGION-ip-XXX-XX-XX-XXX:60020-1] regionserver.HRegion: Applied 0, skipped 1, firstSequenceidInLog=11099011, maxSequenceidInLog=11099011, path=hdfs://ip-XXX-XX-XX-XX2.eu-west-1.compute.internal:8020/apps/hbase/data/data/default/my-hbase-table/3bc481ff534f0907e6b99d5eff1793f5/recovered.edits/0000000000011099011
2015-08-25 15:07:19,734 DEBUG [RS_OPEN_REGION-ip-XXX-XX-XX-XXX:60020-1] regionserver.HRegion: Empty memstore size for the current region my-hbase-table,20150824.33,1440473855617.3bc481ff534f0907e6b99d5eff1793f5.
2015-08-25 15:07:19,737 DEBUG [RS_OPEN_REGION-ip-XXX-XX-XX-XXX:60020-1] regionserver.HRegion: Deleted recovered.edits file=hdfs://ip-XXX-XX-XX-XX2.eu-west-1.compute.internal:8020/apps/hbase/data/data/default/my-hbase-table/3bc481ff534f0907e6b99d5eff1793f5/recovered.edits/0000000000011099011
2015-08-25 15:07:19,759 DEBUG [RS_OPEN_REGION-ip-XXX-XX-XX-XXX:60020-1] wal.HLogUtil: Written region seqId to file:hdfs://ip-XXX-XX-XX-XX2.eu-west-1.compute.internal:8020/apps/hbase/data/data/default/my-hbase-table/3bc481ff534f0907e6b99d5eff1793f5/recovered.edits/11099013_seqid ,newSeqId=11099013 ,maxSeqId=11099010
2015-08-25 15:07:19,761 INFO [RS_OPEN_REGION-ip-XXX-XX-XX-XXX:60020-1] regionserver.HRegion: Onlined 3bc481ff534f0907e6b99d5eff1793f5; next sequenceid=11099013
2015-08-25 15:07:19,764 DEBUG [RS_OPEN_REGION-ip-XXX-XX-XX-XXX:60020-1] zookeeper.ZKAssign: regionserver:60020-0x24f65d7e5df025c, quorum=ip-XXX-XX-XX-XX2.eu-west-1.compute.internal:2181,ip-XXX-XX-XX-XXX.eu-west-1.compute.internal:2181,ip-XXX-XX-XX-XX3.eu-west-1.compute.internal:2181, baseZNode=/hbase-unsecure Attempting to retransition opening state of node 3bc481ff534f0907e6b99d5eff1793f5
2015-08-25 15:07:19,773 INFO [PostOpenDeployTasks:3bc481ff534f0907e6b99d5eff1793f5] regionserver.HRegionServer: Post open deploy tasks for region=my-hbase-table,20150824.33,1440473855617.3bc481ff534f0907e6b99d5eff1793f5.
2015-08-25 15:07:19,773 DEBUG [PostOpenDeployTasks:3bc481ff534f0907e6b99d5eff1793f5] regionserver.CompactSplitThread: Small Compaction requested: system; Because: Opening Region; compaction_queue=(0:1), split_queue=0, merge_queue=0
2015-08-25 15:07:19,774 DEBUG [regionserver60020-smallCompactions-1440529300855] compactions.RatioBasedCompactionPolicy: Selecting compaction from 4 store files, 0 compacting, 4 eligible, 10 blocking
2015-08-25 15:07:19,774 DEBUG [regionserver60020-smallCompactions-1440529300855] compactions.ExploringCompactionPolicy: Exploring compaction algorithm has selected 0 files of size 0 starting at candidate #-1 after considering 3 permutations with 0 in ratio
2015-08-25 15:07:19,774 DEBUG [regionserver60020-smallCompactions-1440529300855] compactions.RatioBasedCompactionPolicy: Not compacting files because we only have 0 files ready for compaction. Need 3 to initiate.
2015-08-25 15:07:19,775 DEBUG [regionserver60020-smallCompactions-1440529300855] regionserver.CompactSplitThread: Not compacting my-hbase-table,20150824.33,1440473855617.3bc481ff534f0907e6b99d5eff1793f5. because compaction request was cancelled
2015-08-25 15:07:19,787 INFO [PostOpenDeployTasks:3bc481ff534f0907e6b99d5eff1793f5] catalog.MetaEditor: Updated row my-hbase-table,20150824.33,1440473855617.3bc481ff534f0907e6b99d5eff1793f5. with server=ip-XXX-XX-XX-XXX.eu-west-1.compute.internal,60020,1440528949321
2015-08-25 15:07:19,787 INFO [PostOpenDeployTasks:3bc481ff534f0907e6b99d5eff1793f5] regionserver.HRegionServer: Finished post open deploy task for my-hbase-table,20150824.33,1440473855617.3bc481ff534f0907e6b99d5eff1793f5.
2015-08-25 15:07:19,788 DEBUG [RS_OPEN_REGION-ip-XXX-XX-XX-XXX:60020-1] zookeeper.ZKAssign: regionserver:60020-0x24f65d7e5df025c, quorum=ip-XXX-XX-XX-XX2.eu-west-1.compute.internal:2181,ip-XXX-XX-XX-XXX.eu-west-1.compute.internal:2181,ip-XXX-XX-XX-XX3.eu-west-1.compute.internal:2181, baseZNode=/hbase-unsecure Transitioning 3bc481ff534f0907e6b99d5eff1793f5 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2015-08-25 15:07:19,791 DEBUG [RS_OPEN_REGION-ip-XXX-XX-XX-XXX:60020-1] zookeeper.ZKAssign: regionserver:60020-0x24f65d7e5df025c, quorum=ip-XXX-XX-XX-XX2.eu-west-1.compute.internal:2181,ip-XXX-XX-XX-XXX.eu-west-1.compute.internal:2181,ip-XXX-XX-XX-XX3.eu-west-1.compute.internal:2181, baseZNode=/hbase-unsecure Transitioned node 3bc481ff534f0907e6b99d5eff1793f5 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2015-08-25 15:07:19,791 DEBUG [RS_OPEN_REGION-ip-XXX-XX-XX-XXX:60020-1] handler.OpenRegionHandler: Transitioned 3bc481ff534f0907e6b99d5eff1793f5 to OPENED in zk on ip-XXX-XX-XX-XXX.eu-west-1.compute.internal,60020,1440528949321
2015-08-25 15:07:19,791 DEBUG [RS_OPEN_REGION-ip-XXX-XX-XX-XXX:60020-1] handler.OpenRegionHandler: Opened my-hbase-table,20150824.33,1440473855617.3bc481ff534f0907e6b99d5eff1793f5. on ip-XXX-XX-XX-XXX.eu-west-1.compute.internal,60020,1440528949321
2015-08-25 15:07:20,344 INFO [B.DefaultRpcServer.handler=3,queue=3,port=60020] regionserver.HRegionServer: Client tried to access missing scanner 1
2015-08-25 15:07:20,346 DEBUG [B.DefaultRpcServer.handler=3,queue=3,port=60020] ipc.RpcServer: B.DefaultRpcServer.handler=3,queue=3,port=60020: callId: 36 service: ClientService methodName: Scan size: 25 connection: 172.31.40.100:42285
org.apache.hadoop.hbase.UnknownScannerException: Name: 1, already closed?
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3150)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29994)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2078)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
at java.lang.Thread.run(Thread.java:745)
I really need to process this data before creating a new table with a new rowkey design. I am new to HBase, so maybe these suggestions sound silly, but:
Thanks!
Edit:
Actually, I think the second option is not feasible, since HBase does not allow updating a record, only delete + create again.
Edit 2:
Each record in the row is about a few tens of bytes. The problem when trying to scan such rows, now that each row holds millions of records, is that after a couple of minutes the region servers start going down one by one. Maybe trying to fetch a row of approximately 512 MB is too much for my cluster configuration: 6 nodes with 8 GB each.
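A quick sanity check on that 512 MB estimate (the ~90 bytes per record is an assumed figure consistent with the "few tens of bytes" stated above):

```scala
// Back-of-envelope row size: cells per row times assumed bytes per cell.
// 90 bytes/record is a hypothetical value within "a few tens of bytes".
val recordsPerRow = 6000000L                  // worst-case row: ~6 million cells
val bytesPerRecord = 90L                      // assumed average cell size
val rowBytes = recordsPerRow * bytesPerRecord // 540,000,000 bytes
val rowMB = rowBytes / (1024L * 1024L)        // ~514 MB, i.e. roughly half a gigabyte
println(s"Approximate row size: $rowMB MB")
```

With 6 region servers at 8 GB each, materializing a half-gigabyte row in a single RPC could plausibly exhaust a region server's heap, which would fit the symptom described.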
The only exception I can find searching the HBase logs is the UnknownScannerException shown in the log excerpt above.
Edit 3:
I have tried range scanning within a row with ColumnRangeFilter, and it runs without bringing down any region server:
scan 'my-table', {STARTROW=>'row-key',ENDROW=>'row-key', FILTER=> ColumnRangeFilter.new(Bytes.toBytes('first_possible_column_prefix'),true,Bytes.toBytes('another_possible_column_prefix'),false)}
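For reference, ColumnRangeFilter treats its lower bound as inclusive and its upper bound as exclusive, which is what the true/false flags in the filter above select. A minimal model of that behavior, using hypothetical qualifier names:

```scala
// Models ColumnRangeFilter's [min, max) qualifier selection.
// HBase compares qualifiers as unsigned byte arrays; for these ASCII
// placeholder names, plain string comparison orders identically.
def inColumnRange(q: String, minCol: String, maxCol: String): Boolean =
  q >= minCol && q < maxCol  // minColumnInclusive = true, maxColumnInclusive = false

val qualifiers = Seq("rec_0001", "rec_0500", "rec_1000", "rec_1500")
val kept = qualifiers.filter(q => inColumnRange(q, "rec_0500", "rec_1000"))
println(kept)  // List(rec_0500): lower bound kept, upper bound excluded
```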
val scanPoints = new Scan()
// Single-row scan: stop row is the start key plus a "1" suffix (exclusive bound)
scanPoints.setStartRow((queryDate + "." + venueId).getBytes())
scanPoints.setStopRow((queryDate + "." + venueId + "1").getBytes())
// Restrict the scan to a range of column qualifiers within the wide row
scanPoints.setFilter(new ColumnRangeFilter(Bytes.toBytes("first_possible_column_prefix"), true, Bytes.toBytes("another_possible_column_prefix"), false))
...
val confPoints = HBaseConfiguration.create()
confPoints.set(TableInputFormat.INPUT_TABLE, Utils.settings.HBaseWifiVisitorsTableName)
confPoints.set("hbase.zookeeper.quorum", Utils.settings.zQuorum)
confPoints.setInt("zookeeper.session.timeout", 6000000)
confPoints.set("hbase.zookeeper.property.clientPort", Utils.settings.zPort)
confPoints.set("zookeeper.znode.parent", Utils.settings.HBaseZNode)
confPoints.set("hbase.master", Utils.settings.HBaseMaster)
confPoints.set("hbase.mapreduce.scan.column.family", "positions")
// Raised result-size limits (2 GB) and timeouts to let huge rows through
confPoints.setLong("hbase.client.scanner.max.result.size", 2147483648L)
confPoints.setLong("hbase.server.scanner.max.result.size", 2147483648L)
confPoints.setInt("hbase.rpc.timeout", 6000000)
confPoints.setInt("hbase.client.operation.timeout", 6000000)
confPoints.set(TableInputFormat.SCAN, convertScanToString(scanPoints))
...
val rdd = sc.newAPIHadoopRDD(confPoints, classOf[TableInputFormat],
  classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
  classOf[org.apache.hadoop.hbase.client.Result]).cache()
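The setStartRow/setStopRow pair above relies on a common single-row trick: HBase scans rows in the half-open interval [startRow, stopRow), so appending a suffix to the start key yields an exclusive upper bound that still covers the target row. A minimal model, with hypothetical keys:

```scala
// HBase scans row keys in the half-open interval [startRow, stopRow).
// Appending "1" to the start key keeps the scan on the single target row.
def rowInScan(row: String, start: String, stop: String): Boolean =
  row >= start && row < stop

val key = "20150824.33"                            // hypothetical queryDate.venueId key
assert(rowInScan(key, key, key + "1"))             // the target row itself is covered
assert(!rowInScan("20150824.34", key, key + "1"))  // a neighboring key is not
```

Note that any key sorting between `key` and `key + "1"` (for example `key + "."`) would also be included; appending a zero byte (`"\u0000"`) instead would give the tightest possible bound, if that matters.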
The Spark code above, however, takes the region servers down, with the same behavior as before.
If I could get this Spark job to work, I could iterate over the whole row in scan intervals to process it completely.
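That interval idea could be sketched as follows: split the column-qualifier space into consecutive [min, max) ranges and run one ColumnRangeFilter-bounded scan per range, so no single scan ever materializes the whole row. The zero-padded qualifier scheme and the cut points below are assumptions for illustration:

```scala
// Build consecutive [min, max) qualifier intervals from a sorted list of
// cut points; each pair would parameterize one bounded scan, e.g.
//   scan.setFilter(new ColumnRangeFilter(Bytes.toBytes(min), true,
//                                        Bytes.toBytes(max), false))
def intervals(cuts: Seq[String]): Seq[(String, String)] =
  cuts.zip(cuts.tail)

val cuts = Seq("rec_0000000", "rec_2000000", "rec_4000000", "rec_6000000")
val ranges = intervals(cuts)
println(ranges)  // three intervals covering ~6M qualifiers in 2M-cell chunks
```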