Question

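I need to scan a table very frequently (roughly a million times) within an hour. I have the rowid, which is a byte array. I build the startrow and endrow for the scan from that rowid, and in my case the two are essentially the same.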

     public String someMethod(byte[] rowId) throws IOException {
            // Create the table handle lazily and reuse it on later calls
            if (aTable == null) {
                  aTable = new HTable(Config.getHadoopConfig(),
                  Config.getATable());
            }
            // The end row is a copy of the start row
            byte[] endRow = new byte[rowId.length];
            System.arraycopy(rowId, 0, endRow, 0, rowId.length);
            Scan scan = new Scan(rowId, endRow);
            // scanner implementation and iteration over the result
            try (ResultScanner result = aTable.getScanner(scan)) {
                   for (Result item : result) {

                   }
            }
            // building and returning the String result is omitted in the original
     }

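I would like to know whether some kind of connection pooling could be used to improve performance. Is there any pooling mechanism available in the HBase Java API? I am using HBase 0.96.x. Also, are there any configuration settings that could improve performance? Thanks.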

Answer 1

The connection pooling API has changed since version 1.0.

Code for the new API, for readers' reference:

// Create a connection to the cluster.
Configuration conf = HBaseConfiguration.create();
try (Connection connection = ConnectionFactory.createConnection(conf);
     Table table = connection.getTable(TableName.valueOf(tablename))) {
  // use table as needed, the table returned is lightweight
}

Answer 2

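Taken from http://hbase.apache.org/book.html

Connection Pooling

For applications that require high-end multithreaded access (for example, web servers or application servers that may serve many application threads in a single JVM), you can pre-create an HConnection, as shown in the following example:

Example 9.1. Pre-Creating an HConnection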

// Create a connection to the cluster.
HConnection connection = HConnectionManager.createConnection(Configuration);
HTableInterface table = connection.getTable("myTable");
// use table as needed, the table returned is lightweight
table.close();
// use the connection for other access to the cluster
connection.close();

Answer 3

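Connections are thread-safe and very heavyweight (they include ZooKeeper and socket connections, among other things), so a connection should be created only once per application and shared between threads. Tables are lightweight but not thread-safe: only one thread should use a given table instance. It is therefore preferable to use an HBaseConfiguration instance when working with Table instances. Using HBaseConfiguration ensures that the ZooKeeper and socket instances to the region servers are shared.

Sample code: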

Configuration config = HBaseConfiguration.create();
config.addResource("hbase-site.xml");
// try-with-resources closes the table and then the connection;
// a long-running application would keep one Connection open and share it
try (Connection connection = ConnectionFactory.createConnection(config);
     Table table = connection.getTable(TableName.valueOf("tableName"))) {
   Get getVal = new Get(Bytes.toBytes("rowkey"));
   Result result = table.get(getVal);
   byte[] value =
       result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("dataCol"));
}
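
The sharing rule described in this answer can be illustrated without a cluster. What follows is a minimal sketch in plain Java, not HBase code: `FakeConnection` and `FakeTable` are hypothetical stand-ins for `Connection` and `Table`, and the two counters exist only to make the lifecycle visible. The point is that one heavyweight object is created for the whole run, while every task opens and closes its own lightweight handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedConnectionSketch {
    // Counters used only to make object creation observable.
    public static final AtomicInteger connectionsCreated = new AtomicInteger();
    public static final AtomicInteger tablesCreated = new AtomicInteger();

    // Hypothetical stand-in for a heavyweight, thread-safe connection.
    static final class FakeConnection implements AutoCloseable {
        FakeConnection() { connectionsCreated.incrementAndGet(); }
        FakeTable getTable(String name) { return new FakeTable(name); }
        @Override public void close() { }
    }

    // Hypothetical stand-in for a lightweight handle that is not shared between threads.
    static final class FakeTable implements AutoCloseable {
        private final String name;
        FakeTable(String name) { this.name = name; tablesCreated.incrementAndGet(); }
        String get(String rowKey) { return name + ":" + rowKey; }
        @Override public void close() { }
    }

    // Runs `tasks` lookups on `threads` worker threads over a single shared connection.
    public static List<String> runLookups(int threads, int tasks) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        List<String> results = new ArrayList<>();
        try (FakeConnection connection = new FakeConnection()) {
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final String rowKey = "row-" + i;
                pending.add(workers.submit(() -> {
                    // Each task gets its own table handle and closes it when done.
                    try (FakeTable table = connection.getTable("tableName")) {
                        return table.get(rowKey);
                    }
                }));
            }
            // Wait for every task before the connection is closed.
            for (Future<String> f : pending) {
                results.add(f.get());
            }
        } finally {
            workers.shutdown();
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        List<String> results = runLookups(4, 10);
        System.out.println(results.size() + " lookups, "
            + connectionsCreated.get() + " connection(s), "
            + tablesCreated.get() + " table handle(s)");
    }
}
```

In a real application the shared object would be the HBase connection and would typically live for the lifetime of the process rather than for a single method call.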

Answer 4

I strongly recommend reusing the connection instance.

// Create a connection to the cluster.
Configuration conf = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(conf);

try (Table table = connection.getTable(TableName.valueOf(tablename))) {
// use table as needed, the table returned is lightweight
}

A fairly large (by default) thread pool executor for batch operations is initialized for each connection instance.

class HConnectionImplementation implements ClusterConnection, Closeable {
  ...
  private ExecutorService getBatchPool() {
    if (batchPool == null) {
      synchronized (this) {
        if (batchPool == null) {
          this.batchPool = getThreadPool(conf.getInt("hbase.hconnection.threads.max", 256),
              conf.getInt("hbase.hconnection.threads.core", 256), "-shared-", null);
          this.cleanupPool = true;
        }
      }
    }
    return this.batchPool;
  }
  ...
}
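
To make the cost concrete, here is a simplified plain-Java stand-in for the excerpt above. It is a sketch, not the HBase implementation: `PooledConnectionSketch` is a hypothetical class, and the queue type and keep-alive time are assumptions. Only the double-checked lazy initialization and the size of 256 are taken from the excerpt.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PooledConnectionSketch implements AutoCloseable {
    private final int maxThreads;
    // volatile so that the double-checked locking below publishes the pool safely.
    private volatile ThreadPoolExecutor batchPool;

    public PooledConnectionSketch(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    // Creates the pool on first use; later calls return the same instance.
    public ThreadPoolExecutor getBatchPool() {
        ThreadPoolExecutor pool = batchPool;
        if (pool == null) {
            synchronized (this) {
                pool = batchPool;
                if (pool == null) {
                    // Assumed settings: unbounded queue, idle threads time out after 60 s.
                    pool = new ThreadPoolExecutor(maxThreads, maxThreads,
                            60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
                    pool.allowCoreThreadTimeOut(true);
                    batchPool = pool;
                }
            }
        }
        return pool;
    }

    // Shuts the pool down if it was ever created.
    @Override
    public void close() {
        ThreadPoolExecutor pool = batchPool;
        if (pool != null) {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        try (PooledConnectionSketch connection = new PooledConnectionSketch(256)) {
            ThreadPoolExecutor pool = connection.getBatchPool();
            System.out.println("max pool size: " + pool.getMaximumPoolSize());
        }
    }
}
```

Every instance of such a class can grow its own pool of up to 256 threads, so creating one per request multiplies that overhead, while a single shared instance pays it once.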

HBase connection pool for scanning rows very frequently
