Question

I created an external table like this:

    CREATE EXTERNAL TABLE IF NOT EXISTS words (word string, timest string,
        url string, occs string, nos string, hiveall string, occall string)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' =
        ':key,count:timest,count:url,count:occs,count:nos,other:hiveall,other:occall')

Is there a way to create column families dynamically, so that I end up with something like this:

1397897857000      column=word:occall, timestamp=1449778100184, value=value1

1397897857000      column=otherword:occall, timestamp=1449778100184, value=value2

I was thinking of something like the following, but done from Hive. This code is for HBase:

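import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

Configuration config = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(config);
String table = "myTable";

// The table must be disabled before its schema is changed
admin.disableTable(table);

HColumnDescriptor cf1 = ...;
admin.addColumn(table, cf1);      // adding new ColumnFamily
HColumnDescriptor cf2 = ...;
admin.modifyColumn(table, cf2);    // modifying existing ColumnFamily

admin.enableTable(table);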

Taken from here: http://hbase.apache.org/0.94/book/schema.html

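Or maybe someone has a different idea for my problem: I have a lot of data from a word count job. For each word, the data contains the URL it was read from, the timestamp at which it was read, how often it was found at that URL, and some information about the categories it occurred in (there are news, social, and all). The main problem is that several words can occur at the same timestamp, and each one would overwrite the existing word. I need the timestamp as the row key so that I can run queries on it (such as the most frequent words of the last two weeks).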

Answer 1

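Column families cannot be changed like this once they have been created. In your scenario you should create different column qualifiers, not different column families.

Keep the column family fixed and use the word as the qualifier name. That way, different words that occur at the same timestamp do not overwrite each other.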
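
To illustrate why this avoids the overwrite, here is a minimal sketch in plain Java. It is not the HBase client API, only a toy in-memory model of row key -> "family:qualifier" -> value. The class name `WordRowModel`, its methods, and the choice of `occall` as the fixed family with numeric counts as values are assumptions made for the example.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of one HBase table (hypothetical, not the HBase API). */
final class WordRowModel {
    // Rows are kept sorted by key, as in HBase; the key is the timestamp in ms.
    private final NavigableMap<Long, Map<String, Long>> rows = new TreeMap<>();

    /** Mimics a put: the same row key and column again replaces the old value. */
    void put(long rowKey, String family, String qualifier, long value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(family + ":" + qualifier, value);
    }

    /** Number of cells stored under one row key. */
    int cellCount(long rowKey) {
        Map<String, Long> row = rows.get(rowKey);
        return row == null ? 0 : row.size();
    }

    /** Sums the values per qualifier (word) for row keys in [from, to]. */
    Map<String, Long> totals(String family, long from, long to) {
        String prefix = family + ":";
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> row : rows.subMap(from, true, to, true).values()) {
            for (Map.Entry<String, Long> cell : row.entrySet()) {
                if (cell.getKey().startsWith(prefix)) {
                    String word = cell.getKey().substring(prefix.length());
                    totals.merge(word, cell.getValue(), Long::sum);
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        WordRowModel table = new WordRowModel();
        long ts = 1397897857000L;
        // Two words at the same timestamp: same row, same family, different qualifiers.
        table.put(ts, "occall", "word", 3);
        table.put(ts, "occall", "otherword", 5);
        table.put(ts + 1000, "occall", "word", 4);
        System.out.println(table.cellCount(ts));                   // 2
        System.out.println(table.totals("occall", ts, ts + 1000)); // {otherword=5, word=7}
    }
}
```

Because the word is part of the column name rather than the row key, both words live in the same row, and a range over the timestamp row keys (for example the last two weeks) can still be aggregated per word.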
