Question

I have a table that is currently set up as follows:

rowId : colFam: colQual -> value

in001 : user : name -> erp
in001 : user : age -> 23
in001 : group : name -> employee
in001 : group : name -> developer

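I can't figure out a way to delete one of the group entries, or to change one for that matter. Say I want to delete the entry with employee because I am now a manager. Adding is obvious, but I can't work out how to get at employee, because both group entries have the same colFam and colQual.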

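I know about mutation.putDelete(colFam, colQual), but it doesn't apply here because it would delete both. Alternatively, I could just scan each row and get the key-value pairs, like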

for (Entry<Key,Value> e : scanner) {
    e.getValue().toString(); // at least I can access the value here
}

But even then, how do I know what to delete? Is this just a flaw in how I designed my table?

Answer 1

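Although Accumulo's key-value schema lets you do this, as you have found, it causes problems. The original intent of the value is that it can change over time, with each version of the value uniquely identified by the timestamp portion of the Key (assuming all other parts of the Key are equal). By turning off the VersioningIterator, you can keep the history of values for a Key.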
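To make the timestamp point concrete, here is a toy in-memory model; it is not the Accumulo API, and `VersionedCellDemo`, `Cell`, `ORDER` and `scan` are hypothetical names. It assumes keys sort by row, family and qualifier ascending and then by timestamp descending (newest first), and it imitates a VersioningIterator that keeps `maxVersions` entries per column. New Accumulo tables keep one version by default, which is why a plain scan would normally show only one of the two `group:name` values. The sketch needs Java 16+ for `record`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Toy model (not the Accumulo API): versions of one column differ only by timestamp.
public class VersionedCellDemo {

  // Simplified key plus value: no column visibility, no delete marker.
  record Cell(String row, String family, String qualifier, long timestamp, String value) {}

  // Assumed Accumulo-like order: row, family, qualifier ascending, then newest timestamp first.
  // The value is not part of the order, so the TreeSet holds at most one cell per full key.
  static final Comparator<Cell> ORDER = Comparator.comparing(Cell::row)
      .thenComparing(Cell::family)
      .thenComparing(Cell::qualifier)
      .thenComparing(Comparator.comparingLong(Cell::timestamp).reversed());

  // Imitates a versioning filter: keeps at most maxVersions newest cells per row/family/qualifier.
  static List<Cell> scan(TreeSet<Cell> table, int maxVersions) {
    List<Cell> out = new ArrayList<>();
    Cell previous = null;
    int seen = 0;
    for (Cell c : table) {
      boolean sameColumn = previous != null
          && previous.row().equals(c.row())
          && previous.family().equals(c.family())
          && previous.qualifier().equals(c.qualifier());
      seen = sameColumn ? seen + 1 : 1;
      if (seen <= maxVersions) {
        out.add(c);
      }
      previous = c;
    }
    return out;
  }

  public static void main(String[] args) {
    TreeSet<Cell> table = new TreeSet<>(ORDER);
    table.add(new Cell("in001", "group", "name", 1L, "employee"));
    table.add(new Cell("in001", "group", "name", 2L, "developer"));
    System.out.println(scan(table, 1));                 // only the newest version (developer)
    System.out.println(scan(table, Integer.MAX_VALUE)); // both versions, newest first
  }
}
```

In this model, the only thing that separates `employee` from `developer` is the timestamp, which is exactly why the asker has no stable handle to delete one of them by.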

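The most common way to solve this is to store all of the "group names" in a single value using some serialized data structure. A simple approach is CSV: "employee,developer", and your update would then be "employee,developer,manager". Tools such as Hadoop Writables, Google Protocol Buffers or Apache Thrift (among many others) give you more advanced features: a more compact representation, easier programmatic access, and backward compatibility.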
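A minimal sketch of the CSV approach, assuming group names never contain commas (a real schema would use one of the serialization tools above). `GroupCsv`, `decode`, `encode` and `update` are hypothetical helpers; the client would read the current `group:name` value, call `update`, and write the result back with a single `put` on the same column, so only one value per row ever exists.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch: all group names for a row live in ONE value, serialized as CSV.
// Assumption: group names contain no commas.
public class GroupCsv {

  // "employee,developer" -> ordered set {employee, developer}
  static Set<String> decode(String csv) {
    Set<String> groups = new LinkedHashSet<>();
    if (csv.isEmpty()) {
      return groups;
    }
    groups.addAll(Arrays.asList(csv.split(",")));
    return groups;
  }

  // ordered set -> "a,b,c"
  static String encode(Set<String> groups) {
    return String.join(",", groups);
  }

  // Read-modify-write of the single group:name value; null means "nothing to remove/add".
  static String update(String csv, String remove, String add) {
    Set<String> groups = decode(csv);
    if (remove != null) {
      groups.remove(remove);
    }
    if (add != null) {
      groups.add(add);
    }
    return encode(groups);
  }

  public static void main(String[] args) {
    String stored = "employee,developer";
    System.out.println(update(stored, "employee", "manager")); // developer,manager
  }
}
```

The design trade-off is that every change rewrites the whole value, which is fine for a handful of groups per user but would not suit very large lists.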

Answer 2

You can delete exactly the entry

in001 : group : name -> employee

by running a compaction with a custom filter that excludes exactly this value from the compaction output. (Untested, but it should work.) Use:

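IteratorSetting config = new IteratorSetting(10, "excludeTermFilter", ExcludeTermFilter.class);
// setTermToExclude is a static helper on the filter that stores family, qualifier and term as options
ExcludeTermFilter.setTermToExclude(config, "group", "name", "employee");
List<IteratorSetting> filterList = new ArrayList<IteratorSetting>();
filterList.add(config);
// compact(tableName, startRow, endRow, iterators, flush, wait); null start/end rows cover the whole table
connector.tableOperations().compact(tableName, startRow, endRow, filterList, true, false);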

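with the appropriate values, and this custom filter (based on GrepIterator). The compiled class must be available on the tablet servers' classpath so the compaction can load it: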

import static java.nio.charset.StandardCharsets.UTF_8;

import java.io.IOException;
import java.util.Arrays;
import java.util.Map;

import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.data.ByteSequence;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;

// Drops entries whose column family, column qualifier and value each contain the configured term.
// Note: every comparison is a substring match, not an exact match.
public class ExcludeTermFilter extends Filter {
  private byte[] termToExclude;
  private byte[] columnFamily;
  private byte[] columnQualifier;

  @Override
  public boolean accept(Key k, Value v) {
    // Returning false removes the entry from the compaction output.
    return !(match(v.get(), termToExclude) &&
             match(k.getColumnFamilyData(), columnFamily) &&
             match(k.getColumnQualifierData(), columnQualifier));
  }

  private boolean match(ByteSequence bs, byte[] term) {
    return indexOf(bs.getBackingArray(), bs.offset(), bs.length(), term) >= 0;
  }

  private boolean match(byte[] ba, byte[] term) {
    return indexOf(ba, 0, ba.length, term) >= 0;
  }

  // Adapted from java.lang.String.indexOf; returns -1 if target does not occur in source.
  private static int indexOf(byte[] source, int sourceOffset, int sourceCount, byte[] target) {
    byte first = target[0];
    int targetCount = target.length;
    int max = sourceOffset + (sourceCount - targetCount);

    for (int i = sourceOffset; i <= max; i++) {
      /* Look for the first byte. */
      if (source[i] != first) {
        while (++i <= max && source[i] != first)
          continue;
      }

      /* Found the first byte, now look at the rest of target. */
      if (i <= max) {
        int j = i + 1;
        int end = j + targetCount - 1;
        for (int k = 1; j < end && source[j] == target[k]; j++, k++)
          continue;

        if (j == end) {
          /* Found the whole term. */
          return i - sourceOffset;
        }
      }
    }
    return -1;
  }

  @Override
  public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
    // Filter.deepCopy instantiates this.getClass(), so the cast is safe.
    ExcludeTermFilter copy = (ExcludeTermFilter) super.deepCopy(env);
    copy.termToExclude = Arrays.copyOf(termToExclude, termToExclude.length);
    copy.columnFamily = Arrays.copyOf(columnFamily, columnFamily.length);
    copy.columnQualifier = Arrays.copyOf(columnQualifier, columnQualifier.length);
    return copy;
  }

  @Override
  public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options, IteratorEnvironment env) throws IOException {
    super.init(source, options, env);
    termToExclude = options.get("etf.term").getBytes(UTF_8);
    columnFamily = options.get("etf.family").getBytes(UTF_8);
    columnQualifier = options.get("etf.qualifier").getBytes(UTF_8);
  }

  /**
   * Encode the family, qualifier and termToExclude as options on an IteratorSetting
   */
  public static void setTermToExclude(IteratorSetting cfg, String family, String qualifier, String termToExclude) {
    cfg.addOption("etf.family", family);
    cfg.addOption("etf.qualifier", qualifier);
    cfg.addOption("etf.term", termToExclude);
  }
}
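Because `match` is a substring search, the filter would also drop a `group` entry whose value merely contains `employee` (for example `employees`), or one in a family such as `group/0`. The sketch below reproduces `accept` and `indexOf` in plain Java with no Accumulo dependency so the rule can be tried locally; `ExcludeTermLogic` is a hypothetical name, and Strings stand in for the Key/Value byte data.

```java
import java.nio.charset.StandardCharsets;

// Plain-Java copy of ExcludeTermFilter's matching rule (hypothetical helper, no Accumulo needed).
public class ExcludeTermLogic {

  // Same decision as accept(): reject only if family, qualifier AND value each contain their term.
  static boolean accept(String family, String qualifier, String value,
                        String familyTerm, String qualifierTerm, String valueTerm) {
    return !(match(value, valueTerm) && match(family, familyTerm) && match(qualifier, qualifierTerm));
  }

  // Substring test, as in the filter: true if term occurs anywhere in data.
  static boolean match(String data, String term) {
    byte[] bytes = data.getBytes(StandardCharsets.UTF_8);
    return indexOf(bytes, 0, bytes.length, term.getBytes(StandardCharsets.UTF_8)) >= 0;
  }

  // Byte-level search copied from the filter (adapted from java.lang.String.indexOf).
  static int indexOf(byte[] source, int sourceOffset, int sourceCount, byte[] target) {
    byte first = target[0];
    int targetCount = target.length;
    int max = sourceOffset + (sourceCount - targetCount);
    for (int i = sourceOffset; i <= max; i++) {
      if (source[i] != first) {
        while (++i <= max && source[i] != first)
          continue;
      }
      if (i <= max) {
        int j = i + 1;
        int end = j + targetCount - 1;
        for (int k = 1; j < end && source[j] == target[k]; j++, k++)
          continue;
        if (j == end) {
          return i - sourceOffset;
        }
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println(accept("group", "name", "employee", "group", "name", "employee"));  // false: dropped
    System.out.println(accept("group", "name", "developer", "group", "name", "employee")); // true: kept
    System.out.println(accept("group", "name", "employees", "group", "name", "employee")); // false: substring hit
  }
}
```

If only exact matches should be removed, comparing the full byte sequences for equality instead of calling `indexOf` would be the stricter choice.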

Answer 3

Alternatively, you could use a different schema:

rowId : colFam: colQual -> value

in001 : user : name -> erp 
in001 : user : age -> 23
in001 : group/0 : name -> employee
in001 : group/1 : name -> developer

or

rowId : colFam: colQual -> value

in001 : user : name -> erp 
in001 : user : age -> 23
in001 : group : 0/name -> employee
in001 : group : 1/name -> developer

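This is because of the "has many" relationship: giving each related entry its own key (in the colFamily or the colQualifier) lets you manipulate them independently.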
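A toy sketch of the second layout (`group : <n>/name`), where a `TreeMap` stands in for one row's sorted columns; it is not the Accumulo API, and `IndexedGroupsDemo`, `deleteGroup` and `nextGroupIndex` are hypothetical. It shows how a scan can locate the column holding `employee` and delete just that key, which in Accumulo would correspond to a `Mutation.putDelete("group", "0/name")`. Picking the next free index as max + 1 is an assumption made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of one row using the "group : <n>/name" layout (not the Accumulo API).
public class IndexedGroupsDemo {

  // "family:qualifier" -> value, sorted the way a scan of the row would return it.
  final TreeMap<String, String> row = new TreeMap<>();

  void put(String family, String qualifier, String value) {
    row.put(family + ":" + qualifier, value);
  }

  // Scan for the group column holding groupName, then delete only that column.
  boolean deleteGroup(String groupName) {
    String target = null;
    for (Map.Entry<String, String> e : row.entrySet()) {
      if (e.getKey().startsWith("group:") && e.getValue().equals(groupName)) {
        target = e.getKey();
        break;
      }
    }
    if (target == null) {
      return false;
    }
    row.remove(target); // Accumulo equivalent: putDelete("group", "<n>/name")
    return true;
  }

  // Assumption: new entries take (largest existing index + 1).
  int nextGroupIndex() {
    int next = 0;
    for (String key : row.keySet()) {
      if (key.startsWith("group:")) {
        String qualifier = key.substring("group:".length()); // e.g. "1/name"
        int idx = Integer.parseInt(qualifier.substring(0, qualifier.indexOf('/')));
        next = Math.max(next, idx + 1);
      }
    }
    return next;
  }

  public static void main(String[] args) {
    IndexedGroupsDemo r = new IndexedGroupsDemo();
    r.put("user", "name", "erp");
    r.put("user", "age", "23");
    r.put("group", "0/name", "employee");
    r.put("group", "1/name", "developer");
    r.deleteGroup("employee");
    r.put("group", r.nextGroupIndex() + "/name", "manager");
    System.out.println(r.row); // {group:1/name=developer, group:2/name=manager, user:age=23, user:name=erp}
  }
}
```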

Delete a row from a table based on value [Accumulo]
