Filtering specific rows in Kudu with a Kudu scanner

Time: 2016-12-01 23:12:15

Tags: apache-kudu

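The target table in Kudu is very large. I have the following code in Scala, and I want to check whether a given row exists in Kudu. These four columns make up the primary key of the Kudu table, but when I define an upper bound I seem to get all the rows back.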

How do I select a specific row in Kudu? Here I expect exactly one row to be returned.

import java.util
import scala.collection.JavaConverters._
import org.apache.kudu.client._

// kuduClient, OccurrenceSchema and logger are defined elsewhere
val table2: KuduTable = kuduClient.openTable("event-sets")
val eventColumns: util.List[String] = List(
  OccurrenceSchema.SetId.name,
  OccurrenceSchema.Period.name,
  OccurrenceSchema.Event.name,
  OccurrenceSchema.Date.name).asJava

// the full primary key of the row I am looking for
val end: PartialRow = table2.getSchema.newPartialRow()
end.addInt(OccurrenceSchema.Period.name, 1476)
end.addInt(OccurrenceSchema.SetId.name, 82)
end.addInt(OccurrenceSchema.Event.name, 3195167)
end.addLong(OccurrenceSchema.Date.name, 1367922840000L)

val kuduScanner: KuduScanner = kuduClient.newScannerBuilder(table2)
  .setProjectedColumnNames(eventColumns)
  .lowerBound(end)
  .exclusiveUpperBound(end)
  .build()

assert(kuduScanner.hasMoreRows)
while (kuduScanner.hasMoreRows) {
  val resultIterator: RowResultIterator = kuduScanner.nextRows()
  while (resultIterator.hasNext) {
    val result: RowResult = resultIterator.next()
    assert(result != null)
    logger.info(" : SetId Value -- " + result.getInt(OccurrenceSchema.SetId.name))
    logger.info(" : Period Value -- " + result.getInt(OccurrenceSchema.Period.name))
    logger.info(" : Event Value -- " + result.getInt(OccurrenceSchema.Event.name))
    logger.info(" : Date Value -- " + result.getLong(OccurrenceSchema.Date.name))
  }
}
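The Kudu Java client documents lowerBound as an inclusive primary-key bound and exclusiveUpperBound as an exclusive one, so passing the same row to both describes the half-open range [k, k), which contains no keys at all. The plain-Scala model below (no Kudu dependency) illustrates this. It assumes the composite key orders lexicographically in the column order SetId, Period, Event, Date; that is the order of eventColumns, but the table's actual key column order is not shown. Key, scanRange and successor are hypothetical names for illustration only. The sketch does not explain why the scan above returned every row; it only shows that the same key on both sides could not return the single row even if the bounds were applied as documented.

```scala
// Plain-Scala model of a primary-key range scan (no Kudu dependency).
// Assumption: the key columns are ordered SetId, Period, Event, Date.
case class Key(setId: Int, period: Int, event: Int, date: Long)

// Lexicographic ordering over the key columns, in key-column order.
val keyOrdering: Ordering[Key] =
  Ordering.by((k: Key) => (k.setId, k.period, k.event, k.date))

// Inclusive lower bound, exclusive upper bound: the half-open range [lower, upper).
def scanRange(rows: Seq[Key], lower: Key, exclusiveUpper: Key): Seq[Key] =
  rows.filter(k => keyOrdering.gteq(k, lower) && keyOrdering.lt(k, exclusiveUpper))

// Smallest key strictly greater than k: bump the last key column.
// Assumes date < Long.MaxValue, so no carry into earlier columns is needed.
def successor(k: Key): Key = k.copy(date = k.date + 1)

val target = Key(82, 1476, 3195167, 1367922840000L)
val rows = Seq(
  Key(82, 1476, 3195167, 1367922839999L),
  target,
  Key(82, 1476, 3195167, 1367922840001L),
  Key(83, 1, 1, 0L)
)

// Same key as both bounds: [k, k) is empty.
val sameBounds = scanRange(rows, target, target)
// Exclusive upper bound one step past the key: exactly the target row.
val singleRow = scanRange(rows, target, successor(target))
```

Under these assumptions, selecting one row with bounds requires an exclusive upper bound just past the key; equality predicates on every key column avoid having to compute that successor.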

1 answer:

Answer 0 (score: 2)

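As I understand it, you are looking for exactly one record in the table. Using a scanner with bounds and/or a limit did not work for me either. Instead, I solved the problem by defining KuduPredicates. You will find my solution below.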

import org.apache.kudu.client._

val schema = table2.getSchema
val builder: KuduScannerBuilder = kuduClient.newScannerBuilder(table2)
// define the columns you want to select
builder.setProjectedColumnNames(eventColumns)

// add one EQUAL predicate per primary-key column to select a single record;
// newComparisonPredicate takes a ColumnSchema (not a column name), and the
// long overload also accepts INT32 columns
val pkPeriod: KuduPredicate = KuduPredicate.newComparisonPredicate(
  schema.getColumn(OccurrenceSchema.Period.name), KuduPredicate.ComparisonOp.EQUAL, 1476L)
builder.addPredicate(pkPeriod)
val pkSetId: KuduPredicate = KuduPredicate.newComparisonPredicate(
  schema.getColumn(OccurrenceSchema.SetId.name), KuduPredicate.ComparisonOp.EQUAL, 82L)
builder.addPredicate(pkSetId)
val pkEvent: KuduPredicate = KuduPredicate.newComparisonPredicate(
  schema.getColumn(OccurrenceSchema.Event.name), KuduPredicate.ComparisonOp.EQUAL, 3195167L)
builder.addPredicate(pkEvent)
val pkDate: KuduPredicate = KuduPredicate.newComparisonPredicate(
  schema.getColumn(OccurrenceSchema.Date.name), KuduPredicate.ComparisonOp.EQUAL, 1367922840000L)
builder.addPredicate(pkDate)

val kuduScanner: KuduScanner = builder.build()

try {
  while (kuduScanner.hasMoreRows) {
    val resultIterator: RowResultIterator = kuduScanner.nextRows()
    while (resultIterator.hasNext) {
      val result: RowResult = resultIterator.next()

      // do whatever you have to do with the selected record
      logger.info(" : SetId Value -- " + result.getInt(OccurrenceSchema.SetId.name))
    }
  }
} finally {
  // release the scanner on the server side
  kuduScanner.close()
}
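Why a full set of EQUAL predicates answers "does this row exist?" can be seen in a small plain-Scala model (no Kudu dependency). As with predicates added to a Kudu scanner builder, a row is returned only if it satisfies every predicate. The model assumes, as the question states, that the four columns form the whole primary key, so at most one row can match. Row, scan, keyPredicates and rowExists are hypothetical names for illustration, not Kudu API.

```scala
// Hypothetical row type holding the four primary-key columns from the question.
case class Row(setId: Int, period: Int, event: Int, date: Long)

// A predicate tests one row; a scan keeps rows that satisfy ALL predicates,
// mirroring how predicates added to a Kudu scanner builder are combined with AND.
type Predicate = Row => Boolean

def scan(rows: Seq[Row], predicates: Seq[Predicate]): Seq[Row] =
  rows.filter(r => predicates.forall(p => p(r)))

// One EQUAL predicate per primary-key column, as in the answer's code.
def keyPredicates(setId: Int, period: Int, event: Int, date: Long): Seq[Predicate] = Seq(
  (r: Row) => r.setId == setId,
  (r: Row) => r.period == period,
  (r: Row) => r.event == event,
  (r: Row) => r.date == date
)

// The key is unique, so the scan returns zero or one row;
// existence is just "the result is non-empty".
def rowExists(rows: Seq[Row], setId: Int, period: Int, event: Int, date: Long): Boolean =
  scan(rows, keyPredicates(setId, period, event, date)).nonEmpty

val rows = Seq(
  Row(82, 1476, 3195167, 1367922840000L),
  Row(82, 1476, 3195167, 1367922840001L),
  Row(82, 1477, 3195167, 1367922840000L)
)

val found = scan(rows, keyPredicates(82, 1476, 3195167, 1367922840000L))
val missing = scan(rows, keyPredicates(82, 1476, 3195167, 0L))
```

Rows that share three of the four key values are still excluded, because every predicate must hold.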

I'm new to Kudu, so I'm not sure whether this is the most efficient solution. At least it returns the expected result.

My original code was written and tested in Java. I ported it to Scala by hand, but I have not tested the Scala version yet!