Question

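I am trying to follow Cloudera's tutorial (http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/search_hbase_batch_indexer.html).

I have code that inserts objects into HBase in Avro format, and I want to index them into Solr, but nothing shows up there.

I have been looking at the logs: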

15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeNotify: {lifecycle=[START_SESSION]}
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeProcess: {_attachment_body=[keyvalues={0Name178721/data:avroUser/1434094131495/Put/vlen=237/seqid=0}], _attachment_mimetype=[application/java-hbase-result]}
15/06/12 00:45:00 DEBUG indexer.Indexer$RowBasedIndexer: Indexer _default_ will send to Solr 0 adds and 0 deletes
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeNotify: {lifecycle=[START_SESSION]}
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeProcess: {_attachment_body=[keyvalues={1Name134339/data:avroUser/1434094131495/Put/vlen=237/seqid=0}], _attachment_mimetype=[application/java-hbase-result]}

So the rows are clearly being read, but I don't know why nothing gets indexed into Solr. I suspect my morphline.conf is wrong.

morphlines : [
{
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**", "com.ngdata.**"]
    commands : [
      {
         extractHBaseCells {
          mappings : [
            {
             inputColumn : "data:avroUser"
              outputField : "_attachment_body"
              type : "byte[]"
              source : value
            }
         ]
        }
      }

      #for avro use with type : "byte[]" in extractHBaseCells mapping above
      { readAvroContainer {} }
      {
        extractAvroPaths {
          flatten : true
          paths : {
            name : /name
          }
        }
      }
      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
 }
]

I'm not sure whether I need an "_attachment_body" field in Solr, but it doesn't seem necessary, so I guess readAvroContainer or extractAvroPaths is wrong. I have a "name" field in Solr, and my avroUser has a "name" field as well.

{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
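
A note on the two Avro commands, since the question suspects them: readAvroContainer expects the Avro container file format (a header with the embedded writer schema), whereas Kite's readAvro parses a bare binary datum and has to be given the schema. If the code that writes to HBase serializes each User with a plain binary encoder rather than as a container, that mismatch could explain the 0 adds. This is a guess, not something the logs confirm. A minimal sketch of the variant, assuming the schema above is saved at the hypothetical path /etc/hbase-solr/conf/user.avsc and using an arbitrary output file name:

```shell
# Write a variant of the morphline that reads container-less Avro.
# ASSUMPTION: /etc/hbase-solr/conf/user.avsc is a hypothetical path that
# should hold the "User" schema shown in the question.
cat > morphlines-readavro.conf <<'EOF'
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**", "com.ngdata.**"]
    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "data:avroUser"
              outputField : "_attachment_body"
              type : "byte[]"
              source : value
            }
          ]
        }
      }

      # readAvro parses a bare Avro datum; the bytes carry no embedded
      # schema, so the writer schema is supplied explicitly.
      { readAvro { writerSchemaFile : "/etc/hbase-solr/conf/user.avsc" } }

      {
        extractAvroPaths {
          flatten : true
          paths : {
            name : /name
          }
        }
      }
      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
EOF
```

If the cells really do hold Avro container data, the original readAvroContainer command is the right one and this variant should not be used.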

Answer 1

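Everything works fine for me here. These are the steps I followed:

1) Install hbase-solr-indexer as a service: first of all, you have to install hbase-solr-indexer (see "Installing hbase-solr-indexing as a service").

To do that, add the Cloudera repos to your yum repos. After that: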

sudo yum install hbase-solr-indexer

2) Create the morphline file: OK, you have already done that.

3) Set the replication scope for each column family and register the hbase-indexer configuration

Using the Lily HBase NRT Indexer Service

$ hbase shell
hbase shell> disable 'record'
hbase shell> alter 'record', {NAME => 'data', REPLICATION_SCOPE => 1}
hbase shell> enable 'record'
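
The registration half of this step is not shown above. A rough sketch of what it can look like, assuming a single-node setup: the indexer name avroUserIndexer, the Solr collection users, the ZooKeeper address localhost:2181 and the morphline path are all placeholders of mine, not values from the post. The table name and the morphline id are taken from the snippets above.

```shell
# ASSUMPTIONS (placeholders, not from the original post): indexer name
# "avroUserIndexer", Solr collection "users", ZooKeeper at "localhost:2181",
# and the morphline file path below.
cat > morphline-hbase-mapper.xml <<'EOF'
<?xml version="1.0"?>
<!-- table must match the HBase table whose replication scope was set -->
<indexer table="record"
         mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper">
  <param name="morphlineFile" value="/etc/hbase-solr/conf/morphlines.conf"/>
  <param name="morphlineId" value="morphline1"/>
</indexer>
EOF

# Register the indexer only where the hbase-indexer CLI is installed.
if command -v hbase-indexer >/dev/null 2>&1; then
  hbase-indexer add-indexer \
    --name avroUserIndexer \
    --indexer-conf morphline-hbase-mapper.xml \
    --connection-param solr.zk=localhost:2181/solr \
    --connection-param solr.collection=users \
    --zookeeper localhost:2181
  # Confirm that the indexer shows up.
  hbase-indexer list-indexers --zookeeper localhost:2181
else
  echo "hbase-indexer not found on PATH; skipping registration"
fi
```

The table attribute has to name the same table that was altered in the hbase shell, otherwise the replicated edits never reach the morphline.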

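Try following the other tutorial linked above as well. ;) I had problems with the NRT solution too, but once I went through all the tutorials step by step, it worked.

I hope this helps someone.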

Lily with Morphline and HBase
