Solr indexing custom JSON - creates empty documents

Time: 2018-03-09 13:48:02

Tags: indexing solr lucene

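I'm very new to Solr. I'm currently running it in cloud mode using docker compose (my configuration is at the end of the question).

I created a collection called audittrail using the default configuration. My idea is that I'll send event-logging information to Solr from another application. By default it has a convenient dynamic-field schema. (I know I shouldn't just use the defaults in production; right now I'm looking for a proof of concept.)

Now I'm following this documentation to try to index some of my data: https://lucene.apache.org/solr/guide/7_2/transforming-and-indexing-custom-json.html#mapping-parameters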

# NOTE: echo=true means we're running in debug mode; Solr returns the documents it should be creating
curl 'http://0.0.0.0:8983/api/collections/audittrail/update/json'\
  '?split=/events&'\
  'f=action_kind_s:/action_kind_s&'\
  'f=time_dt:/events/time_dt'\
  '&echo=true' \
  -H 'Content-type:application/json' -d '{
 "action_kind_s": "task_exec", 
 "events": [
     {
         "event_kind_s": "start", 
         "in_transaction_b": false, 
         "time_dt": "2018-03-09T12:57:07Z"
     }, 
     {
         "event_kind_s": "start_txn", 
         "in_transaction_b": true, 
         "time_dt": "2018-03-09T12:57:07Z"
     }, 
     {
         "event_kind_s": "diff", 
         "in_transaction_b": true, 
         "key_s": "('MerchantWorkerProcess', 5819715045818368L)", 
         "property_s": "claim_time", 
         "time_dt": "2018-03-09T12:57:07Z", 
         "value_dt": "2018-03-09T12:57:07Z"
     }, 
 ], 
 "final_status_s": "COMPLETE", 
 "request_s": "1dfda9955dac6f3cfd76fbedee98b15f6edc0db", 
 "task_name_s": "0p5k20100CcnMVxaxoWl32WlfPixjV1OFKgv0k1KZ0m_acc_work"
}'

# response:

{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "docs":[{},
    {},
    {}]}

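That's three empty documents...

So I thought maybe it was because I hadn't specified an id. I assigned each event a unique id and tried again with &f=id:/events/id added. Same result.

Initially I tried a wildcard (&f=/**), with the same effect.

There's clearly something missing in my understanding.

So my question is: what do I need to do to get the documents populated correctly?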
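
For reference, the mapping that the request above asks for can be modelled in a few lines of Python. This is only a simplified sketch of the split and f parameters as described in the linked guide, not Solr's implementation: the helper names (resolve, split_and_map) are hypothetical, the payload is trimmed to the fields that matter, and only plain object keys are handled (no wildcards such as /**).

```python
def resolve(node, path):
    """Follow a list of object keys; return None if any key is missing."""
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def split_and_map(payload, split, mappings):
    """Simplified model: emit one document per element found under `split`.

    `mappings` maps a target field name to a JSON path. Paths that start with
    the split path are resolved inside each child record; any other path is
    resolved against the root object and copied into every document.
    """
    split_path = [p for p in split.split("/") if p]
    children = resolve(payload, split_path) or []
    docs = []
    for child in children:
        doc = {}
        for field, path in mappings.items():
            parts = [p for p in path.split("/") if p]
            if parts[:len(split_path)] == split_path:
                value = resolve(child, parts[len(split_path):])
            else:
                value = resolve(payload, parts)
            if value is not None:
                doc[field] = value
        docs.append(doc)
    return docs


# Trimmed version of the payload from the question (assumption: extra fields omitted).
payload = {
    "action_kind_s": "task_exec",
    "events": [
        {"event_kind_s": "start", "time_dt": "2018-03-09T12:57:07Z"},
        {"event_kind_s": "start_txn", "time_dt": "2018-03-09T12:57:07Z"},
        {"event_kind_s": "diff", "time_dt": "2018-03-09T12:57:07Z"},
    ],
}

docs = split_and_map(
    payload,
    split="/events",
    mappings={"action_kind_s": "/action_kind_s", "time_dt": "/events/time_dt"},
)
for doc in docs:
    print(doc)
```

Under this model each of the three events yields a document carrying action_kind_s and time_dt, so the mapping parameters themselves look reasonable.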

Edit

Also, my Solr node logs don't show any errors. Here's a sample:

2018-03-09 14:30:50.770 INFO  (qtp257895351-21) [c:audittrail s:shard2 r:core_node4 x:audittrail_shard2_replica_n2] o.a.s.u.p.LogUpdateProcessorFactory [audittrail_shard2_replica_n2]  webapp=null path=/update/json params={split=/events}{add=[78953602-6b02-4948-8443-fd1ebc340921 (1594470800573857792)]} 0 3

2018-03-09 14:31:05.770 INFO  (commitScheduler-14-thread-1) [c:audittrail s:shard2 r:core_node4 x:audittrail_shard2_replica_n2] o.a.s.u.DirectUpdateHandler2 start commit{_version_=1594470816305643520,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

2018-03-09 14:31:05.770 INFO  (commitScheduler-14-thread-1) [c:audittrail s:shard2 r:core_node4 x:audittrail_shard2_replica_n2] o.a.s.u.SolrIndexWriter Calling setCommitData with IW:org.apache.solr.update.SolrIndexWriter@13d117d6 commitCommandVersion:1594470816305643520

2018-03-09 14:31:05.918 INFO  (commitScheduler-14-thread-1) [c:audittrail s:shard2 r:core_node4 x:audittrail_shard2_replica_n2] o.a.s.s.SolrIndexSearcher Opening [Searcher@4edc35b0[audittrail_shard2_replica_n2] realtime]

2018-03-09 14:31:05.921 INFO  (commitScheduler-14-thread-1) [c:audittrail s:shard2 r:core_node4 x:audittrail_shard2_replica_n2] o.a.s.u.DirectUpdateHandler2 end_commit_flush
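
Two details in this sample are easy to miss: the update line lists the id of a document that was added, and the scheduled commit ran with openSearcher=false. The Python sketch below pulls both out with regular expressions. The helper names are hypothetical, the two log lines are abbreviated copies of the ones above, and the patterns are only assumed to fit this particular log layout.

```python
import re

# Matches the "{add=[<id> (<version>), ...]}" part of an update log line.
ADD_RE = re.compile(r"\{add=\[([^\]]*)\]\}")
# Matches the openSearcher flag in a "start commit{...}" log line.
OPEN_SEARCHER_RE = re.compile(r"openSearcher=(true|false)")


def added_ids(line):
    """Return the document ids listed in an update log line (empty list if none)."""
    match = ADD_RE.search(line)
    if not match:
        return []
    # Each entry looks like "<id> (<version>)"; keep only the id.
    return [entry.split(" (")[0].strip() for entry in match.group(1).split(",")]


def open_searcher(line):
    """Return True/False for the openSearcher flag, or None if the line has none."""
    match = OPEN_SEARCHER_RE.search(line)
    return None if match is None else match.group(1) == "true"


# Abbreviated from the log sample above.
update_line = (
    "o.a.s.u.p.LogUpdateProcessorFactory [audittrail_shard2_replica_n2]  webapp=null "
    "path=/update/json params={split=/events}"
    "{add=[78953602-6b02-4948-8443-fd1ebc340921 (1594470800573857792)]} 0 3"
)
commit_line = (
    "o.a.s.u.DirectUpdateHandler2 start commit{_version_=1594470816305643520,"
    "optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,"
    "softCommit=false,prepareCommit=false}"
)

print(added_ids(update_line))
print(open_searcher(commit_line))
```

A hard commit with openSearcher=false flushes changes to disk without opening a new searcher, so a document that was added this way may not show up in query results yet.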

docker-compose.yml

version: '3'
services:
  zookeeper:
    image: zookeeper:3.4.11
    ports:
     - "2181:2181"
    hostname: "zookeeper"
    container_name: "zookeeper"
  solr1:
    image: solr:7.2.1
    ports:
      - "8983:8983"
    container_name: solr1 
    links:
      - zookeeper:ZK
    command: /opt/solr/bin/solr start -f -z zookeeper:2181
  solr2:
    image: solr:7.2.1
    ports:
      - "8984:8983"
    container_name: solr2
    links:
      - zookeeper:ZK
    command: /opt/solr/bin/solr start -f -z zookeeper:2181

Here are the exact steps I took to index some data.

This doesn't actually index anything, and I'd like to know why.

  1. docker-compose up
  2. Create the collection

    curl -X POST 'http://0.0.0.0:8983/solr/admin/collections?action=CREATE&name=audittrail&numShards=2'
    
    {
    "responseHeader":{
    "status":0,
    "QTime":6178},
    "success":{
    "172.24.0.3:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":3993},
      "core":"audittrail_shard1_replica_n1"},
    "172.24.0.4:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":4399},
      "core":"audittrail_shard2_replica_n2"}},
    "warning":"Using _default configset. Data driven schema functionality is enabled by default, which is NOT RECOMMENDED for production use. To turn it off: curl http://{host:port}/solr/audittrail/config -d '{\"set-user-property\": {\"update.autoCreateFields\":\"false\"}}'"}
    
  3. Use curl to create some data (this is the same curl as in the main question, but not in debug mode):

    curl 'http://0.0.0.0:8983/api/collections/audittrail/update/json?split=/events&f=action_kind_s:/action_kind_s&f=time_dt:/events/time_dt' -H 'Content-type:application/json' -d '{ "action_kind_s": "task_exec",  "events": [{"event_kind_s": "start","in_transaction_b": false,          "time_dt": "2018-03-09T12:57:07Z"},{"event_kind_s": "start_txn",          "in_transaction_b": true,"time_dt": "2018-03-09T12:57:07Z"},{"event_kind_s": "diff", "in_transaction_b": true,"key_s": "('MerchantWorkerProcess', 5819715045818368L)","property_s": "claim_time","time_dt": "2018-03-09T12:57:07Z","value_dt": "2018-03-09T12:57:07Z"},],  "final_status_s": "COMPLETE",  "request_s": "xxx",  "task_name_s": "xxx"}'
    
    {
    "responseHeader":{
    "status":0,
    "QTime":126}}
    
  4. Run a query:

    curl 'http://0.0.0.0:8983/solr/audittrail/select?q=*:*'
    {
    "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":12,
    "params":{
      "q":"*:*"}},
      "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
     }}
    

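1 answer:

Answer 0: (score: 1)

It seems that it's just the echo parameter that doesn't do what you expect it to do - remove it, and add commit=true to your URL so that Solr commits the documents to the index before returning. You can then search for *:* in the admin interface under Collection -> Query, which gives you the documents with your fields in the index: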

{
  "action_kind_s":"task_exec",
  "time_dt":"2018-03-09T12:57:07Z",
  "id":"b56100f5-ff61-45e7-8d6b-8072bac6c952",
  "_version_":1594486636806144000},
{
  "action_kind_s":"task_exec",
  "time_dt":"2018-03-09T12:57:07Z",
  "id":"f49fc3cb-eac6-4d02-bcdf-b7c1a34782e3",
  "_version_":1594486636807192576}
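
As a minimal sketch of the URL change described in this answer, the Python snippet below builds the update URL without echo and with commit=true. The function name build_update_url is hypothetical; the host, collection name and mappings are taken from the question. It only constructs the URL and sends nothing.

```python
from urllib.parse import urlencode


def build_update_url(base_url, collection, split, field_mappings, commit=True):
    """Build the custom-JSON update URL: split path, f mappings, optional commit."""
    params = [("split", split)]
    # Repeat the "f" parameter once per "<field>:<json path>" mapping.
    params += [("f", f"{field}:{path}") for field, path in field_mappings]
    if commit:
        params.append(("commit", "true"))
    # Keep "/" and ":" readable instead of percent-encoding them.
    query = urlencode(params, safe="/:")
    return f"{base_url}/api/collections/{collection}/update/json?{query}"


url = build_update_url(
    "http://0.0.0.0:8983",
    "audittrail",
    split="/events",
    field_mappings=[
        ("action_kind_s", "/action_kind_s"),
        ("time_dt", "/events/time_dt"),
    ],
)
print(url)
```

The resulting URL can be passed to the same curl invocation as in the question, with the -H and -d arguments unchanged. Committing on every request is convenient for a proof of concept; whether it suits a heavier indexing load is a separate decision.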