Morphline configuration file does not index nested Avro data

Time: 2016-07-28 02:28:11

Tags: indexing solr avro morphline

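I am generating indexes in Solr for my Avro data. Indexes are generated only for the root-level fields, not for the nested data elements. My Avro schema is shown below (abridged; not all fields are included).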

{
  "type" : "record",
  "name" : "abcd",
  "namespace" : "xyz",
  "doc" : "Schema Definition for Low Fare Search Shopping Request/Response Data",
  "fields" : [ {
    "name" : "ShopID",
    "type" : "string"
  }, {
    "name" : "RqSysTimestamp",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "RqTimestamp",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "RsSysTimestamp",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "RsTimestamp",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "Request",
    "type" : {
      "type" : "record",
      "name" : "RequestStruct",
      "fields" : [ {
        "name" : "TransactionID",
        "type" : [ "string", "null" ]
      }, {
        "name" : "AgentSine",
        "type" : [ "string", "null" ]
      }, {
        "name" : "CabinPref",
        "type" : [ {
          "type" : "array",
          "items" : {
            "type" : "record",
            "name" : "CabinStruct",
            "fields" : [ {
              "name" : "Cabin",
              "type" : [ "string", "null" ]
            }, {
              "name" : "PrefLevel",
              "type" : [ "string", "null" ]
            } ]
          }
        }, "null" ]
      }, {
        "name" : "CountryCode",
        "type" : [ "string", "null" ]
      }, {
        "name" : "PassengerStatus",
        "type" : [ "string", "null" ]
      }, {
}

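How do I reference "TransactionID" in my morphline configuration file? I have tried every option I could think of, but no index is generated for the nested data elements.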

Below is a sample of my morphline configuration file.

extractAvroPaths {
  flatten : true
  paths : {
    ShopID : /ShopID
    RqSysTimestamp : /RqSysTimestamp
    RqTimestamp : /RqTimestamp
    RsSysTimestamp : /RsSysTimestamp
    RsTimestamp : /RsTimestamp
    TransactionID : "/Request/RequestStruct/TransactionID"
    AgentSine : "/Request/RequestStruct/AgentSine"
    Cabin : /Cabin
    PrefLevel : /PrefLevel
    CountryCode : /CountryCode
    FrequentFlyerStatus : /FrequentFlyerStatus

1 answer:

Answer 0: (score: 0)

The toAvro command needs a java.util.Map as input in order to convert it into a nested Avro record. So here is my solution.

morphlines: [
  {
    id: convertJsonToAvro
    importCommands: [ "org.kitesdk.**" ]
    commands: [
      # read the JSON blob
      { readJson: {} }
      
      # parse the JSON into a java.util.Map and copy its top-level entries into the record
      {
              java { 
                    imports : """
                      import com.fasterxml.jackson.databind.JsonNode;
                      import com.fasterxml.jackson.databind.ObjectMapper;
                      import org.kitesdk.morphline.base.Fields;
                      import java.io.IOException;
                      import java.util.Set;
                      import java.util.ArrayList;
                      import java.util.Iterator;
                      import java.util.List;
                      import java.util.Map;
                    """

                    code : """
                      String jsonStr = record.getFirstValue(Fields.ATTACHMENT_BODY).toString();
                      ObjectMapper mapper = new ObjectMapper();
                      Map<String, Object> map = null;
                      try {
                          map = (Map<String, Object>)mapper.readValue(jsonStr, Map.class);
                      } catch (IOException e) {
                          e.printStackTrace();
                          // fail this record instead of hitting a NullPointerException on map below
                          return false;
                      }
                      Set<String> keySet = map.keySet();
                      for (String o : keySet) {
                          record.put(o, map.get(o));
                      }
                      return child.process(record);                   
                    """

              }
      }               
      
      # convert the extracted fields to an avro object
      # described by the schema in this field
      { toAvro {
        schemaFile: /etc/flume/conf/a1/like_user_event_realtime.avsc
      } }
      
      #{ logInfo { format : "loginfo: {}", args : ["@{}"] } }
  
      # serialize the object as avro
      { writeAvroToByteArray: {
        format: containerlessBinary
      } }
  
    ]
  }
]
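
The solution above relies on nested java.util.Map values standing in for nested records. As a rough illustration only (this is not Kite's implementation), the sketch below uses plain JDK classes to show how a slash-separated path can walk such a nested map. The class `NestedPathSketch`, the helper `resolve`, and the sample values are all hypothetical. The point it tries to make is that each path step names a field (`Request`, `TransactionID`), not a record type (`RequestStruct`). If extractAvroPaths follows the same convention, which I believe it does but have not verified against this schema, the path in the question would be `/Request/TransactionID` rather than `/Request/RequestStruct/TransactionID`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NestedPathSketch {

    // Hypothetical helper: walks a slash-separated path through nested Maps.
    // Returns null if any step is missing or the current value is not a Map.
    static Object resolve(Map<String, Object> root, String path) {
        Object current = root;
        for (String step : path.split("/")) {
            if (step.isEmpty()) {
                continue; // skip the empty segment before the leading "/"
            }
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(step);
        }
        return current;
    }

    // Builds a tiny record shaped like the schema in the question.
    // The values are made up for illustration.
    static Map<String, Object> sampleRecord() {
        Map<String, Object> request = new LinkedHashMap<>();
        request.put("TransactionID", "tx-1");
        request.put("AgentSine", "A1");

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("ShopID", "shop-1");
        root.put("Request", request);
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> root = sampleRecord();
        // Field names only: resolves to the nested value.
        System.out.println(resolve(root, "/Request/TransactionID")); // prints "tx-1"
        // Record type name used as a step: nothing matches.
        System.out.println(resolve(root, "/Request/RequestStruct/TransactionID")); // prints "null"
    }
}
```

In this sketch the second lookup fails because `RequestStruct` is never a key in the data; it only appears in the schema as the name of the record type.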