Solr hierarchical clustering

Date: 2015-10-23 14:10:33

Tags: apache solr hierarchical-clustering carrot2

I'm trying to enable hierarchical clustering (subcluster generation) in Apache Solr. To do this, I'm using the Solr Clustering Component with the "carrot.outputSubClusters" parameter set to true.

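However, when I display the output as JSON, the object returned by the clustering process doesn't contain any subclusters, which makes me wonder: what am I missing here?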
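To confirm programmatically that the response really is flat, the parsed JSON can be walked recursively. The following is only a minimal sketch: it assumes the response carries a top-level "clusters" list and that subclusters, when present, appear under a nested "clusters" key inside each cluster. The helper name cluster_depth and the sample data are made up for illustration.

```python
import json


def cluster_depth(clusters):
    """Return how many levels of clusters are nested (0 for an empty list)."""
    if not clusters:
        return 0
    # Assumption: subclusters, if any, live under a nested "clusters" key.
    return 1 + max(cluster_depth(c.get("clusters", [])) for c in clusters)


# Hypothetical response fragments, not real Solr output.
nested = json.loads("""
{"clusters": [
  {"labels": ["A"], "docs": ["1", "2"]},
  {"labels": ["B"], "docs": ["3"],
   "clusters": [{"labels": ["B1"], "docs": ["3"]}]}
]}
""")
flat = {"clusters": [{"labels": ["A"], "docs": ["1"]}]}

print(cluster_depth(nested["clusters"]))
print(cluster_depth(flat["clusters"]))
```

A depth of 1 means only top-level clusters were returned; anything greater means at least one cluster contains subclusters.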

Here is my clustering component in solrconfig.xml:

 <searchComponent name="clustering"
                   enable="${solr.clustering.enabled:false}"
                   class="solr.clustering.ClusteringComponent" >
    <lst name="engine">
      <str name="name">lingo</str>

      <!-- Class name of a clustering algorithm compatible with the Carrot2 framework.

           Currently available open source algorithms are:
           * org.carrot2.clustering.lingo.LingoClusteringAlgorithm
           * org.carrot2.clustering.stc.STCClusteringAlgorithm
           * org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm

           See http://project.carrot2.org/algorithms.html for more information.

           A commercial algorithm Lingo3G (needs to be installed separately) is defined as:
           * com.carrotsearch.lingo3g.Lingo3GClusteringAlgorithm
        -->
      <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>

      <!-- Override location of the clustering algorithm's resources 
           (attribute definitions and lexical resources).

           A directory from which to load algorithm-specific stop words,
           stop labels and attribute definition XMLs. 

           For an overview of Carrot2 lexical resources, see:
           http://download.carrot2.org/head/manual/#chapter.lexical-resources

           For an overview of Lingo3G lexical resources, see:
           http://download.carrotsearch.com/lingo3g/manual/#chapter.lexical-resources
       -->
      <str name="carrot.resourcesDir">clustering/carrot2</str>
    </lst>

    <!-- An example definition for the STC clustering algorithm. -->
    <lst name="engine">
      <str name="name">stc</str>
      <str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
    </lst>

    <!-- An example definition for the bisecting kmeans clustering algorithm. -->
    <lst name="engine">
      <str name="name">kmeans</str>
      <str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str>
    </lst>
  </searchComponent>

The request handler:

<requestHandler name="/clustering_en" startup="lazy" enable="${solr.clustering.enabled:true}" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <bool name="clustering.results">true</bool>
    <!-- Field name with the logical "title" of each document (optional) -->
    <str name="carrot.title">id</str>
    <!-- Field name with the logical "URL" of each document (optional) 
      <str name="carrot.url">id</str>-->
    <!-- Field name with the logical "content" of each document (optional) -->
    <str name="carrot.snippet">answer_en</str>
    <!-- Apply highlighter to the title/ content and use this for clustering. -->
    <bool name="carrot.produceSummary">true</bool>
    <!-- the maximum number of labels per cluster -->
    <!--<int name="carrot.numDescriptions">5</int>-->
    <!-- produce sub clusters -->
    <bool name="carrot.outputSubClusters">true</bool>

    <!-- Configure the remaining request handler parameters. -->
    <str name="defType">edismax</str>
    <str name="qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
    </str>
    <str name="q.alt">*:*</str>
    <str name="rows">100</str>
    <str name="fl">*,score</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>

I'm really at a loss here. Thanks in advance for your support.

1 answer:

Answer 0: (score: 1)

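The open-source algorithms available in Carrot2 (the ones shipped as part of Solr) can only produce flat clusters. It is possible to plug a commercially available clustering algorithm into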