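How to speed up Solr index building in MapReduce

Time: 2015-12-03 23:25:32

Tags: solr mapreduce

I wrote a MapReduce job to generate a Solr index for my data. The index is generated in the reducer, but it is really slow. The code below is from the reducer. Is there something wrong with my program, or is there a way to speed up index generation?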

private SolrClient solr;
private UpdateResponse response;
private SolrInputDocument document;

@Override
public void reduce(Text inputKey, Iterable<Text> values, Context context) throws IOException, InterruptedException {

    //process the values...
    document = new SolrInputDocument();
    document.addField("id", hid+"#"+refid);
    document.addField();
    .....
    response = solr.add(document);
    solr.commit();
}

public void setup(Context context) {
    if(solrServerMode.equals("Cloud")){
        solr = new CloudSolrClient(solrServerPath);
        ((CloudSolrClient) solr).setDefaultCollection("gettingstarted");
    }
    else if(solrServerMode.equals("Local")){
        solr = new HttpSolrClient(solrServerPath);
    }
}

@Override
public void cleanup(Context context) {
    solr.close();
}

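Edit 1: There is one suspicious part that may be causing the slowness. As shown in the figure, I have only updated 46,205 documents, but the version number is very high.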

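1 answer:

Answer 0: (score: 4)

Perform fewer commits, or only one

You are committing after every single document. That is expensive and slows down indexing. If the documents do not need to be searchable while indexing is still running, I would suggest rewriting it as follows.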

@Override
public void reduce(Text inputKey, Iterable<Text> values, Context context) throws IOException, InterruptedException {
    // .....
    response = solr.add(document);
}

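@Override
public void cleanup(Context context) throws IOException, InterruptedException {
    try {
        // Commit once, after all documents of this reducer have been added.
        solr.commit();
    } catch (SolrServerException e) {
        // commit() declares SolrServerException; rethrow it as an IOException.
        throw new IOException(e);
    } finally {
        solr.close();
    }
}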

Keep in mind that this commits only at the very end. Until then, the documents cannot be found by searching.
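If a single commit at the end is too coarse, a middle ground is to commit once every N documents. The following is only a minimal sketch of that idea, not part of the original answer: the class name PeriodicCommitter, its methods, and the interval of 1000 are hypothetical, and the commit itself is passed in as a plain Runnable so the example stays independent of SolrJ.

```java
/** Runs a commit action once every `interval` added documents (hypothetical helper). */
final class PeriodicCommitter {
    private final int interval;
    private final Runnable commitAction;
    private int pending = 0;

    PeriodicCommitter(int interval, Runnable commitAction) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
        this.commitAction = commitAction;
    }

    /** Call after each document has been added. */
    void documentAdded() {
        pending++;
        if (pending >= interval) {
            commitAction.run();
            pending = 0;
        }
    }

    /** Call once at the end (for example from cleanup) to commit any remainder. */
    void finish() {
        if (pending > 0) {
            commitAction.run();
            pending = 0;
        }
    }
}
```

In the reducer, the Runnable would presumably wrap solr.commit() and convert its checked exceptions, documentAdded() would be called right after solr.add(document), and finish() would be called in cleanup before solr.close().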

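Tune the autoCommit settings

Another factor is the <autoCommit> settings, which you can adjust in solrconfig.xml. They trigger a commit automatically when either a threshold of uncommitted pending documents (maxDocs) is reached or a maximum time in milliseconds (maxTime) has passed since documents were added. Increasing these values can speed up indexing further.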

<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>1000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>