Question

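I am having a problem indexing the general Sitecore indexes "sitecore_master_index" and "sitecore_web_index", because the crawler/indexer checks every item in the database.

I imported thousands of products with a large number of specifications, and the product repository contains hundreds of thousands of items.

If I could exclude a path from the index, it would not have to check a million items for template exclusion.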

Follow-up

I implemented a custom crawler that excludes a list of paths from being indexed:

<index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SwitchOnRebuildSolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
  <param desc="name">$(id)</param>
  <param desc="core">sitecore_web_index</param>
  <param desc="rebuildcore">sitecore_web_index_sec</param>
  <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
  <configuration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration" />
  <strategies hint="list:AddStrategy">
    <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
  </strategies>
  <locations hint="list:AddCrawler">
    <crawler type="Sitecore.ContentSearch.Utilities.Crawler.ExcludePathsItemCrawler, Sitecore.ContentSearch.Utilities">
      <Database>web</Database>
      <Root>/sitecore</Root>
      <ExcludeItemsList hint="list">
        <ProductRepository>/sitecore/content/Product Repository</ProductRepository>
      </ExcludeItemsList>
    </crawler>
  </locations>
</index>

In addition, I activated SwitchOnRebuildSolrSearchIndex, since it is awesome out-of-the-box functionality; cheers to SC.

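using System;
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.Diagnostics;

namespace Sitecore.ContentSearch.Utilities.Crawler
{
  // Crawler that skips every item whose path starts with one of the configured paths.
  public class ExcludePathsItemCrawler : SitecoreItemCrawler
  {
    // Populated from the <ExcludeItemsList hint="list"> node in the index configuration.
    private readonly List<string> excludeItemsList = new List<string>();
    public List<string> ExcludeItemsList
    {
      get
      {
        return excludeItemsList;
      }
    }

    protected override bool IsExcludedFromIndex(SitecoreIndexableItem indexable, bool checkLocation = false)
    {
      Assert.ArgumentNotNull(indexable, "indexable");
      // Ordinal, case-insensitive comparison avoids culture-dependent matching of item paths.
      if (ExcludeItemsList.Any(path => indexable.AbsolutePath.StartsWith(path, StringComparison.OrdinalIgnoreCase)))
      {
        return true;
      }
      // Not under an excluded path: fall back to the default exclusion rules.
      return base.IsExcludedFromIndex(indexable, checkLocation);
    }
  }
}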
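
One caveat with a plain StartsWith prefix check: the path "/sitecore/content/Product Repository" also matches a sibling such as "/sitecore/content/Product Repository Archive". A minimal sketch of a segment-aware comparison follows. PathExclusion, IsUnder and IsUnderAny are hypothetical names used only for illustration and are not part of Sitecore.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not a Sitecore API): matches whole path segments only.
public static class PathExclusion
{
    // True when itemPath equals excludedRoot or lies somewhere beneath it.
    public static bool IsUnder(string itemPath, string excludedRoot)
    {
        if (string.IsNullOrEmpty(itemPath) || string.IsNullOrEmpty(excludedRoot))
        {
            return false;
        }

        // Treat "/a/b" and "/a/b/" as the same root.
        string root = excludedRoot.TrimEnd('/');

        // Either the item is the root itself...
        if (itemPath.Equals(root, StringComparison.OrdinalIgnoreCase))
        {
            return true;
        }

        // ...or its path continues with a separator right after the root.
        return itemPath.StartsWith(root + "/", StringComparison.OrdinalIgnoreCase);
    }

    // True when itemPath is under at least one of the excluded roots.
    public static bool IsUnderAny(string itemPath, IEnumerable<string> excludedRoots)
    {
        return excludedRoots.Any(root => IsUnder(itemPath, root));
    }
}
```

Inside the crawler, the lambda could then call PathExclusion.IsUnder(indexable.AbsolutePath, path) instead of using StartsWith directly.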

Answer 1

You can override the SitecoreItemCrawler class used by the index you want to change:

<locations hint="list:AddCrawler">
  <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
    <Database>master</Database>
    <Root>/sitecore</Root>
  </crawler>
</locations>

You can then add your own parameters, for example ExcludeTree or even a list of ExcludedBranches.

And in the implementation of the class, just override the method

public override bool IsExcludedFromIndex(IIndexable indexable)

and check whether the item is under an excluded node.
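
As a rough sketch, a crawler like this might be registered as follows. The type MyProject.Search.ExcludeTreeItemCrawler and the assembly MyProject are hypothetical placeholders, and ExcludeTree is assumed to be a public string property on that class, since the configuration factory assigns child elements to properties of the same name (as it does for Database and Root).

```xml
<locations hint="list:AddCrawler">
  <!-- Hypothetical custom crawler type; replace with your own class and assembly. -->
  <crawler type="MyProject.Search.ExcludeTreeItemCrawler, MyProject">
    <Database>master</Database>
    <Root>/sitecore</Root>
    <!-- Assumed custom property read by the overridden IsExcludedFromIndex. -->
    <ExcludeTree>/sitecore/content/Product Repository</ExcludeTree>
  </crawler>
</locations>
```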

Answer 2

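When importing large amounts of data, you should try to temporarily disable indexing of the data; otherwise you will run into problems with the crawler not being able to keep up.

Here is a good post on disabling indexing while importing data. It is written for Lucene, but I am sure you can do the same with Solr: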

http://intothecore.cassidy.dk/2010/09/disabling-lucene-indexes.html

Another option could be to store your products in a separate Sitecore database instead of in the master database.

Another post from Into the Core:

http://intothecore.cassidy.dk/2009/05/working-with-multiple-content-databases.html

Sitecore 8 XP ContentSearch: from index

2 answers: