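The other week I posted a question about removing punctuation from Solr search in Drupal. That was with Solr 4. Since then, the project I'm working on has moved from Solr 4 to Solr 5, and now I'm running into the same problem, but the fix from "Can't remove punctuation in Solr" no longer works. Because many content titles contain quotation marks, this causes problems when sorting by title.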
<field name="label" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
    />
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            splitOnCaseChange="0"
            preserveOriginal="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
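I have tried adding the following rules, but apostrophes and quotation marks stubbornly stay in place. They interfere with sorting by title, and the quoted titles end up at the beginning of the list.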
<charFilter class="solr.HTMLStripCharFilterFactory" />
<filter class="solr.ApostropheFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory"
        pattern="^\p{Punct}*(.*?)\p{Punct}*$"
        replacement="$1"/>
Answer 0 (score: 0)
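Unfortunately, none of the Solr solutions I tried worked, so I solved it on the Drupal side, which turned out to be simpler. The code below strips all special characters and digits from the title, converts the string to lowercase, and then adds it to the Solr document. The second function adds the new field to the available sort methods.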
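/**
 * Implements hook_apachesolr_index_document_build().
 */
function my_module_apachesolr_index_document_build(ApacheSolrDocument $document, $entity, $entity_type, $env_id) {
  // Keep letters only: spaces, digits and punctuation (including quotes) are dropped.
  $title = trim($entity->title);
  $title = preg_replace('/[^a-z]+/i', '', $title);
  $title = strtolower($title);
  // Add the normalized title to the Solr document as ss_new_sort.
  $document->addField('ss_new_sort', $title);
}

/**
 * Implements hook_apachesolr_query_prepare().
 */
function my_module_apachesolr_query_prepare(DrupalSolrQueryInterface $query) {
  // Offer the normalized field as a sort option, ascending by default.
  $query->setAvailableSort('ss_new_sort', array('title' => t('Title'), 'default' => 'asc'));
}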
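
To see what this normalization does to a title, here is a rough sketch that pulls the same steps into a standalone helper that runs outside Drupal. The name normalize_sort_title and the sample titles are made up for illustration; they are not part of the apachesolr module. The sort at the end only imitates sorting on the normalized value and does not involve Solr.

```php
<?php
// Hypothetical helper: the same normalization as in the index hook,
// extracted so it can be run without Drupal or Solr.
function normalize_sort_title($title) {
  $title = trim($title);
  // Drop everything that is not an ASCII letter (quotes, spaces, digits, ...).
  $title = preg_replace('/[^a-z]+/i', '', $title);
  return strtolower($title);
}

// Made-up sample titles, two of them starting with a quote character.
$titles = array('"Zebra" crossing', "'Apple' pie", 'Banana split 2');

// Order the titles by their normalized key instead of the raw string.
usort($titles, function ($a, $b) {
  return strcmp(normalize_sort_title($a), normalize_sort_title($b));
});

print_r($titles);
// Resulting order: 'Apple' pie, Banana split 2, "Zebra" crossing
```

One thing to keep in mind: the pattern keeps ASCII letters only, so a title like "Café 2000" ends up as "caf", and a title made up entirely of digits or punctuation becomes an empty string.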