Question

Following this documentation, I implemented dataset_facets() to add a facet to my dataset search:

http://docs.ckan.org/en/ckan-2.7.3/extensions/plugin-interfaces.html#ckan.plugins.interfaces.IFacets

More specifically, I used the following code to add a facet for the author field:

def dataset_facets(self, facets_dict, package_type):
    if package_type == 'dataset':
        facets_dict['author'] = toolkit._(u'Author')
    return facets_dict

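Unexpectedly, the facet values shown in the facet list are the tokenized and lowercased author names rather than the full names. That is, if I have these author names: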

[ 'Amt für Statistik', 'Senatsverwaltung für Kultur', 'VBB' ]

then I get the following facet values:

[ 'amt', 'fur', 'kultur', 'statistik', 'senatsverwaltung', 'vbb' ]

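The reason seems to be the Solr schema entry for the author field, which has type="textgen". Not knowing much about Solr, I experimented and changed it to type="string". Now it works: I get the full author names as facet values.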

My questions:

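Why was textgen chosen as the type of the author field?
Could switching to string break something else in CKAN? So far I haven't noticed any problems.
Is there a better way to set up a facet based on a textgen-type field (for example, copying the field into a new field of type string)?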

Answer 1

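The difference is that once you change it to a string field, searching against that field requires an exact hit. A string field is not processed at all: the value is not split into separate tokens, lowercased, and so on. To match the field you therefore have to search for the complete string Amt für Statistik. Searching for just statistik will no longer produce a hit.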
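To make the matching difference concrete, here is a rough Python simulation. It is only a sketch: analyze() is a hypothetical stand-in that folds accents, lowercases and splits on whitespace, which approximates the facet values shown in the question but is not Solr's actual textgen analyzer chain.

```python
import unicodedata


def analyze(value):
    """Hypothetical stand-in for an analyzed text field (not Solr's real chain):
    fold accents, lowercase, split on whitespace."""
    decomposed = unicodedata.normalize("NFKD", value)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return folded.lower().split()


def matches_text(query, value):
    """Analyzed field: every query token must appear among the value's tokens."""
    tokens = analyze(value)
    return all(tok in tokens for tok in analyze(query))


def matches_string(query, value):
    """String field: the whole value is one term, so only an exact match hits."""
    return query == value


author = "Amt für Statistik"
print(analyze(author))
print(matches_text("statistik", author))
print(matches_string("statistik", author))
print(matches_string("Amt für Statistik", author))
```

With the analyzed field, a partial query such as statistik still matches; with the string field, only the complete, identically cased value does.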

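I'm not familiar with CKAN, but as long as it doesn't also use that field for searching, the change should work fine. If the field is used for searching as well, the approach you suggest in your last question is the preferred way to solve it.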

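Copying the field to a separate string field used for faceting is the preferred way to solve this kind of problem: one field for searching, one field for faceting. Use different definitions to get different behavior, and pick the field that best suits what you are doing.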
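The two-field idea can be sketched in a few lines of Python. This is a toy in-memory index, not Solr: index_document() and search() are hypothetical helpers, and the "analysis" is reduced to lowercasing and whitespace splitting purely for illustration.

```python
from collections import Counter


def index_document(doc):
    """Mimic a copy at index time: analyzed tokens for search, raw value for facets."""
    return {
        "author": doc["author"].lower().split(),  # analyzed: used for searching
        "author_string": doc["author"],           # verbatim copy: used for faceting
    }


# Author names taken from the question.
docs = [
    {"author": "Amt für Statistik"},
    {"author": "Senatsverwaltung für Kultur"},
    {"author": "VBB"},
]
index = [index_document(d) for d in docs]


def search(term):
    """Search the analyzed field, return the full names of matching documents."""
    return [entry["author_string"] for entry in index if term.lower() in entry["author"]]


# Facet counts are computed over the untouched copy, so full names survive.
facets = Counter(entry["author_string"] for entry in index)

print(search("statistik"))
print(sorted(facets))
```

Searching still works on partial, lowercased terms, while the facet values remain the complete author names.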

Answer 2

For completeness, here is what I did based on MatsLindh's answer:

Define an additional author field in schema.xml:

<!-- Copy the author field into author_string and treat it as a string
     (rather than textgen). This allows author to be used as a facet for search. -->
<field name="author_string" type="string" indexed="true" stored="false" />

Copy the author field into author_string:

<copyField source="author" dest="author_string"/>

Use the new field to build the facet in CKAN:

def dataset_facets(self, facets_dict, package_type):
    if package_type == 'dataset':
        facets_dict['author_string'] = toolkit._(u'Author')
    return facets_dict

Now I can facet on the full strings, but still search on partial strings.
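One way to check the new field independently of CKAN is to ask Solr for its facet counts directly. The sketch below only builds the request URL using Solr's standard facet and facet.field parameters. The base URL is an assumption (a local Solr with a core named ckan) and must be adjusted to the actual installation. It is a standalone Python 3 script, separate from the plugin code. Because copyField is applied at index time, datasets indexed before the schema change will most likely need to be reindexed before they show up under author_string.

```python
from urllib.parse import urlencode

# Assumption: Solr runs locally with a core named "ckan"; adjust as needed.
SOLR_SELECT_URL = "http://localhost:8983/solr/ckan/select"


def facet_query_url(field, base_url=SOLR_SELECT_URL):
    """Build a Solr select URL that returns only the facet counts for `field`."""
    params = {
        "q": "*:*",            # match all documents
        "rows": 0,             # no documents needed, only facet counts
        "facet": "true",
        "facet.field": field,
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


url = facet_query_url("author_string")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should list the full author names under facet_counts if the field is populated.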

CKAN facet for the "author" field gives tokenized values
