Problem retrieving meta tags - Nutch 2.3

Time: 2019-01-14 14:03:11

Tags: meta-tags nutch

I am using the Nutch2.3-src release. I can crawl web pages, but only the description meta tag is picked up; other meta tags, such as LastModified and Author, are not.

I updated the index.metadata and metatags.names properties, but still no luck: the values for those tags come back as null.

<property>
  <name>metatags.names</name>
  <value>*</value>
  <description>Names of the metatags to extract, separated by ','.
  Use '*' to extract all metatags. Prefixes the names with 'meta_' in
  the parse-metadata. For instance, to index description and keywords,
  you need to activate the plugins parse-metadata and index-metadata
  and set the value of the properties 'metatags.names' and
  'index.metadata' to 'description,keywords'.
  </description>
</property>

<property>
  <name>index.metadata</name>
  <value>description,LastModified,Created,WCMCategories,WCMKeywords,Authors,SiteName,title,lastmodified,created,wcmcategories,wcmkeywords,authors,sitename,meta_description,meta_LastModified,meta_Created,meta_WCMCategories,meta_WCMKeywords,meta_Authors,meta_SiteName,meta_title,meta_lastmodified,meta_created,meta_wcmcategories,meta_wcmkeywords,meta_authors,meta_sitename</value>
  <description>
  Comma-separated list of keys to be taken from the metadata to generate fields.
  Can be used e.g. for 'description' or 'keywords' provided that these values are generated
  by a parser (see parse-metatags plugin), and property 'metatags.names'.
  </description>
</property>

1 answer:

Answer 0 (score: 0)

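Solved this. Meta tag names are case-sensitive: the name must be spelled with exactly the same capitalization in the web page and in nutch-site.xml.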
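
To illustrate the fix, here is a minimal sketch of what a matching configuration might look like. It assumes a hypothetical page that declares its tags as `LastModified` and `Authors` (the markup in the comment is invented for illustration), and it follows the property description quoted in the question, which lists names without the `meta_` prefix. Whether your Nutch 2.3 build expects the prefix in `index.metadata` is not confirmed here, so treat this as a starting point rather than a verified setup.

```xml
<!-- Hypothetical page markup assumed for this sketch:
     <meta name="LastModified" content="2019-01-10">
     <meta name="Authors" content="Jane Doe"> -->

<property>
  <name>metatags.names</name>
  <!-- Same capitalization as the name attribute in the page -->
  <value>description,LastModified,Authors</value>
</property>

<property>
  <name>index.metadata</name>
  <!-- Must match metatags.names exactly; 'lastmodified' would not match -->
  <value>description,LastModified,Authors</value>
</property>
```

Listing only the spellings that actually occur in the page, instead of every upper- and lower-case variant, also makes it easier to see which name fails to match.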