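I have installed Nutch 1.7 and Solr 3.6.2 and can crawl and index xls, doc, pdf and zip files. Now I want to index video files such as .avi and .mov.
I edited regex-urlfilter.txt to remove those extensions from the skip list, but the only video files that get indexed are .flv files. I understand that .flv is what Tika says it supports, but I don't need metadata indexing for the video files; I only want the file names to be indexed.
How can I enable this?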
# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|gz|rpm|tgz|exe|jpeg|JPEG|bmp|BMP)$
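As a quick sanity check of the rule above, here is a minimal Python sketch (not part of Nutch) that applies the same suffix pattern to a few URLs. The sample URLs and the helper name `is_skipped` are hypothetical, and the sketch assumes the filter uses "find"-style matching, which `re.search` mirrors for this simple pattern. It only shows what this one exclusion rule does; it says nothing about whether a parser later accepts the file.

```python
import re

# Same suffix pattern as the '-' rule in regex-urlfilter.txt above.
SKIP_SUFFIXES = re.compile(
    r"\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|gz|rpm|tgz|exe"
    r"|jpeg|JPEG|bmp|BMP)$"
)


def is_skipped(url: str) -> bool:
    """Return True if the exclusion rule above would reject this URL."""
    return SKIP_SUFFIXES.search(url) is not None


if __name__ == "__main__":
    # Hypothetical sample URLs, for illustration only.
    samples = [
        "http://example.com/media/clip.avi",
        "http://example.com/media/clip.mov",
        "http://example.com/media/clip.flv",
        "http://example.com/images/logo.gif",
    ]
    for url in samples:
        verdict = "skipped" if is_skipped(url) else "not excluded by this rule"
        print(f"{url}: {verdict}")
```

With the rule as posted, .avi, .mov and .flv URLs are not excluded by this pattern, so the URL filter alone does not explain why only .flv files end up in the index.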
<configuration>
<property>
<name>http.agent.name</name>
<value>crawler</value>
</property>
<property>
<name>http.robots.agents</name>
<value>crawler,*</value>
</property>
<property>
<name>http.accept.language</name>
<value>zh-cn, ja-jp, en-us,en-gb,en;q=0.7,*;q=0.3</value>
<description>Value of the "Accept-Language" request header field.
This allows selecting a non-English language as the default one to retrieve.
It is a useful setting for search engines built for a certain national group.
</description>
</property>
<property>
<name>parser.character.encoding.default</name>
<value>utf-8</value>
<description>The character encoding to fall back to when no other information
is available</description>
</property>
<property>
<name>http.content.limit</name>
<value>10000000</value>
<description>The length limit for downloaded content, in bytes.
If this value is nonnegative (>=0), content longer than it will be truncated;
otherwise, no truncation at all.
</description>
</property>
<property>
<name>file.content.limit</name>
<value>10000000</value>
<description>The length limit for downloaded content, in bytes.
If this value is nonnegative (>=0), content longer than it will be truncated; otherwise, no truncation at all.
</description>
</property>
<property>
<name>plugin.includes</name>
<value>protocol-http|urlfilter-regex|parse-(html|tika|metatags|zip)|index-(basic|anchor|metadata)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
<property>
<name>metatags.names</name>
<value>*</value>
<description> Names of the metatags to extract, separated by;.
Use '*' to extract all metatags. Prefixes the names with 'metatag.'
in the parse-metadata. For instance to index description and keywords,
you need to activate the plugin index-metadata and set the value of the
parameter 'index.parse.md' to 'metatag.description;metatag.keywords'.
</description>
</property>
<property>
<name>index.parse.md</name>
<value>metatag.description,metatag.keywords</value>
<description> Comma-separated list of keys to be taken from the parse metadata to generate fields. Can be used e.g. for 'description' or 'keywords' provided that these values are generated by a parser (see parse-metatags plugin)
</description>
</property>
</configuration>