How do I run Nutch with the Selenium plugin?

Time: 2019-06-13 14:04:39

Tags: selenium nutch

I am trying to run Nutch with the Selenium plugin, but I am a beginner and cannot figure out how to run Nutch or crawl a website.

I changed the XML configuration as the setup requires:

<property>
    <name>plugin.includes</name>
    <value>protocol-selenium|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
    <description>Regular expression naming plugin directory names to
    include.  Any plugin not matching this expression is excluded.
    In any case you need at least include the nutch-extensionpoints plugin. By
    default Nutch includes crawling just HTML and plain text via HTTP,
    and basic indexing and search plugins. In order to use HTTPS please enable 
    protocol-httpclient, but be aware of possible intermittent problems with the 
    underlying commons-httpclient library.
    </description>
</property>

How can I run a small test with Selenium against a web page that uses JavaScript?

0 answers: