Question

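I want to search a web page for certain keywords and key phrases, and make their presence a condition for going on to parse the page and extract content fields. Can anyone suggest a solution?

Roughly, I imagine the code would look something like this: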

# All the preceding information for the spider
# (imports, class declarations, rules, etc.)

if response.xpath('//*[contains(/text(), "some keyword" or "some key phrase" or "some other keyword")]'):
    def parse_items(self, response):

# All the subsequent information for the spider

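I'd like to know whether I'm on the right track and, if so, how to proceed. Alternatively, I'd be interested in a completely different approach to making the presence of a keyword a precondition for extracting data with Scrapy.

Thanks.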

Answer 1

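I'd say this question should be closed, because it is both too broad and opinion-based. Still, here is a suggestion:

In Python and Scrapy, we don't do things like this: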

if something_happens:
    def parse_items (self, response):
        do_something()

Instead, we do this:

def parse_items(self, response):
    if something_happens:
        do_something()
    else:
        # Do other things or nothing.
        pass
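
To make the pattern above concrete, here is a minimal sketch of a callback that checks for keywords before extracting anything. The `KeywordGateMixin` class, its `has_keyword` helper, the keyword list, and the extracted `url` and `title` fields are all hypothetical names chosen for illustration; the only assumption about `response` is that it offers `xpath(...).get()` and `url`, as a Scrapy response does.

```python
class KeywordGateMixin:
    """Hypothetical mixin: extract fields only when a keyword is on the page."""

    # Illustrative keywords taken from the question; replace with real ones.
    keywords = ("some keyword", "some key phrase", "some other keyword")

    def has_keyword(self, text):
        """Return True if any keyword occurs in text, ignoring case."""
        lowered = text.lower()
        return any(keyword.lower() in lowered for keyword in self.keywords)

    def parse_items(self, response):
        # string(//body) is the string value of the first <body> element,
        # i.e. all of its descendant text (inline script text included).
        page_text = response.xpath("string(//body)").get(default="")
        if not self.has_keyword(page_text):
            return  # No keyword found: yield nothing for this page.
        # Keyword found: extract whatever fields are needed (assumed fields).
        yield {
            "url": response.url,
            "title": response.xpath("//title/text()").get(),
        }
```

In a real project, one way to use this would be to list the mixin before the spider base class, for example `class MySpider(KeywordGateMixin, CrawlSpider)`, and point the rule's callback at `parse_items`. The check here is a plain substring match on the page text; a regular expression or an XPath with one `contains()` call per keyword joined by `or` would be alternatives.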

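Also, although I don't think this question belongs on StackOverflow, you can still get help from other parts of the Scrapy community. The IRC channel and the mailing list should be good places to ask questions like this.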

Making the presence of one or more keywords a condition for scraping information from a web page
