Question

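I have a very simple spider, but when I try to search the content of the scraped item for a word, the search does not find it.

URL to scrape: https://www.filmlinc.org/nyff2019/films/the-irishman/

Spider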

import scrapy
from metrograph.items import MetrographItem


class MetrographSpider(scrapy.Spider): #**************Change This*****************
    name = 'metrograph' #**************Change This*****************

    start_urls = ['https://www.filmlinc.org/nyff2019/films/the-irishman/',
        ]

    def parse(self, response):
        # .getall() returns a list of strings, not a single string
        item = MetrographItem(
            title=response.xpath('//div[7]//a[1]//span[1]/text()').getall()
        )
        if "Standby" in item['title']:
            print(item['title'])

        yield item

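But the spider does not find the word "Standby" in the scraped field. However, if I call print(item['title']) manually,

I get the following, which clearly contains the word. I removed some of the whitespace.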

['\n    \n  \n\n   Standby Only\n  ']

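I'm not sure why I'm running into trouble. Is the title field a list? Is there a way to search it properly? Eventually the spider will look for cases where "Standby" is not found, but obviously I can't do that yet. Any suggestions would be appreciated.

Thanks!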

Answer 1

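You are searching for a string in a list, and `in` on a list only matches a whole element that equals the string. You need a single string to do a substring search, so convert the list to str: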

"Standby" in str(['\n    \n  \n\n   Standby Only\n  '])

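As a minimal sketch of why the check fails and of two alternatives to wrapping the list in str(), the snippet below uses the list printed in the question as a stand-in for item['title']. The variable names (title, joined, and the found_* flags) are illustrative only and do not come from the original spider.

```python
# A value shaped like the one .getall() returned in the question:
# a list containing a single string with a lot of whitespace.
title = ['\n    \n  \n\n   Standby Only\n  ']

# `in` on a list compares whole elements, so a substring never matches.
found_in_list = "Standby" in title

# Alternative 1: test each element of the list for the substring.
found_in_any = any("Standby" in text for text in title)

# Alternative 2: join the pieces into one string and collapse the whitespace,
# then do an ordinary substring test.
joined = " ".join(" ".join(title).split())
found_in_joined = "Standby" in joined

print(found_in_list, found_in_any, found_in_joined, repr(joined))
# prints: False True True 'Standby Only'
```

For the later goal of detecting screenings where "Standby" is absent, the same idea can be negated, for example `not any("Standby" in text for text in title)`.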