This causes my code to fail.
i = 1000
j = 0
dataLen = len(response.xpath('//div[@class="rank_list table rankstyle1"]//div[@class="tr"]'))
photoNodes = response.xpath('//div[@class="rank_list table rankstyle1"]//div[@class="tr"]')
for photoNode in photoNodes:
    contentHref = photoNode.xpath('.//a/@href').extract_first()
    yield Request(contentHref, callback=self.parse_page, priority = i, dont_filter=True)
    i -= 1
    j += 1

# start parse next page
def parse_page(self, response):
    global countLen, dataLen
    enName = response.xpath('//*[@class="movie_intro_info_r"]/h3/text()').extract_first()
    cnName = response.xpath('//*[@class="movie_intro_info_r"]/h1/text()'
    ...
I tried adding if not (photoNode is None): or if not photoNode == "", but it still does not work.
i = 1000
j = 0
dataLen = len(response.xpath('//div[@class="rank_list table rankstyle1"]//div[@class="tr"]'))
photoNodes = response.xpath('//div[@class="rank_list table rankstyle1"]//div[@class="tr"]')
for photoNode in photoNodes:
    if not (photoNode is None):
        contentHref = photoNode.xpath('.//a/@href').extract_first()
        # photoHref = photoNode.xpath('.//a/img/@src').extract_first()
        yield Request(contentHref, callback=self.parse_page, priority = i, dont_filter=True)
        i -= 1
        j += 1
    else:
        pass
twRanking['movie'] = movieArray
Some rows may not have an href, and I don't know how to skip them.
Any help would be appreciated. Thanks in advance.
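Answer 0 (score: 2)
It looks like you need to check whether contentHref is non-empty, rather than photoNode. photoNode is the selector for the whole row, so it still contains information and is never None or "" even when the row has no link. contentHref, on the other hand, is None when extract_first() finds no matching href. Try something like this: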
for photoNode in photoNodes:
    contentHref = photoNode.xpath('.//a/@href').extract_first()
    if contentHref:
        # photoHref = photoNode.xpath('.//a/img/@src').extract_first()
        yield Request(contentHref, callback=self.parse_page, priority = i, dont_filter=True)
        i -= 1
        j += 1
    else:
        pass
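
To make the skip logic easier to check on its own, here is a minimal sketch that isolates it from Scrapy. The helper name plan_requests and the example URLs are hypothetical and not part of the original code. Each item in the input list stands in for the value returned by photoNode.xpath('.//a/@href').extract_first(), which is None when the row has no matching link. In the real spider you would yield a Request instead of a tuple.

```python
from typing import Iterable, Iterator, Optional, Tuple


def plan_requests(
    hrefs: Iterable[Optional[str]], start_priority: int = 1000
) -> Iterator[Tuple[str, int]]:
    """Yield (href, priority) pairs, skipping rows that have no link.

    Hypothetical helper: each item in `hrefs` stands in for the result of
    photoNode.xpath('.//a/@href').extract_first().
    """
    priority = start_priority
    for href in hrefs:
        # None and '' are both falsy, so one truthiness check covers both.
        if not href:
            continue
        yield href, priority
        # Only rows that actually produced a request use up a priority value.
        priority -= 1


if __name__ == "__main__":
    # Example data (made up): two rows with links, two without.
    rows = ["http://example.com/a", None, "", "http://example.com/b"]
    print(list(plan_requests(rows)))
    # [('http://example.com/a', 1000), ('http://example.com/b', 999)]
```

Using continue for the empty case keeps the loop body flat and makes the else: pass branch unnecessary. The priority counter is decremented only for rows that were kept, which matches the placement of i -= 1 inside the if block above.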