Question

I am using the Python.org build of Python 2.7 64-bit on Windows Vista 64-bit. I have the following Scrapy code, which is supposed to print the word "GOAL" every time it finds an instance of a span element with title="Goal":

from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.utils.markup import remove_tags
from scrapy.cmdline import execute
import re


class MySpider(Spider):
    name = "goal"
    allowed_domains = ["whoscored.com"]
    start_urls = ["http://www.whoscored.com/Players/3859/Fixtures/Wayne-Rooney"]

    def parse(self, response):
        for row in response.selector.xpath('//table[@id="player-fixture"]//tr[td[@class="tournament"]]'):
            list_of_goals = row.xpath('//span[@title="Goal"]')

            if list_of_goals:
                print "GOAL"

execute(['scrapy','crawl','goal'])

However, it prints "GOAL" for all 47 rows of the table that contains Wayne Rooney's match history.

Can anyone see why it doesn't return only the rows for matches in which he actually scored a goal?

Thanks

Answer 1

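The expression '//span[@title="Goal"]' starts with a double slash, so it searches all nodes of the whole document, not just the current row. Is that what you intended? Because the page contains at least one goal span somewhere, the result is non-empty for every row, which is why every row prints "GOAL".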
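To make the cause concrete, here is a minimal sketch using only the standard library's `xml.etree.ElementTree`, on a small hypothetical table (an assumption, not the real whoscored.com markup) in which only the second row has a goal. ElementTree refuses absolute paths on an element, so the sketch mimics what a leading `//` does by searching from the document root regardless of which row is current.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the fixture table (assumption, not the
# real page): three match rows, and only the second one contains a goal span.
PAGE = """
<html><body>
<table id="player-fixture">
  <tr><td class="tournament">EPL</td><td>no goal</td></tr>
  <tr><td class="tournament">EPL</td><td><span title="Goal">1</span></td></tr>
  <tr><td class="tournament">FA Cup</td><td>no goal</td></tr>
</table>
</body></html>
""".strip()

root = ET.fromstring(PAGE)
table = root.find('.//table[@id="player-fixture"]')

# Same selection as the question: rows that contain a td with class "tournament".
rows = [tr for tr in table.findall('tr')
        if tr.find('td[@class="tournament"]') is not None]

def absolute_style(row):
    # A leading '//' ignores the context node and searches the whole document.
    # ElementTree rejects absolute paths on an element, so this mimics it by
    # searching from the document root; `row` is deliberately unused.
    return root.findall('.//span[@title="Goal"]')

results = ["GOAL" if absolute_style(row) else "-" for row in rows]
print(results)  # ['GOAL', 'GOAL', 'GOAL']
```

Every row reports a goal, because each per-row query sees the single goal span in row two.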

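If you only want to search the descendants of the current row, try './/span[@title="Goal"]', where the dot explicitly sets the starting point of the // traversal to the current context node, or simply 'descendant::span[@title="Goal"]', which also starts from the current node.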
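The fix can be sketched with the same standard-library approach and the same hypothetical markup (again an assumption, not the real page). Here the search starts at each row with './/', so only that row's descendants are examined.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fixture table (assumption): only row two has a goal.
PAGE = """
<html><body>
<table id="player-fixture">
  <tr><td class="tournament">EPL</td><td>no goal</td></tr>
  <tr><td class="tournament">EPL</td><td><span title="Goal">1</span></td></tr>
  <tr><td class="tournament">FA Cup</td><td>no goal</td></tr>
</table>
</body></html>
""".strip()

def goal_flags(page):
    """Return one flag per match row: True if that row contains a goal span."""
    root = ET.fromstring(page)
    table = root.find('.//table[@id="player-fixture"]')
    flags = []
    for tr in table.findall('tr'):
        if tr.find('td[@class="tournament"]') is None:
            continue  # skip rows that are not match rows
        # './/' starts at the current row, so only its descendants are searched.
        flags.append(tr.find('.//span[@title="Goal"]') is not None)
    return flags

print(goal_flags(PAGE))  # [False, True, False]
```

In the spider itself, the equivalent change is `list_of_goals = row.xpath('.//span[@title="Goal"]')`. ElementTree supports only a subset of XPath, so the `descendant::` axis is not shown here; Scrapy's selectors accept full XPath 1.0, so `row.xpath('descendant::span[@title="Goal"]')` should work there as well.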

Scrapy XPath returns a result for every row in the table instead of only the selected ones
