This is my code for scanning users and printing out their Steam IDs and their inventory values:
import scrapy

bot_words = [
    "bot",
    "BOT",
    "Bot",
    "[tf2mart]"
]

class AccountSpider(scrapy.Spider):
    name = "accounts"
    start_urls = [
        'file:///Users/max/Documents/promotebot/tutorial/tutorial/TF2ITEMS.htm'
    ]

    def linkgen(self):
        global steamid
        print("Downloading Page...")
        yield scrapy.Request("http://www.backpack.tf" + steamid, callback=self.parse_accounts)
        print("Page successfully downloaded.")

    def parse(self, response):
        global steamid
        lgen = self.linkgen()
        for tr in response.css("tbody"):
            for user in response.css("span a"):
                if bot_words not in response.css("span a"):
                    print("Parsed info")
                    print("User: " + user.extract())
                    steamid = user.css('::attr(href)').extract()[0]
                    print("Steam ID: " + steamid)
                    lgen.next()

    def parse_accounts(self, response):
        for key in response.css("ul.stats"):
            print("Value finding function activated.")
            value = response.css("span.refined-value::text").extract()
            print(value)
The expected output is:
Parsed info
User: <a href="/profiles/76561198017108***">user</a>
Steam ID: /profiles/76561198017108***
(SOME VALUE)
However, the current output is:
Parsed info
User: <a href="/profiles/76561198017108***">user</a>
Steam ID: /profiles/76561198017108***
Downloading Page...
Parsed info
User: <a href="/profiles/76561198015589***">user</a>
Steam ID: /profiles/76561198015589***
Page successfully downloaded.
2018-06-13 21:42:45 [scrapy.core.scraper] ERROR: Spider error processing <GET file:///Users/max/Documents/promotebot/tutorial/tutorial/TF2ITEMS.htm> (referer: None)
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/Users/max/Documents/promotebot/tutorial/tutorial/spiders/accounts_spider.py", line 32, in parse
lgen.next()
StopIteration
Despite the threading (the linkgen generator downloads the request while the parse function activates it again), the function somehow still works(?).
Answer 0 (score 0):
I don't think you should just call lgen.next(); you should yield it, as in yield lgen.next(). lgen is only a generator, and lgen.next() merely retrieves a single Scrapy Request. For Scrapy to actually download it, you have to yield that request.
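
As an illustration only (not part of the original answer), here is a minimal sketch of how the asker's parse method might look with that change, assuming Python 2 as shown in the traceback (on Python 3 it would be next(lgen) instead of lgen.next()):

    def parse(self, response):
        global steamid
        lgen = self.linkgen()
        for tr in response.css("tbody"):
            for user in response.css("span a"):
                if bot_words not in response.css("span a"):
                    print("Parsed info")
                    print("User: " + user.extract())
                    steamid = user.css('::attr(href)').extract()[0]
                    print("Steam ID: " + steamid)
                    # Yield the Request produced by the generator so Scrapy
                    # actually schedules and downloads it; calling lgen.next()
                    # on its own just discards the Request.
                    yield lgen.next()

Note that linkgen only yields a single Request, so processing more than one user would still exhaust the generator and raise StopIteration; creating a fresh generator per user, or simply yielding scrapy.Request(...) directly inside the loop, would avoid that.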