This is my code for scanning users and printing out their Steam IDs and their inventory values:
import scrapy

bot_words = [
    "bot",
    "BOT",
    "Bot",
    "[tf2mart]"
]

class AccountSpider(scrapy.Spider):
    name = "accounts"
    start_urls = [
        'file:///Users/max/Documents/promotebot/tutorial/tutorial/TF2ITEMS.htm'
    ]

    def linkgen(self):
        global steamid
        print("Downloading Page...")
        yield scrapy.Request("http://www.backpack.tf" + steamid, callback=self.parse_accounts)
        print("Page successfully downloaded.")

    def parse(self, response):
        global steamid
        lgen = self.linkgen()
        for tr in response.css("tbody"):
            for user in response.css("span a"):
                if bot_words not in response.css("span a"):
                    print("Parsed info")
                    print("User: " + user.extract())
                    steamid = user.css('::attr(href)').extract()[0]
                    print("Steam ID: " + steamid)
                    lgen.next()

    def parse_accounts(self, response):
        for key in response.css("ul.stats"):
            print("Value finding function activated.")
            value = response.css("span.refined-value::text").extract()
            print(value)
The expected output is:
Parsed info
User: <a href="/profiles/76561198017108***">user</a>
Steam ID: /profiles/76561198017108***
(SOME VALUE)
However, the current output is:
Parsed info
User: <a href="/profiles/76561198017108***">user</a>
Steam ID: /profiles/76561198017108***
Downloading Page...
Parsed info
User: <a href="/profiles/76561198015589***">user</a>
Steam ID: /profiles/76561198015589***
Page successfully downloaded.
2018-06-13 21:42:45 [scrapy.core.scraper] ERROR: Spider error processing <GET file:///Users/max/Documents/promotebot/tutorial/tutorial/TF2ITEMS.htm> (referer: None)
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/Users/max/Documents/promotebot/tutorial/tutorial/spiders/accounts_spider.py", line 32, in parse
lgen.next()
StopIteration
Despite the threading (the linkgen generator downloads the request while the parse function activates it again), the function somehow still works(?).
Answer 0 (score 0):
I don't think you should just call lgen.next(); you should yield it, as in yield lgen.next(). lgen is only a generator, and lgen.next() merely retrieves a single Scrapy Request. For Scrapy to actually download it, you have to yield that request.
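
As an illustration only (not part of the original answer), here is a minimal sketch of how the asker's parse method might look with that change, assuming Python 2 as shown in the traceback (on Python 3 it would be next(lgen) instead of lgen.next()):

    def parse(self, response):
        global steamid
        lgen = self.linkgen()
        for tr in response.css("tbody"):
            for user in response.css("span a"):
                if bot_words not in response.css("span a"):
                    print("Parsed info")
                    print("User: " + user.extract())
                    steamid = user.css('::attr(href)').extract()[0]
                    print("Steam ID: " + steamid)
                    # Yield the Request produced by the generator so Scrapy
                    # actually schedules and downloads it; calling lgen.next()
                    # on its own just discards the Request.
                    yield lgen.next()

Note that linkgen only yields a single Request, so processing more than one user would still exhaust the generator and raise StopIteration; creating a fresh generator per user, or simply yielding scrapy.Request(...) directly inside the loop, would avoid that.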