Question

I'm using Scrapy to get a list of users and their SteamIDs from www.tf2items.com/profiles/./p

Currently, my code looks like this:

import scrapy

bot_words = [
    "bot",
    "BOT",
    "[tf2mart]"
]

class AccountSpider(scrapy.Spider):
    name = "accounts"
    start_urls = [
        'file:///Users/max/Documents/promotebot/tutorial/tutorial/TF2ITEMS.htm'
    ]

    def parse(self, response):
        for tr in response.css("tbody"):
            user = response.css("span a").extract()
            print(user)
            if bot_words not in response.css("span a").extract():
                for href in response.css("span a::attr(href)").extract():
                    #yield response.follow("http://www.backpack.tf" + href, self.parse_accounts)
                    print("this is a value")

My end goal is to have this code print something like this:

<a href="/profiles/76561198042757507">Kchypark</a>
this is a value
<a href="/profiles/76561198049853548">Agen Kolar</a>
this is a value
<a href="/profiles/76561198036381323">Grave Shifter15</a>
this is a value

With my current code, I would at least expect:

<a href="/profiles/76561198042757507">Kchypark</a>
this is a value
this is a value
this is a value
<a href="/profiles/76561198049853548">Agen Kolar</a>
this is a value
this is a value
this is a value
<a href="/profiles/76561198036381323">Grave Shifter15</a>
this is a value
this is a value
this is a value

However, I get:

<a href="/profiles/76561198042757507">Kchypark</a>
<a href="/profiles/76561198049853548">Agen Kolar</a>
<a href="/profiles/76561198036381323">Grave Shifter15</a>
this is a value
this is a value
this is a value

What am I doing wrong?

Answer 1

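Your first print outputs the entire list of links at once, because calling .extract() on the whole selector returns every match as a single list: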

user = response.css("span a").extract()
print(user)
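To see why the output comes out grouped instead of interleaved, here is a plain-Python sketch (no Scrapy needed) that mimics the two loop shapes. The `LINKS` list of `(name, href)` pairs is a hypothetical stand-in for what `response.css("span a")` matches, and the helper names are illustrative only; each function collects the lines that would be printed.

```python
# Hypothetical stand-in for the <a> elements matched by response.css("span a"):
# (link text, href) pairs taken from the output shown in the question.
LINKS = [
    ("Kchypark", "/profiles/76561198042757507"),
    ("Agen Kolar", "/profiles/76561198049853548"),
    ("Grave Shifter15", "/profiles/76561198036381323"),
]


def question_shape(links):
    """Mimic the question's loop: one print of the whole list, then one line per href."""
    out = [[name for name, _ in links]]  # a single print(user) of the full list
    for _name, _href in links:  # the href loop only starts after that print
        out.append("this is a value")
    return out


def per_link_shape(links):
    """Mimic the answer's loop: handle one link per iteration."""
    out = []
    for name, _href in links:
        out.append(name)  # print this link...
        out.append("this is a value")  # ...then its own follow-up line
    return out


for line in question_shape(LINKS):
    print(line)
print("---")
for line in per_link_shape(LINKS):
    print(line)
```

Only the second shape pairs each link with its own "this is a value" line, which is the output the question is after.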

Your code should look more like this:

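def parse(self, response):
    for tr in response.css("tbody"):
        # Iterate over the links inside this table body one at a time
        for user in tr.css("span a"):
            name = user.css("::text").extract_first(default="")
            # Skip accounts whose name contains any of the bot markers
            if any(word in name for word in bot_words):
                continue
            print(user.extract())
            href = user.css("::attr(href)").extract_first()
            print(href)
            #yield response.follow("http://www.backpack.tf" + href, self.parse_accounts)
            print("this is a value")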
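A related pitfall is the bot filter in the question's code. `if bot_words not in response.css("span a").extract():` asks whether the list object `bot_words` itself is an element of the list of extracted strings. It never is, so the condition is always true and nothing is filtered. The sketch below shows this with made-up sample names and an illustrative `is_bot` helper that checks each marker as a substring of the link text.

```python
bot_words = ["bot", "BOT", "[tf2mart]"]

# Made-up sample link texts, for illustration only
names = ["Kchypark", "tf2mart trading BOT", "[tf2mart] Seller", "Agen Kolar"]

# The original check: is the *list* bot_words one of the strings in names? Never.
always_true = bot_words not in names


def is_bot(name, words=bot_words):
    """Return True if any bot marker appears as a substring of the name."""
    return any(word in name for word in words)


humans = [name for name in names if not is_bot(name)]
print(always_true)  # the original condition
print(humans)       # names left after filtering
```

The substring test is case-sensitive (hence both "bot" and "BOT" in the list) and can also match inside ordinary names, so the marker list may need tuning.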

Also, best practice in Scrapy is to yield items rather than calling print directly.

And watch out for code duplication, such as calling response.css("span a").extract() several times.
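As a sketch of both suggestions, the loop body can `yield` one dict per account instead of printing, and read each link once instead of re-running the same CSS query. Scrapy accepts plain dicts as items, so a dedicated `Item` class is optional. To keep the sketch runnable without Scrapy, the hypothetical `extract_accounts` helper takes `(name, href)` pairs that, in a real `parse()`, would come from iterating `tr.css("span a")` once and reading `::text` and `::attr(href)` from each link. The field names and the second sample profile are made up for illustration.

```python
BOT_WORDS = ["bot", "BOT", "[tf2mart]"]


def extract_accounts(links, bot_words=BOT_WORDS):
    """Yield one dict per non-bot account instead of printing.

    `links` is an iterable of (name, href) pairs; in a spider each pair would
    be read once from its <a> element (hypothetical helper, not a Scrapy API).
    """
    for name, href in links:
        if any(word in name for word in bot_words):
            continue  # skip bot accounts
        yield {
            "name": name,
            "profile": href,
            # Assumes hrefs look like /profiles/<steamid>
            "steamid": href.rstrip("/").rsplit("/", 1)[-1],
        }


sample = [
    ("Kchypark", "/profiles/76561198042757507"),
    ("[tf2mart] Seller", "/profiles/00000000000000000"),  # made-up bot entry
]
accounts = list(extract_accounts(sample))
print(accounts)
```

Inside a spider, `parse()` could then `yield from extract_accounts(pairs)`, and Scrapy's feed exports would write the items out instead of them being printed to the console.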

Python/Scrapy nested for/if loops not working correctly
