scrapy HTML响应内容不是文本

时间:2018-06-15 14:09:54

标签: python html python-3.x http scrapy

这是我的代码,用于扫描用户并输出他们的SteamID及其库存值:

IF OBJECT_ID('TEST1') IS NOT NULL
    DROP TABLE TEST1

CREATE TABLE [dbo].[TEST1](
    [VALUE_VARCHAR] [varchar](50) NULL,
    [VALUE_DATE] DATE NULL,
) ON [PRIMARY]
GO

INSERT INTO TEST1 (VALUE_VARCHAR) VALUES ('06/12/17')

UPDATE TEST1 SET VALUE_DATE = CAST(VALUE_VARCHAR AS DATE)

SELECT * FROM TEST1

预期输出为:

import scrapy
import logging

bot_words = [
"bot",
"BOT",
"Bot",
"[tf2mart]"
]

class AccountSpider(scrapy.Spider):
    name = "accounts"
    start_urls = [
        'file:///Users/max/Documents/promotebot/tutorial/tutorial/TF2ITEMS.htm'
    ]

    def linkgen(self):
        global steamid
        print("Downloading Page...")
        yield scrapy.Request("http://www.backpack.tf" + steamid, callback=self.parse_accounts)
        print("Page successfully downloaded.")

    def parse(self, response):
        global steamid
        lgen = self.linkgen()
        for tr in response.css("tbody"):
            for user in response.css("span a"):
                if bot_words not in response.css("span a"):
                    print("Parsed info")
                    print("User: " + user.extract())
                    steamid = user.css('::attr(href)').extract()[0]
                    print("Steam ID: " + steamid)
                    yield lgen.next()

    def parse_accounts(self, response):
        print("Value finding function activted.")
        #print(response.headers, response.body)
        print(response.css("head"))
        for description in response.css("head"):
            print("level 1 value")
            value = response.css("description.content").extract()
            print(value)

当前输出为:

Parsed info
User: <a href="/profiles/76561198017108***">user</a>
Steam ID: /profiles/76561198017108***
(SOME VALUE)

尽管多线程(linkgen生成器在解析函数再次激活时下载请求),该函数应该仍然有效(?)我似乎无法将HTTP响应转换为文本对象。

0 个答案:

没有答案