Question

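I am working with some code that returns an HTML string (my_html). I would like to see how it looks in a browser using https://doc.scrapy.org/en/latest/topics/debug.html#open-in-browser. To do that, I tried to create a response object with its body set to my_html. I have tried a number of things, including: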

new_response = TextResponse(body=my_html)
open_in_browser(new_response)

based on the Response class (https://doc.scrapy.org/en/latest/topics/request-response.html#response-objects). I get:

new_response = TextResponse(body=my_html)
  File "c:\scrapy\http\response\text.py", line 27, in __init__
    super(TextResponse, self).__init__(*args, **kwargs)
TypeError: __init__() takes at least 2 arguments (2 given)

How can I get this working?

Answer 1

Your error appears to be related to how TextResponse is initialized. According to the docs, you need to initialize it with a URL; TextResponse("http://www.example.com") should do it.

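It looks like you were reading the Response object documentation and tried to use TextResponse the way you would use Response, judging by the optional arguments and the documentation link.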

Answer 2

TextResponse expects a URL as its first argument:

>>> scrapy.http.TextResponse('http://www.example.com')
<200 http://www.example.com>
>>>

If you want to pass a body, you still need a URL as the first argument:

>>> scrapy.http.TextResponse(body='<html><body>Oh yeah!</body></html>')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/paul/.virtualenvs/scrapy12/local/lib/python2.7/site-packages/scrapy/http/response/text.py", line 27, in __init__
    super(TextResponse, self).__init__(*args, **kwargs)
TypeError: __init__() takes at least 2 arguments (2 given)
>>> scrapy.http.TextResponse('http://www.example.com', body='<html><body>Oh yeah!</body></html>')
<200 http://www.example.com>

Scrapy - How to load an HTML string into the open_in_browser function
