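I am running Python.org version 2.7 64-bit on Windows Vista 64-bit in order to use Scrapy. I have some code that works when I run it from the Command Shell (apart from some issues with the Command Shell not recognising non-Unicode characters), but when I try to run the script from Python IDLE I get the following error message: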
Warning (from warnings module):
  File "C:\Python27\mrscrap\mrscrap\spiders\test.py", line 24
    class MySpider(BaseSpider):
ScrapyDeprecationWarning: __main__.MySpider inherits from deprecated class scrapy.spider.BaseSpider, please inherit from scrapy.spider.Spider. (warning only on first subclass, there may be others)
The code that produces this error is:
from scrapy.spider import BaseSpider
from scrapy.selector import Selector
from scrapy.utils.markup import remove_tags
import re

class MySpider(BaseSpider):
    name = "wiki"
    allowed_domains = ["wikipedia.org"]
    start_urls = ["http://en.wikipedia.org/wiki/Asia"]

    def parse(self, response):
        titles = response.selector.xpath("normalize-space(//title)")
        for titles in titles:
            body = response.xpath("//p").extract()
            body2 = "".join(body)
            print remove_tags(body2)
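Firstly, what causes this error, given that the code works fine in the Command Shell? Secondly, when I follow the instructions in the error and replace both instances of BaseSpider in the code with Spider, the code runs in the Python shell but does nothing. No errors, no warnings, nothing printed to the log, nothing.
Can anyone tell me why this revised version of the code does not print its output to Python IDLE?
Thanks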
Answer 0 (score: 1)
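Defining the spider class does not start a crawl by itself; something has to invoke the crawl command, which is what scrapy crawl wiki does in the Command Shell.
Add from scrapy.cmdline import execute to your imports,
then put execute(['scrapy','crawl','wiki']) at the end of the file
and run your script.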
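from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.utils.markup import remove_tags
import re
from scrapy.cmdline import execute

class MySpider(Spider):
    name = "wiki"
    allowed_domains = ["wikipedia.org"]
    start_urls = ["http://en.wikipedia.org/wiki/Asia"]

    def parse(self, response):
        titles = response.selector.xpath("normalize-space(//title)")
        for title in titles:
            # Collect every <p> element, join the HTML and strip the tags.
            body = response.xpath("//p").extract()
            body2 = "".join(body)
            print remove_tags(body2)

# Equivalent to typing "scrapy crawl wiki" in the project directory.
# The guard keeps the crawl from being started again when Scrapy
# imports this module while looking up the spider.
if __name__ == "__main__":
    execute(['scrapy','crawl','wiki'])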
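On the first question: the message is a deprecation warning, not an error, so the class is still defined and usable. The following is a minimal sketch of how a library can emit such a warning whenever a deprecated class is subclassed. It is written for Python 3 and uses only the standard library. The names DeprecationNotice and define_old_style_spider are hypothetical, and the Spider and BaseSpider classes here are stand-ins, not Scrapy's actual implementation.

```python
import warnings


class DeprecationNotice(UserWarning):
    """Hypothetical stand-in for a library-specific warning category."""


class Spider(object):
    """Stand-in for the supported base class."""
    name = None


class BaseSpider(Spider):
    """Stand-in for a deprecated alias: subclassing it warns but still works."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once for every class that inherits from BaseSpider.
        warnings.warn(
            "%s inherits from deprecated class BaseSpider, "
            "please inherit from Spider" % cls.__name__,
            DeprecationNotice,
            stacklevel=2,
        )


def define_old_style_spider():
    """Define a spider on the deprecated base and return it with the warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")

        class MySpider(BaseSpider):
            name = "wiki"

    return MySpider, caught
```

In this sketch the subclass is created normally and only a warning is recorded, which matches what the question describes: switching the base class to Spider silences the message but does not change whether a crawl is started.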