How do I build a standalone Scrapy spider?

Date: 2018-05-15 23:16:05

Tags: python scrapy pyinstaller

I apologize for reposting; the title of my previous post was confusing. For the example spider below, how can I use pyinstaller (or some other packager) to build an executable (e.g. myspidy.exe) so that end users do not need to install Python and Scrapy in their Windows environment? With Python and Scrapy installed, the spider is run by executing the command "scrapy crawl quotes". The end user would just download and run "myspidy.exe" on a Windows PC with no preinstalled Python or Scrapy. Thanks very much!

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        # Request the first two pages of quotes.toscrape.com.
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Save the raw HTML of each page to a local file.
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)
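
For reference, with Python and Scrapy installed, this spider is run from inside the Scrapy project directory with:

scrapy crawl quotes

The goal is to package that same behavior into a single executable that requires neither.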

Thanks, EVHZ. I made the changes you suggested and got the following error at runtime.

D:\craftyspider\spidy\spidy\spiders\dist>.\runspidy
Traceback (most recent call last):
  File "spidy\spiders\runspidy.py", line 35, in <module>
  File "site-packages\scrapy\crawler.py", line 249, in __init__
  File "site-packages\scrapy\crawler.py", line 137, in __init__
  File "site-packages\scrapy\crawler.py", line 326, in _get_spider_loader
  File "site-packages\scrapy\utils\misc.py", line 44, in load_object
  File "importlib\__init__.py", line 126, in import_module
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'scrapy.spiderloader'
[14128] Failed to execute script runspidy
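
A common cause of this kind of error when bundling Scrapy with PyInstaller (an assumption about the likely fix, not something confirmed in this thread): PyInstaller's static analysis cannot see modules that Scrapy imports dynamically by name, such as scrapy.spiderloader, so they are left out of the bundle. They can be declared explicitly as hidden imports:

pyinstaller --onefile --hidden-import scrapy.spiderloader script.py

Any further ModuleNotFoundError at runtime can be handled the same way, adding one --hidden-import flag per missing module.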

1 Answer:

Answer 0 (score: 2):

To keep everything in a single Python file, which you can then run with:

python script.py

you can take the code you already have and add a few things:

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Useful if you have a settings.py; outside a Scrapy project
# this simply returns the default settings.
settings = get_project_settings()

# Your code, exactly as in the question.
class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

# Create a process that runs the spider in-process,
# with no need for the "scrapy crawl" command.
process = CrawlerProcess(settings)
process.crawl(QuotesSpider)
process.start()  # blocks until the crawl finishes
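
Note that get_project_settings() only picks up settings.py when the script is run from inside a Scrapy project directory. For a fully standalone script, one alternative (a sketch, not part of the original answer) is to pass the settings inline as a plain dict:

# Hypothetical inline settings, replacing get_project_settings()
process = CrawlerProcess(settings={
    'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
})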

Save it as script.py. Then, use pyinstaller:

pyinstaller --onefile script.py

The bundle will be generated in a subdirectory named dist.
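
With --onefile, that bundle is a single self-contained executable, dist\script.exe on Windows, which the end user can run without installing Python or Scrapy, provided any dynamically imported Scrapy modules were declared as hidden imports as noted above.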