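I'm trying to run a spider from my website, with a scrapyrt server listening on my desktop. When the spider runs, scrapyrt reports that it cannot find the module 'webscrape', and it also raises "'int' object has no attribute 'splitlines'".
https://github.com/scrapy/scrapyd/issues/311 offers a solution for scrapyd. Judging by https://github.com/scrapinghub/scrapyrt/pull/84, this still seems to be a problem in scrapyrt.
So I'm really at a loss.
Error output: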
2019-08-12 16:37:47-0700 [scrapyrt] Unhandled Error
Traceback (most recent call last):
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\http.py", line 2196, in allContentReceived
req.requestReceived(command, path, version)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\http.py", line 920, in requestReceived
self.process()
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\server.py", line 199, in process
self.render(resrc)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\server.py", line 259, in render
body = resrc.render(self)
--- <exception caught here> ---
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\resources.py", line 26, in render
result = resource.Resource.render(self, request)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\resource.py", line 250, in render
return m(request)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\resources.py", line 127, in render_GET
return self.prepare_crawl(api_params, scrapy_request_args, **kwargs)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\resources.py", line 217, in prepare_crawl
start_requests=start_requests, *args, **kwargs)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\resources.py", line 226, in run_crawl
dfd = manager.crawl(*args, **kwargs)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\core.py", line 157, in crawl
self.get_project_settings(), self)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\core.py", line 178, in get_project_settings
return get_project_settings(custom_settings=custom_settings)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\conf\spider_settings.py", line 27, in get_project_settings
crawler_settings.setmodule(module, priority='project')
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapy\settings\__init__.py", line 288, in setmodule
module = import_module(module)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
builtins.ModuleNotFoundError: No module named 'webscrape'
Unhandled Error
Traceback (most recent call last):
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\http.py", line 2196, in allContentReceived
req.requestReceived(command, path, version)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\http.py", line 920, in requestReceived
self.process()
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\server.py", line 199, in process
self.render(resrc)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\server.py", line 259, in render
body = resrc.render(self)
--- <exception caught here> ---
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\resources.py", line 26, in render
result = resource.Resource.render(self, request)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\resource.py", line 250, in render
return m(request)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\resources.py", line 127, in render_GET
return self.prepare_crawl(api_params, scrapy_request_args, **kwargs)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\resources.py", line 217, in prepare_crawl
start_requests=start_requests, *args, **kwargs)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\resources.py", line 226, in run_crawl
dfd = manager.crawl(*args, **kwargs)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\core.py", line 157, in crawl
self.get_project_settings(), self)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\core.py", line 178, in get_project_settings
return get_project_settings(custom_settings=custom_settings)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\conf\spider_settings.py", line 27, in get_project_settings
crawler_settings.setmodule(module, priority='project')
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapy\settings\__init__.py", line 288, in setmodule
module = import_module(module)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
builtins.ModuleNotFoundError: No module named 'webscrape'
2019-08-12 16:37:47-0700 [-] Unhandled Error
Traceback (most recent call last):
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\protocols\basic.py", line 572, in dataReceived
why = self.lineReceived(line)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\http.py", line 2105, in lineReceived
self.allContentReceived()
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\http.py", line 2196, in allContentReceived
req.requestReceived(command, path, version)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\http.py", line 920, in requestReceived
self.process()
--- <exception caught here> ---
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\server.py", line 199, in process
self.render(resrc)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\server.py", line 259, in render
body = resrc.render(self)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\resources.py", line 31, in render
return self.render_object(result, request)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\resources.py", line 95, in render_object
request.setHeader('Content-Length', len(r))
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\http.py", line 1271, in setHeader
self.responseHeaders.setRawHeaders(name, [value])
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\http_headers.py", line 220, in setRawHeaders
for v in self._encodeValues(values)]
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\http_headers.py", line 220, in <listcomp>
for v in self._encodeValues(values)]
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\http_headers.py", line 40, in _sanitizeLinearWhitespace
return b' '.join(headerComponent.splitlines())
builtins.AttributeError: 'int' object has no attribute 'splitlines'
Traceback (most recent call last):
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\server.py", line 199, in process
self.render(resrc)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\server.py", line 259, in render
body = resrc.render(self)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\resources.py", line 31, in render
return self.render_object(result, request)
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\scrapyrt\resources.py", line 95, in render_object
request.setHeader('Content-Length', len(r))
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\http.py", line 1271, in setHeader
self.responseHeaders.setRawHeaders(name, [value])
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\http_headers.py", line 220, in setRawHeaders
for v in self._encodeValues(values)]
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\http_headers.py", line 220, in <listcomp>
for v in self._encodeValues(values)]
File "c:\users\user\microblog\job-visualizer\venv\lib\site-packages\twisted\web\http_headers.py", line 40, in _sanitizeLinearWhitespace
return b' '.join(headerComponent.splitlines())
AttributeError: 'int' object has no attribute 'splitlines'
Project layout:
-Job-Visualizer
-app
-webscrape(scrapyrt ran from here in venv)
-spiders
When the spider runs, the spider code should return results as expected.
Edit: spider code:
import scrapy
from scrapy_splash import SplashRequest


class IndeedSpider(scrapy.Spider):
    name = 'indeedspider'
    # allowed_domains expects bare domain names, not full URLs
    allowed_domains = ['indeed.com']

    # Splash rendering options, shared by every request this spider makes
    splash_args = {
        'html': 1,
        'png': 1,
        'width': 800,
        'render_all': 1,
    }

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        print('Spider being ran...')
        self.start_url = 'https://www.indeed.com/jobs?q=financial+aid+advisor&l=Highland%2C+CA'
        self.links = []

    def modify_realtime_request(self, request):
        # ScrapyRT hook: re-issue the incoming request through Splash.
        # The original referenced undefined names `url` and `splash_args` here.
        return SplashRequest(request.url, self.parse, args=self.splash_args, endpoint='render.html')

    def start_requests(self):
        print(self.start_url)
        urls = [
            self.start_url
        ]
        for url in urls:
            yield SplashRequest(url, self.parse, endpoint='render.json', args=self.splash_args)

    def parse(self, response):
        # Nothing is yielded yet, so the crawl returns an empty item list
        html = response.body
        title = response.css('title').extract()
        titles = response.xpath("//div[@class= 'title']/a/text()").getall()
        locations = response.xpath("//div[@class= 'sjcl']/span/text()").getall()
        companies = response.css("div.sjcl.span.company a::text").getall()
        summarys = response.xpath("//div[@class= 'summary']/text()").getall()
Relevant part of the route code:
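import json
import requests

params = {
    # must match the spider's `name` attribute, which is 'indeedspider'
    'spider_name': 'indeedspider',
    'start_requests': True
}
response = requests.get('http://localhost:9080/crawl.json', params=params)
data = json.loads(response.text)
print(data)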
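Answer 0 (score: 0)
Did you import the webscrape module? Also, the object you are using has the wrong type, which is why it has no splitlines attribute. If you print the object's type, does it show int? The splitlines method only exists on str and bytes objects, so you need to make sure the object it is called on is one of those, not an int.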
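To make the second error concrete: the last frames of the traceback show scrapyrt calling request.setHeader('Content-Length', len(r)), and Twisted then calling .splitlines() on that value. The sketch below is not Twisted's code; sanitize_header_value is a hypothetical stand-in that only reuses the one expression visible in the traceback, and the body bytes are made up for illustration. It shows why an int fails there and why converting the length to bytes first would not.

```python
def sanitize_header_value(header_component):
    # Hypothetical stand-in: same expression as the final traceback frame
    # (twisted/web/http_headers.py, _sanitizeLinearWhitespace).
    return b' '.join(header_component.splitlines())


# Illustrative response body; the real one comes from scrapyrt.
body = b'{"status": "error"}'

# Passing the raw int, as len(r) does in the traceback, fails.
try:
    sanitize_header_value(len(body))
    int_failed = False
except AttributeError as exc:
    int_failed = True
    message = str(exc)

# Converting the length to bytes first avoids the AttributeError.
fixed = sanitize_header_value(str(len(body)).encode('ascii'))

print(int_failed, message)
print(fixed)
```

Since this call appears to happen while scrapyrt is rendering the response for the earlier ModuleNotFoundError, it is probably a follow-on failure: once the 'webscrape' import problem is fixed, this code path may no longer be hit for that request.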
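Answer 1 (score: 0)
Solution: when you create the Scrapy project, make sure scrapy.cfg sits outside the Scrapy project folder (next to the package, not inside it).
Wrong: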
- app
  - webscrape
    - scrapy.cfg
    - __init__.py
    - items.py
    - middleware.py
    - spiders
      - spider.py
Correct:
- app
  - scrapy.cfg
  - webscrape
    - __init__.py
    - items.py
    - middleware.py
    - spiders
      - spider.py
Expected result:
{"status": "ok", "items": [], "spider_name": "indeedspider"}
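A rough way to see why the location of scrapy.cfg matters: the settings module named in scrapy.cfg is imported by its dotted name, so the package has to be findable from the directory that holds scrapy.cfg. The sketch below is only an illustration using the standard library. It assumes scrapy.cfg contains the usual "[settings] default = webscrape.settings" entry and that the package has a settings.py (neither is shown in the question), and the helper names build_layout and settings_module_resolves are made up here.

```python
import configparser
import tempfile
from pathlib import Path


def build_layout(root: Path, cfg_inside_package: bool) -> Path:
    """Create a tiny fake project and return the directory holding scrapy.cfg."""
    package = root / "webscrape"
    package.mkdir(parents=True)
    (package / "__init__.py").write_text("")
    (package / "settings.py").write_text("BOT_NAME = 'webscrape'\n")
    cfg_dir = package if cfg_inside_package else root
    # Assumed scrapy.cfg content, as generated by a typical Scrapy project
    (cfg_dir / "scrapy.cfg").write_text("[settings]\ndefault = webscrape.settings\n")
    return cfg_dir


def settings_module_resolves(cfg_dir: Path) -> bool:
    """Check whether the settings module named in scrapy.cfg exists
    relative to the directory that contains scrapy.cfg."""
    parser = configparser.ConfigParser()
    parser.read(cfg_dir / "scrapy.cfg")
    module = parser.get("settings", "default")  # "webscrape.settings"
    relative = Path(*module.split("."))         # webscrape/settings
    return (cfg_dir / relative).with_suffix(".py").is_file()


with tempfile.TemporaryDirectory() as tmp:
    wrong = build_layout(Path(tmp) / "wrong", cfg_inside_package=True)
    right = build_layout(Path(tmp) / "right", cfg_inside_package=False)
    wrong_ok = settings_module_resolves(wrong)
    right_ok = settings_module_resolves(right)

print("cfg inside package:", wrong_ok)
print("cfg next to package:", right_ok)
```

With scrapy.cfg inside webscrape, the lookup goes to webscrape/webscrape/settings.py, which does not exist, and that matches the "No module named 'webscrape'" error in the question.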