I am trying to use Splash to render pages from a more complicated site, but I can't even get the code to run against this simple site. I started the Splash Docker container and have port 8050 mapped to 0.0.0.0 in my settings.py file. Any help would be greatly appreciated. Please mention the version of each package you used, as I suspect a version mismatch may be the issue.
I have tried numerous fixes along the way, including changing the versions of Splash, Scrapy, and Twisted. Scrapy only works on Python 3.x with a newer version of Twisted, but Splash reports that it is incompatible with Twisted > 16.2. I tried several version combinations, but none of them fixed the problem.
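Since the versions matter here, a small script along these lines can report what is actually installed in the active environment. It is only a sketch: the helper name `installed_versions` is made up for illustration, and it assumes Python 3.8+ (for `importlib.metadata`) and that the packages are published under the distribution names `Scrapy`, `scrapy-splash`, and `Twisted`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration; not part of Scrapy or scrapy-splash.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed distribution names for the three packages discussed above.
    report = installed_versions(["Scrapy", "scrapy-splash", "Twisted"])
    for name, ver in report.items():
        print(f"{name}: {ver or 'not installed'}")
```

Pasting that output into the question makes it easier for others to reproduce the setup.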
import scrapy
import scrapy_splash


class ExampleSpider(scrapy.Spider):
    name = "test"
    #allowed_domains = ["Monster.com"]
    start_urls = [
        'http://quotes.toscrape.com/page/1/'
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy_splash.SplashRequest(
                url,
                self.parse,
                args={
                    'wait': 0.5,
                },
                endpoint='render.html',
            )

    def parse(self, response):
        for quote in response.css('div.quote'):
            print(quote.css('span.text::text').extract())
I expect to receive only the quote texts; this is the same URL used in the Scrapy tutorial documentation.
Answer 0 (score: 0)
There is nothing wrong with your code. Your problem is this:
"I have the 8050 port mapped to 0.0.0.0 in my settings.py file."
The correct setting in settings.py should be:
SPLASH_URL = 'http://localhost:8050'
or
SPLASH_URL = 'http://127.0.0.1:8050'
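For context, here is a minimal sketch of the Splash-related part of settings.py. It assumes Splash is running in Docker with port 8050 published on the local machine; the middleware entries and priorities are the ones suggested in the scrapy-splash README, so check them against the README for the version you have installed.

```python
# settings.py (Splash-related fragment only; a sketch, not a full settings file)

# Address of the running Splash instance. 0.0.0.0 is a bind address for the
# server side, so the client should point at localhost or 127.0.0.1 instead.
SPLASH_URL = 'http://localhost:8050'

# Downloader middlewares as suggested in the scrapy-splash README.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Deduplicates repeated Splash arguments to save space in the request queue.
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Splash-aware duplicate filter and HTTP cache storage.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```

If Splash runs on a different machine or Docker host, replace localhost with that host's address.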