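So my question is: how do I tell Scrapy to crawl URLs that differ only by a single string, for example the txtPLZ value in the URL built in the code that follows? I have the strings saved in a txt file.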
with open("plz_nummer.txt") as f:
    cityZIP = f.read().rsplit('\n')
    for a in range(0, len(cityZIP)):
        # The URL is built here, but nothing requests it yet
        next_url = 'http://www.firmenfinden.de/?txtPLZ=' + cityZIP[a] + '&txtBranche=&txtKunden='
        pass
Answer 0 (score: 0)
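I would load the file with the ZIP codes inside the start_requests method and have that method yield the requests as a generator. Something along these lines: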
import scrapy


class ZipSpider(scrapy.Spider):
    name = "zipCodes"

    def start_requests(self):
        # Read one ZIP code per line, skipping blank lines
        # (a trailing newline would otherwise produce an empty ZIP code)
        with open("plz_nummer.txt") as f:
            self.city_zip_list = [line.strip() for line in f if line.strip()]
        # Yield one request per ZIP code; Scrapy schedules them as they are generated
        for city_zip in self.city_zip_list:
            url = 'http://www.firmenfinden.de/?txtPLZ={}&txtBranche=&txtKunden='.format(city_zip)
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Anything else you need
        # to do in here
        pass
This should give you a good starting point. Also have a read of the tutorial: https://doc.scrapy.org/en/1.1/intro/tutorial.html
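If it helps, the URL-building step can be checked on its own, without running a spider. The following is only a minimal sketch using the standard library: the helper names read_zip_codes and build_url are hypothetical (they are not part of Scrapy or of the code above), and the sample ZIP codes are made-up stand-ins for the contents of plz_nummer.txt. It swaps the string formatting for urllib.parse.urlencode so that unusual characters in a line get percent-encoded.

```python
from urllib.parse import urlencode

# Base URL taken from the question; the query parameters are appended below
BASE_URL = "http://www.firmenfinden.de/"


def read_zip_codes(lines):
    """Hypothetical helper: return the stripped, non-empty entries from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def build_url(city_zip):
    """Hypothetical helper: build the search URL for one ZIP code."""
    # urlencode keeps the dict order and emits empty values as "key="
    query = urlencode({"txtPLZ": city_zip, "txtBranche": "", "txtKunden": ""})
    return "{}?{}".format(BASE_URL, query)


if __name__ == "__main__":
    # Made-up sample data standing in for the lines of plz_nummer.txt
    sample_lines = ["45127\n", "\n", "  10115 \n"]
    for code in read_zip_codes(sample_lines):
        print(build_url(code))
```

Inside start_requests you could then pass an open file object to read_zip_codes and yield scrapy.Request(url=build_url(city_zip), callback=self.parse) for each entry, assuming the file has one ZIP code per line.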