Question

So my question is: how do I tell Scrapy to crawl URLs that differ only in one string? I have the strings saved in a txt file.

with open("plz_nummer.txt") as f:
    cityZIP = f.read().rsplit('\n') 

for a in range(len(cityZIP)):
    # build one search URL per zip code
    next_url = 'http://www.firmenfinden.de/?txtPLZ=' + cityZIP[a] + '&txtBranche=&txtKunden='
    pass

Answer 1

I'd load the file with the zip codes inside the start_requests method, written as a generator. Something along these lines:

import scrapy

class ZipSpider(scrapy.Spider):
    name = "zipCodes"
    city_zip_list = []  # class attribute; `self` does not exist at class level

    def start_requests(self):
        with open("plz_nummer.txt") as f:
            # splitlines() avoids the empty last entry left by a trailing newline
            self.city_zip_list = f.read().splitlines()
        for city_zip in self.city_zip_list:
            url = 'http://www.firmenfinden.de/?txtPLZ={}&txtBranche=&txtKunden='.format(city_zip)
            yield scrapy.Request(url=url, callback=self.parse)  

    def parse(self, response):
        # Anything else you need
        # to do in here
        pass

This should give you a good starting point. Also have a read through the tutorial: https://doc.scrapy.org/en/1.1/intro/tutorial.html
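
If the file may contain blank lines or stray whitespace, the URL building can be pulled out into a small helper. This is only a sketch, assuming Python 3; the name build_urls and the BASE_URL constant are made up for illustration and are not part of Scrapy. It skips empty lines and lets urlencode assemble the query string.

```python
from urllib.parse import urlencode

# Assumption: same site and query parameters as in the question.
BASE_URL = "http://www.firmenfinden.de/"


def build_urls(lines):
    """Yield one search URL per non-empty zip code line (hypothetical helper)."""
    for line in lines:
        city_zip = line.strip()
        if not city_zip:
            continue  # skip blank lines, e.g. a trailing newline in the file
        query = urlencode({"txtPLZ": city_zip, "txtBranche": "", "txtKunden": ""})
        yield "{}?{}".format(BASE_URL, query)
```

Inside start_requests you could then loop over build_urls(f) with the open file object f and yield scrapy.Request(url=url, callback=self.parse) for each URL.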

Importing strings into Scrapy to use as crawl URLs
