The item's location field needs to change depending on which entry of start_urls the response came from. For example:
location = input("Location:")
second_location = input("Second Location:")
start_urls = [
    "https://www.yellowpages.com/search?search_terms=" + search_item + "&geo_location_terms=" + location,
    "https://www.yellowpages.com/search?search_terms=" + search_item + "&geo_location_terms=" + second_location,
    # "https://www.yellowpages.com/search?search_terms=" + search_item + "&geo_location_terms=" + third_location,
    # "https://www.yellowpages.com/search?search_terms=" + search_item + "&geo_location_terms=" + fourth_location
]
if self.start_urls[0]:
    item['location'] = location
if self.start_urls[1]:
    item['location'] = second_location
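What happens is that item['location'] is fixed and does not change dynamically: every item is output with the same location value, even when it was scraped from self.start_urls[1].
This is what I have done so far. The snippets above are from myspider.py; items.py defines the item: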
import scrapy

class Item(scrapy.Item):
    business_name = scrapy.Field()
    website = scrapy.Field()
    phonenumber = scrapy.Field()
    email = scrapy.Field()
    location = scrapy.Field()
    # third_location = scrapy.Field()
    # fourth_location = scrapy.Field()
    visit_id = scrapy.Field()
    visit_status = scrapy.Field()
Answer 0 (score: 0)
Your code doesn't make sense.
if self.start_urls[0]:
    item['location'] = location
if self.start_urls[1]:
    item['location'] = second_location
Both of these blocks will execute as long as the elements of start_urls are not empty strings (or some other falsy value).
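To see why, here is a minimal stand-alone sketch of the same two checks outside Scrapy. The search term and the two locations ("plumber", "Boston", "Austin") are made-up sample values, and a plain dict stands in for the item.

```python
# Sample values, assumed for illustration only.
search_item = "plumber"
location = "Boston"
second_location = "Austin"

start_urls = [
    "https://www.yellowpages.com/search?search_terms=" + search_item + "&geo_location_terms=" + location,
    "https://www.yellowpages.com/search?search_terms=" + search_item + "&geo_location_terms=" + second_location,
]

item = {}      # plain dict standing in for the Scrapy item
executed = []  # records which branches actually ran

# A non-empty string is truthy, so this condition does not ask
# "did the response come from this URL?", only "is this URL non-empty?".
if start_urls[0]:
    item["location"] = location
    executed.append("first")
if start_urls[1]:
    item["location"] = second_location
    executed.append("second")

print(executed)
print(item)
```

Both branches run on every call, so the item always ends up with whatever the last assignment wrote, no matter which URL produced the response.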
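If I understand your problem correctly, you want item['location'] to be the same location that was used in the start URL. The easiest way to do this is to have your requests carry this information.
You should create custom requests in start_requests() and pass the location along as request metadata, using the approach described at https://doc.scrapy.org/en/latest/topics/request-response.html#topics-request-response-ref-request-callback-arguments.
After that, just pass it on to any subsequent requests.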