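The goal of my project is to search websites for company phone numbers.
I am trying to parse each web page, extract phone numbers with a regular expression (I am still working on that part), and then find the links on the page. Those links are what I want to follow recursively: I call the same function on each of them and repeat. However, the function only runs once. See the code below: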
def parse(self, response):
    # The main method of the spider. It scrapes the URL(s) specified in the
    # 'start_url' argument above. The content of the scraped URL is passed on
    # as the 'response' object.
    hxs = HtmlXPathSelector(response)
    #print(phone_detail)
    print('here')
    for phone_num in response.xpath('//body').re(r'\d{3}.\d{3}.\d{4}'):
        item = PhoneNumItem()
        item['label'] = "a"
        item['phone_num'] = phone_num
        yield item
    for url in hxs.xpath('//a/@href').extract():
        # This loops through all the URLs found.
        # Construct an absolute URL by combining the response's URL with a possibly relative URL:
        next_page = response.urljoin(url)
        print("Found URL: " + next_page)
        #yield response.follow(next_page, self.parse_page)
        yield scrapy.Request(next_page, callback=self.parse)
Please let me know what you think. To me this code looks like it should work, but it doesn't.
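
To make the intended behaviour concrete, here is a minimal framework-free sketch of what I expect a single parse step to do: pull phone numbers out of the page, resolve every link against the page URL, and hand back only the links that have not been visited yet. It uses only the standard library, not Scrapy, and the names PHONE_RE, LinkCollector, extract_phone_numbers, extract_links and crawl_step are hypothetical helpers made up for this illustration. The pattern is also an assumption: it has the same shape as the one in my spider, but restricts the separator to a dash, dot or space, because a bare `.` in a regex matches any character.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed pattern: three digits, separator, three digits, separator, four digits.
# The separator is limited to '-', '.' or a space instead of "any character".
PHONE_RE = re.compile(r'\b\d{3}[-. ]\d{3}[-. ]\d{4}\b')


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def extract_phone_numbers(html):
    """Return every substring of the page that matches PHONE_RE."""
    return PHONE_RE.findall(html)


def extract_links(html, base_url):
    """Return absolute http(s) URLs for all links found in the page."""
    collector = LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)
        # Skip mailto:, javascript: and other non-web schemes.
        if urlparse(absolute).scheme in ('http', 'https'):
            links.append(absolute)
    return links


def crawl_step(html, base_url, seen):
    """One 'parse' call: phone numbers found, plus links not visited yet."""
    numbers = extract_phone_numbers(html)
    new_links = []
    for link in extract_links(html, base_url):
        if link not in seen:
            seen.add(link)
            new_links.append(link)
    return numbers, new_links
```

In this sketch the caller would fetch each URL in the returned list and call crawl_step again on the result, which is the recursion I am trying to get from yielding scrapy.Request(next_page, callback=self.parse).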