虽然网页没有显示任何电子邮件地址,但运行我的刮刀我可以在控制台上获取它,但它会出现集群文档。有没有办法只保留文件集中的电子邮件和电话号码?这就是我要做的:
import requests
from lxml import html
def Mainpage():
url = "https://www.houzz.de/professionals/c/Deutschland"
response = requests.get(url)
tree = html.fromstring(response.text)
titles = tree.xpath('//div[@class="name-info"]')
for title in titles:
Name=title.xpath('.//a/@href')[0]
FindindEmail(Name)
def FindindEmail(pagelink):
response = requests.get(pagelink)
tree = html.fromstring(response.text)
titles = tree.xpath('//div[@class="professional-info-content"]/text()')
for title in titles:
print(title.strip())
Mainpage()
答案 0 :(得分:0)
最终找到解决方案:
import requests
from lxml import html
def Mainpage():
url = "https://www.houzz.de/professionals/c/Deutschland"
response = requests.get(url)
tree = html.fromstring(response.text)
titles = tree.xpath('//div[@class="name-info"]')
for title in titles:
Name=title.xpath('.//a/@href')[0]
FindingEmail(Name)
def FindingEmail(pagelink):
response = requests.get(pagelink)
tree = html.fromstring(response.text)
titles = tree.xpath('//div[@class="professional-info-content"]/text()')
for title in titles:
if "E-Mail:" in title or "Fax:" in title:
print(title)
Mainpage()