Unable to find hidden email addresses on a web page

Time: 2017-05-03 07:59:19

Tags: python web-scraping web-crawler

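Although the web page does not visibly display any email address, my scraper does retrieve it and print it to the console. The problem is that it arrives mixed into a cluttered block of other text. Is there a way to keep only the email address and phone number from that output? This is what I am trying: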

import requests
from lxml import html

def Mainpage():
    url = "https://www.houzz.de/professionals/c/Deutschland"
    response = requests.get(url)
    tree = html.fromstring(response.text)
    # Each "name-info" block holds the link to one professional's profile page
    titles = tree.xpath('//div[@class="name-info"]')
    for title in titles:
        Name=title.xpath('.//a/@href')[0]
        FindingEmail(Name)

def FindingEmail(pagelink):
    response = requests.get(pagelink)
    tree = html.fromstring(response.text)
    # Prints every text node in the contact block, not only email and phone
    titles = tree.xpath('//div[@class="professional-info-content"]/text()')
    for title in titles:
        print(title.strip())

Mainpage()

Here is the scraped output: [image]

1 Answer:

Answer 0: (score: 0)

I eventually found a solution: print only the text nodes that contain "E-Mail:" or "Fax:":

import requests
from lxml import html

def Mainpage():
    url = "https://www.houzz.de/professionals/c/Deutschland"
    response = requests.get(url)
    tree = html.fromstring(response.text)
    # Each "name-info" block holds the link to one professional's profile page
    titles = tree.xpath('//div[@class="name-info"]')
    for title in titles:
        Name=title.xpath('.//a/@href')[0]
        FindingEmail(Name)

def FindingEmail(pagelink):
    response = requests.get(pagelink)
    tree = html.fromstring(response.text)
    titles = tree.xpath('//div[@class="professional-info-content"]/text()')
    for title in titles:
        # Keep only the lines labelled as email or fax; skip everything else
        if "E-Mail:" in title or "Fax:" in title:
            print(title.strip())

Mainpage()
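
If the labels on the page vary, a pattern-based filter may be more robust than matching on "E-Mail:" or "Fax:". The following is only a minimal sketch, not something taken from the site: the helper name extract_contacts, the two regular expressions, and the sample strings are all assumptions made for illustration, and the phone pattern is deliberately loose. It works on plain strings, so it could be fed the same text nodes that the XPath query above returns.

```python
import re

# Assumed patterns (illustrative only): a simple email matcher and a loose
# phone/fax matcher that allows digits, spaces, parentheses, slashes, hyphens.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s()/-]{5,}\d")

def extract_contacts(lines):
    """Return (emails, phones) found in an iterable of text fragments."""
    emails, phones = [], []
    for line in lines:
        text = line.strip()
        emails.extend(EMAIL_RE.findall(text))
        phones.extend(match.strip() for match in PHONE_RE.findall(text))
    return emails, phones

# Hypothetical sample text nodes, standing in for the scraped output
sample = [
    "  E-Mail: info@example.com ",
    "Fax: +49 30 1234567",
    "Some unrelated description text",
]
emails, phones = extract_contacts(sample)
print(emails)
print(phones)
```

Inside FindingEmail, the list returned by tree.xpath(...) could be passed to extract_contacts in place of the print loop. Real phone formats differ widely, so the phone pattern would likely need tuning against actual pages.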