How to extract this text

Time: 2018-05-15 09:13:55

Tags: python beautifulsoup

I am trying to get the email address from the code below.

<div class="col-lg-4" style="border-left:1px solid #d0d0d0;">

    <p>
        <img class="img-responsive" src="/uploads/logos/b75ba9c72de548d665b233d547d92402.jpg" alt="    AJ Navalho">
    </p>
    <h4>    AJ Navalho</h4>
    <p>SEDE/LOJA<br>

    Rua Rómulo de Carvalho, n.º 15
    <br>

    Pendão - 2745-373 Queluz
    <br>

    <br>

    ARMAZÉM
    <br>

    Rua Mário Castelhano, n.º 42
    <br>

    Queluz de Baixo
    <br>

    2745-575 Barcarena
    </p>
    <h3>
        <i class="fa fa-phone"></i>
         21 435 38 67
    </h3>
    <p>
        <i class="fa fa-envelope"></i> 
        ajnavalho@ajnavalho.pt
    </p>
</div>

How can I get the email that follows the "fa fa-envelope" class? I'm not good with HTML, so I don't know what #text is, or whether it means anything here.

2 answers:

Answer 0: (score: 0)

Use BeautifulSoup.

Demo:

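from bs4 import BeautifulSoup
# Replace this with your full HTML; only the email paragraph from the question is shown
s = '<p><i class="fa fa-envelope"></i> ajnavalho@ajnavalho.pt</p>'
soup = BeautifulSoup(s, "html.parser")
# Find the envelope icon, then read the text of its parent <p>
print(soup.find("i", class_="fa fa-envelope").parent.text.strip())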

Output:

ajnavalho@ajnavalho.pt
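
A slightly more defensive variant of the same idea, as a minimal sketch. It assumes a trimmed copy of the question's markup, and `extract_email` is a hypothetical helper name, not part of BeautifulSoup. It uses a CSS selector so the two classes can appear in any order, and returns None when the icon is missing instead of raising.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: only the
# phone and email parts matter for this example).
html = """
<div class="col-lg-4">
    <h3><i class="fa fa-phone"></i> 21 435 38 67</h3>
    <p>
        <i class="fa fa-envelope"></i>
        ajnavalho@ajnavalho.pt
    </p>
</div>
"""


def extract_email(markup):
    """Return the text of the element wrapping the envelope icon, or None."""
    soup = BeautifulSoup(markup, "html.parser")
    # "i.fa.fa-envelope" matches an <i> carrying both classes, in any order.
    icon = soup.select_one("i.fa.fa-envelope")
    if icon is None or icon.parent is None:
        return None
    # get_text(strip=True) strips the whitespace around each text fragment.
    return icon.parent.get_text(strip=True)


print(extract_email(html))
```

Checking for None first is useful when scraping many pages, since some of them may have no email listed.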

Answer 1: (score: 0)

This worked for me:

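from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3; on Python 2 this was urllib.urlopen
r = urlopen("https://www.oportaldaconstrucao.com/empresa/1964/aj-navalho/").read()
soup = BeautifulSoup(r, 'lxml')
# next_sibling is the text node that comes right after the <i> icon
letter = soup.find_all("i", class_="fa fa-envelope")[0].next_sibling
print(letter.strip())  # strip the surrounding whitespace and newlines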

Output:

ajnavalho@ajnavalho.pt
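
On the "#text" part of the question: that label is how DOM inspectors name a plain text node, meaning text that sits directly inside an element rather than in a tag of its own. The email here is such a node, placed right after the `<i>` icon. The sketch below runs offline on a trimmed copy of the question's markup (an assumption, so that no network access or lxml is needed) and shows that `next_sibling` returns that text node as a `NavigableString`, whitespace included.

```python
from bs4 import BeautifulSoup, NavigableString

# Only the email paragraph from the question, with its original whitespace.
html = '<p>\n    <i class="fa fa-envelope"></i> \n    ajnavalho@ajnavalho.pt\n</p>'

soup = BeautifulSoup(html, "html.parser")
icon = soup.find("i", class_="fa fa-envelope")

# The node right after the icon is the "#text" node holding the email.
node = icon.next_sibling if icon is not None else None
is_text_node = isinstance(node, NavigableString)

# A text node keeps its leading and trailing whitespace, so strip it.
email = node.strip() if is_text_node else None

print(repr(node))  # shows the raw text node, whitespace and all
print(email)
```

The `isinstance` check guards against pages where the icon is followed by another tag, or by nothing at all.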