Question

I need to get the e-mail addresses from the following web page: http://bari.geometriapulia.net/index.php/albo-lista/userprofile/abbatantuono-giuseppe

To do this, I use the following code:

from bs4 import BeautifulSoup
import urllib.request
import re

url = "http://bari.geometriapulia.net/index.php/albo-lista/userprofile/abbatantuono-giuseppe"

content = urllib.request.urlopen(url).read()
soup = BeautifulSoup(content, "lxml")

for link in soup.find_all("a", href=re.compile(r"^mailto:")):

    if "@" in str(link.string):            
        print(link.string)

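This code does not find the e-mail addresses I want (the two addresses you can see under the profile picture). Instead, it finds the e-mail address at the bottom of the page, which I am not interested in.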

To understand why, I downloaded the whole HTML page. Where the e-mail addresses should be, there is only "...", and the line below it shows a warning:

<td class="fieldCell" id="cbfv_84"><span class="cbMailRepl" id="cbMa92357">...</span><noscript> 
This e-mail address is protected by spam bot, you must activate JavaScript in you browser in order to visualize it
</noscript>
</td>
</tr>
<tr class="sectiontableentry2 cbft_emailaddress" id="cbfr_97">
<td class="titleCell"><label for="cbfv_97" id="cblabcbfv_97">e-mail:</label></td>
<td class="fieldCell" id="cbfv_97"><span class="cbMailRepl" id="cbMa92358">...</span><noscript> 
 This e-mail address is protected by spam bot, you must activate JavaScript in you browser in order to visualize it
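One way to confirm this from Python is to list what each cbMailRepl span actually contains in the downloaded HTML. The sketch below uses the standard library's html.parser instead of BeautifulSoup, so it runs without extra packages. The class and function names are hypothetical, and it assumes the placeholder spans contain no nested span tags.

```python
from html.parser import HTMLParser


class MailPlaceholderParser(HTMLParser):
    """Collect the text inside each <span class="cbMailRepl"> of a static HTML page.

    Assumes (hypothetically) that the placeholder spans contain no nested <span> tags.
    """

    def __init__(self):
        super().__init__()
        self.in_placeholder = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "cbMailRepl" in classes:
            self.in_placeholder = True
            self.texts.append("")  # start a new entry for this span

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_placeholder = False

    def handle_data(self, data):
        if self.in_placeholder:
            self.texts[-1] += data


def placeholder_texts(html):
    """Return the text of every cbMailRepl span, in document order."""
    parser = MailPlaceholderParser()
    parser.feed(html)
    parser.close()
    return parser.texts


# Usage with the page from the question (needs network access; url as in the code above):
# import urllib.request
# html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
# print(placeholder_texts(html))
```

On the page fetched with urllib, this would be expected to return only "..." strings, because the real addresses are not in the static HTML at all. That is why searching it for mailto: links only finds the footer address.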

So I checked whether JavaScript is enabled in my browser. It is, as you can see from this screenshot: http://prntscr.com/dwgl7w

So how can I download the page without the anti-spam-bot system "cutting" the e-mail addresses out of the HTML code? Is this even possible?

Answer 1

The e-mail addresses are generated by JavaScript.

requests and urllib cannot execute JavaScript code. Use Selenium.
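A minimal Selenium sketch of this suggestion follows. It assumes Selenium 4 with Firefox and a working driver setup, and it assumes (from the HTML in the question) that the page's script fills in the span.cbMailRepl elements in place. The function names, the 10-second timeout, and the e-mail regex are illustrative choices, not part of the answer. The page is loaded in a real browser, the code waits until no span still shows "...", and then reads the rendered text; a small browser-free helper keeps only strings that look like addresses.

```python
import re

# Heuristic "looks like an e-mail address" pattern, not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def emails_from_texts(texts):
    """Return e-mail-like strings found in texts, in order, without duplicates."""
    found = []
    for text in texts:
        for match in EMAIL_RE.findall(text or ""):
            if match not in found:
                found.append(match)
    return found


def fetch_rendered_mail_spans(url, timeout=10):
    """Load url in Firefox via Selenium and return the text of every span.cbMailRepl
    after the page's JavaScript has replaced the "..." placeholders.

    Raises selenium's TimeoutException if the placeholders are not replaced in time.
    """
    # Imported here so emails_from_texts can be used without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import StaleElementReferenceException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def spans(driver):
        return driver.find_elements(By.CSS_SELECTOR, "span.cbMailRepl")

    driver = webdriver.Firefox()  # assumes Firefox and its driver are available
    try:
        driver.get(url)
        # Poll until no placeholder span still shows the literal "..." text.
        WebDriverWait(driver, timeout,
                      ignored_exceptions=(StaleElementReferenceException,)).until(
            lambda d: all(s.text.strip() != "..." for s in spans(d))
        )
        return [s.text for s in spans(driver)]
    finally:
        driver.quit()  # always close the browser, even on timeout


# Usage (needs a browser and network access; url as in the question's code):
# print(emails_from_texts(fetch_rendered_mail_spans(url)))
```

Selenium is imported inside the function so that the address filtering can be reused on text obtained any other way. Waiting on the placeholder text, rather than sleeping for a fixed time, returns as soon as the script has run.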
