Python-Selenium: getting email addresses from a web page

Time: 2017-11-06 04:44:50

Tags: python selenium

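I want to extract the names, phone numbers, and email addresses from this page.

The code works, but some of the names have more than one link in their "card", so when I do the extraction the whole sequence of extracted links gets thrown off. For example: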

Julio (July) Anopol , , , Mobile: 416-678-2916 , mailto:julio.luis.anopol@freedom55financial.com

Henry D. Arauag , Office: 905-276-1177 , Ext. 594 , Mobile: 647-649-7955 , mailto:henry.arauag@freedom55financial.com

Rick Auckbaraullee , Office: 905-276-1177 , Ext. 557 , Mobile: 416-577-2377 , mailto:rick.auckbaraullee@freedom55financial.com

Frank Basile , Office: 905-276-1177 , Ext. 469 , Mobile: 416-797-9316 , mailto:frank.basile@freedom55financial.com

Janis Bellman , Office: 905-276-1177 , Ext. 601 , Mobile: 416-258-0630 , https://www.linkedin.com/in/janisbellman

Sean Beneteau , Office: 905-363-5800 , Ext. 123 , , https://www.facebook.com/MyBellman/

Carmen Briguglio , Office: 905-824-5660 , , , https://twitter.com/BellmanJanis

Qi Jun (Steve) Cai , Office: 905-276-1177 , Ext. 591 , Mobile: 416-949-1069 , mailto:janis.bellman@freedom55financial.com

As you can see, if a name's "card" has another link attached, every entry after it is paired with the wrong link.
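The drift can be reproduced without a browser. The sketch below is only an illustration: the `cards` data is modelled on the output above (the Sean Beneteau address is a made-up placeholder), and `pair_by_index` and `pair_by_card` are hypothetical helper names. It contrasts pairing names with one flat list of links by position against picking the `mailto:` link inside each card.

```python
# Footer links per card, modelled on the output above.
# The last address is a made-up placeholder, not real data.
cards = [
    ("Frank Basile", ["mailto:frank.basile@freedom55financial.com"]),
    ("Janis Bellman", [
        "https://www.linkedin.com/in/janisbellman",
        "https://www.facebook.com/MyBellman/",
        "https://twitter.com/BellmanJanis",
        "mailto:janis.bellman@freedom55financial.com",
    ]),
    ("Sean Beneteau", ["mailto:sean.beneteau@example.com"]),
]

names = [name for name, _ in cards]
# One flat list of every footer link on the page, in document order.
flat_links = [link for _, links in cards for link in links]


def pair_by_index(names, links):
    """Pair the i-th name with the i-th link, as an index-based loop does."""
    return [(name, links[i]) for i, name in enumerate(names)]


def pair_by_card(cards):
    """Pick the first mailto: link that belongs to each card, or '' if none."""
    rows = []
    for name, links in cards:
        emails = [link for link in links if link.startswith("mailto:")]
        rows.append((name, emails[0] if emails else ""))
    return rows


for row in pair_by_index(names, flat_links):
    print("by index:", row)
for row in pair_by_card(cards):
    print("by card: ", row)
```

With index-based pairing, the extra links in one card shift every later name onto the wrong link; grouping by card keeps each name with its own address.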

Here is my code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

# example option: add 'incognito' command line arg to options
option = webdriver.ChromeOptions()
option.add_argument("--incognito")

# create new instance of chrome in incognito mode
browser = webdriver.Chrome(executable_path='/Library/Application Support/Google/chromedriver', chrome_options=option)

# go to website
browser.get("https://www.freedom55financial.com/ff/advisor/Ontario/Mississauga")

browser.implicitly_wait(4)

# extract names from parent element
all_names = browser.find_elements_by_xpath('//*[@id="advisor-results"]/article[*]/section/h2')

# extract all phone numbers
all_off_phones_numbers = browser.find_elements_by_xpath('//*[@id="advisor-results"]/article[*]/section/p[4]/a')

#extract all exts
all_exts = browser.find_elements_by_xpath('//*[@id="advisor-results"]/article[*]/section/p[5]')

#extract all cell numbers
all_cell_numbers = browser.find_elements_by_xpath('//*[@id="advisor-results"]/article[*]/section/p[6]/a')

#extract all email addys
all_emails = browser.find_elements_by_xpath('//*[@id="advisor-results"]/article[*]/footer/a[*]')

# print out all info
num_page_items = len(all_names)
for i in range(num_page_items):
    print(all_names[i].text + " , " + all_off_phones_numbers[i].text + " , " + all_exts[i].text + " , " + all_cell_numbers[i].text + " , " + all_emails[i].get_attribute('href'))
    # print(all_names[i].text + " , " + all_off_phones_numbers[i].text + " , " + all_exts[i].text + " , " + all_cell_numbers[i].text + " , " + all_emails[i].text)

browser.close()

An example of the page's HTML, showing how the information is laid out:

<section id="advisor-results" class="advisor-results" role="region" aria-live="polite" >

<article class="advisor-results__advisor-card f55f English Portugese CIM Male Photo_Yes" aria-describedby="f55-security-advisor-legend">
<div class="advisor-image">
<img src="/dsms/wcm/connect/ff/0a7d8558-3a70-4c24-8d10-ff779a87d2e2/Amaral_Marcos_2016.jpg?MOD=AJPERES&amp;CACHEID=0a7d8558-3a70-4c24-8d10-ff779a87d2e2" alt="" />
</div>
<section class="advisor-details">

<h2>Marcus (Marcos) Amaral</h2>
<p class="advisor-credentials">CIM</p>
<p class="advisor-firm">


Freedom&nbsp;55 Financial


</p>
<p class="advisor-offerings"></p>

<a href="http://maps.google.com/?q=1 City Centre Dr., Mississauga, ON, , L5B 1M2" data-card='map' target="_blank">
<address role="presentation">
<span class="address-line">1 City Centre Dr.</span>
<span class="address-line">Suite 1600</span>
<span class="address-line">Mississauga, ON</span>
<span class="address-line">L5B 1M2</span>
</address>
</a>
<p><a href="tel:905-276-1177" data-card="phone" >Office: 905-276-1177</a></p>
<p>Ext. 485</p>
<p><a href="tel:519-819-3241" data-card="phone" >Mobile: 519-819-3241</a></p>
</section>
<footer>


<a title='' data-card='email' target='' href='mailto:marcos.amaral@freedom55financial.com'><i class='fa fa-envelope-o' aria-label='Contact this advisor by email'></i></a>
</footer>
</article>
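In the excerpt above, the email link carries a `data-card='email'` attribute and a `mailto:` href, which may be enough to tell it apart from other footer links. The sketch below uses only the standard library to show that idea on a reduced footer. `EmailLinkParser` is a hypothetical helper, and the extra LinkedIn link with `data-card='social'` is an invented stand-in, since the real markup of the social links is not shown in the question.

```python
from html.parser import HTMLParser


class EmailLinkParser(HTMLParser):
    """Collect the href of every <a> tag marked data-card='email'."""

    def __init__(self):
        super().__init__()
        self.emails = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("data-card") == "email":
            self.emails.append(attributes.get("href", ""))


# Footer reduced from the excerpt above. The first link is an invented
# stand-in for a social link; its attributes are an assumption.
footer = """
<footer>
<a data-card='social' href='https://www.linkedin.com/in/example'></a>
<a title='' data-card='email' target=''
   href='mailto:marcos.amaral@freedom55financial.com'></a>
</footer>
"""

parser = EmailLinkParser()
parser.feed(footer)
print(parser.emails)
```

If the social links on the real page also used `data-card='email'`, filtering on an href that starts with `mailto:` would be the fallback.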

I have tried all sorts of variations, such as finding by CSS selector and XPath contains-text, but to no avail.
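One approach that may avoid the drift is to loop over each `article` card and search inside it, rather than collecting page-wide lists and joining them by index. The sketch below is untested against the live site and rests on two assumptions: every email link carries `data-card='email'` as in the HTML excerpt, and the other footer links do not. `email_for_card` and `rows_for_page` are hypothetical names. The locator strategy is passed as the plain string `"css selector"`, which is the value of Selenium's `By.CSS_SELECTOR`; in a real script `By.CSS_SELECTOR` is the usual spelling.

```python
# Same string as selenium.webdriver.common.by.By.CSS_SELECTOR; written out
# here so the helpers can be exercised without Selenium installed.
CSS = "css selector"


def email_for_card(card):
    """Return the mailto: href inside one advisor card, or '' if it has none."""
    # Assumption: only the email link is marked data-card='email'.
    links = card.find_elements(CSS, "footer a[data-card='email']")
    return links[0].get_attribute("href") if links else ""


def rows_for_page(browser):
    """Build one (name, email) pair per card so the fields cannot drift apart."""
    rows = []
    for card in browser.find_elements(CSS, "#advisor-results article"):
        headings = card.find_elements(CSS, "h2")
        name = headings[0].text if headings else ""
        rows.append((name, email_for_card(card)))
    return rows
```

After `browser.get(...)`, calling `rows_for_page(browser)` would return one tuple per card. The phone and extension fields could be read the same way, by searching within `card` instead of the whole page, so a card with a missing field yields an empty string rather than shifting the rows that follow.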

How can I get just the email addresses?

Thanks in advance.

0 answers:

No answers