How do I get all href links only when a certain condition is met?

Asked: 2018-04-23 19:04:28

Tags: python html selenium

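I need to extract links from this webpage, but only for job listings where the phrase "Easily apply" is present (shown in orange on the site).

I can successfully navigate to this point using Selenium.

From my analysis, the individual jobs sit in a table inside a section named 'resultsCol'. I think I need to go into this table to identify the links that have "Easily apply".

For some reason, my code prints about 250 links, which I believe come from the entire page.

I only need the links of the listings that are marked "Easily apply" on the page.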

My code so far:

#get into high level table
apply1 = driver.find_element_by_id('resultsBody')
#get into sub-table
apply2 = apply1.find_element_by_id('resultsCol')
#look for 'Easily apply'
find_easily_apply = apply2.find_element_by_class_name('iaLabel')
#find only links that have 'Easily apply'
elems = find_easily_apply.find_elements_by_xpath("//a[@href]")
lst= []
for elem in elems:
    lst.append(elem.get_attribute("href"))
    print (elem.get_attribute("href"))

HTML:

<div class="row result clickcard vjs-highlight" id="pj_317e5ad4b2a3dad6" data-jk="317e5ad4b2a3dad6" data-advn="616306049889393" data-tu="">
            <a target="" id="sja1" data-tn-element="jobTitle" class="jobtitle turnstileLink visited" href="https://www.indeed.com/pagead/clk?mo=r&amp;ad=-6NYlbfkN0B3pvgzIkgI8YWH4BDObvj5fJqf9Bp4LC-HGgoIDJkS64QHwWIROQ-F5tpR1sVNiIhZahbAYS-0EEASodsFYBosg3uud7xzcYENuPGkS0nmCSiRtYix8fzY-m7AiCEWJQr0An0Cv5tQpLo9czik4KHcPqgnWU0XxqhYfQUjfVj0vyetH1wQoWvZW754f5axVrOu4skVXeuIfaXsQWBf9mPeJwSF-v2jbZSEiStMDxTcYutg47tmB25mOBYDyp1i8ygbDxiuKTrDkoiccbwXFXPHhn9odEFIF6q01ROPJLZwxAJVW-SYdRcKXU0mmPfrbb8fO4j6xRiTdy584p9MrbVQWDTyCHF5gu76xPbQK8DzuCPKKQu7dUS8wIgf2hPcf3vjFM4eVpUEh4oiAfC7wbNR4dNx7cXxKC_Pt4FNljg3osMqSpZ3wlYG2RB_hsrpiTT3s1TfvOnFD2oxkMeApXlM-8Q0LKBCyzk=&amp;vjs=3&amp;p=1&amp;sk=&amp;fvj=1&amp;tk=1cbpv78ab4o1pcdr&amp;jsa=2811&amp;sal=0&amp;oc=1&amp;sal=0" title="Quantitative Trading Algorithm Developer" rel="noopener nofollow" onmousedown="sjomd('sja1'); clk('sja1');" onclick="setRefineByCookie([]); sjoc('sja1',0); convCtr('SJ')">Quantitative Trading Algorithm <b>Developer</b></a>

            <br>
            <div class="sjcl">
            <span class="company">
    Pacific Block Tehcnology Corp.</span>

 - <span class="location">Cambridge, MA</span>
            </div>
            <div class="paddedSummaryExperience">
                <table cellpadding="0" cellspacing="0" border="0"><tbody><tr><td class="snip">
                        <span class="summary">
                            1, Have a degree in Mathematics, Finance, Physics, Engineer or Computer Science, with good quantitative analysis ability and skills; 2, Have programming...</span>
                    </td></tr></tbody></table>
                <div class="experience">
                        <span class="experienceHeader">Desired Experience:&nbsp;</span><span class="experienceList">Azure, Google Cloud Platform, C/C++, Docker, Python, AWS</span>
                    </div>
                </div>

                <div class="sjCapt">
                    <div class="iaP">
    <span class="iaLabel"> Easily apply</span>
</div>
<div class="result-link-bar-container">
                            <div class="result-link-bar"><span class=" sponsoredGray ">Sponsored</span> - <span id="tt_set_10" class="tt_set"><a id="sj_317e5ad4b2a3dad6" href="#" class="sl resultLink save-job-link " onclick="changeJobState('317e5ad4b2a3dad6', 'save', 'linkbar', true, ''); return false;" title="Save this job to my.indeed">save job</a></span><div id="editsaved2_317e5ad4b2a3dad6" class="edit_note_content" style="display:none;"></div><script>if (!window['sj_result_317e5ad4b2a3dad6']) {window['sj_result_317e5ad4b2a3dad6'] = {};}window['sj_result_317e5ad4b2a3dad6']['showSource'] = false; window['sj_result_317e5ad4b2a3dad6']['source'] = "Indeed"; window['sj_result_317e5ad4b2a3dad6']['loggedIn'] = false; window['sj_result_317e5ad4b2a3dad6']['showMyJobsLinks'] = false;window['sj_result_317e5ad4b2a3dad6']['undoAction'] = "unsave";window['sj_result_317e5ad4b2a3dad6']['jobKey'] = "317e5ad4b2a3dad6"; window['sj_result_317e5ad4b2a3dad6']['myIndeedAvailable'] = true; window['sj_result_317e5ad4b2a3dad6']['showMoreActionsLink'] = window['sj_result_317e5ad4b2a3dad6']['showMoreActionsLink'] || false; window['sj_result_317e5ad4b2a3dad6']['resultNumber'] = 10; window['sj_result_317e5ad4b2a3dad6']['jobStateChangedToSaved'] = false; window['sj_result_317e5ad4b2a3dad6']['searchState'] = "q=python developer&amp;l=Massachusetts"; window['sj_result_317e5ad4b2a3dad6']['basicPermaLink'] = "https://www.indeed.com"; window['sj_result_317e5ad4b2a3dad6']['saveJobFailed'] = false; window['sj_result_317e5ad4b2a3dad6']['removeJobFailed'] = false; window['sj_result_317e5ad4b2a3dad6']['requestPending'] = false; window['sj_result_317e5ad4b2a3dad6']['notesEnabled'] = false; window['sj_result_317e5ad4b2a3dad6']['currentPage'] = "serp"; window['sj_result_317e5ad4b2a3dad6']['sponsored'] = true;window['sj_result_317e5ad4b2a3dad6']['showSponsor'] = true;window['sj_result_317e5ad4b2a3dad6']['reportJobButtonEnabled'] = false; window['sj_result_317e5ad4b2a3dad6']['showMyJobsHired'] = false; 
window['sj_result_317e5ad4b2a3dad6']['showSaveForSponsored'] = true; window['sj_result_317e5ad4b2a3dad6']['showJobAge'] = true;</script></div></div>
                        <div class="tab-container">
                            <div class="sign-in-container result-tab"></div>
                            <div class="tellafriend-container result-tab email_job_content"></div>
                        </div>
                    </div>
            </div>

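Thanks in advance.

1 answer:

Answer 0 (score: 1)

Try this. I'm sure there are better ways to do it, but in a quick run I was able to get all of the "Easily apply" jobs with the following code: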

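from selenium.common.exceptions import NoSuchElementException

# Note: the find_element_by_* helpers were deprecated in Selenium 4 and removed
# in later 4.x releases; there, use driver.find_element(By.ID, 'resultsCol') etc.
result_table = driver.find_element_by_id('resultsCol')
# Each job listing is a div whose class attribute contains "result clickcard".
results = result_table.find_elements_by_css_selector('div[class*="result clickcard"]')
easily_apply = []

for result in results:
    try:
        # Raises NoSuchElementException when the card has no "Easily apply" block.
        result.find_element_by_css_selector('div[class="iaP"]')
        easily_apply.append(
            result.find_element_by_css_selector('a[data-tn-element="jobTitle"]').get_attribute('href')
        )
    except NoSuchElementException:
        # Not an "Easily apply" listing; skip it.
        pass

print(easily_apply)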

Output (this sample shows job titles; with get_attribute('href') the list contains the URLs instead):

['Marketing Analyst', 'Data Scientist', 'Data Research / Entry (Part Time)', 'Business Analyst - Strategic Planning']
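
As for why the code in the question printed every link: in Selenium, an XPath that starts with "//" is evaluated from the document root even when it is called on an element, so find_easily_apply.find_elements_by_xpath("//a[@href]") returns all anchors on the page. A leading dot (".//a[@href]") keeps the search inside the element. Below is a minimal sketch of a single-XPath alternative, assuming the markup shown in the question. The names EASILY_APPLY_XPATH and easily_apply_hrefs are made up for illustration, and the locator strategy is passed as the plain string "xpath", which is the value of Selenium's By.XPATH constant.

```python
# Select each result card that contains an "iaLabel" span, then take that
# card's job-title anchor. The "." in the predicate keeps the span lookup
# relative to the card instead of the whole document.
EASILY_APPLY_XPATH = (
    "//div[contains(@class, 'result') and contains(@class, 'clickcard')]"
    "[.//span[@class='iaLabel']]"
    "//a[@data-tn-element='jobTitle']"
)


def easily_apply_hrefs(driver):
    """Return the job-title hrefs of all cards marked "Easily apply".

    `driver` is assumed to be a Selenium WebDriver that has already
    loaded the results page.
    """
    # "xpath" is the string value of selenium's By.XPATH.
    anchors = driver.find_elements("xpath", EASILY_APPLY_XPATH)
    return [anchor.get_attribute("href") for anchor in anchors]
```

Calling easily_apply_hrefs(driver) once the results page has loaded should give one URL per "Easily apply" card, without the try/except around each card.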