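I need to extract all links from this webpage, but only for postings where the phrase "Easily apply" is present (shown in orange on the site).
I can successfully navigate to this point using Selenium.
From my analysis, the individual jobs sit in a table within a section named 'resultsCol'. I think I need to get into this table to identify the links that have "Easily apply".
For some reason, my code prints 250 links, which I believe come from the entire site.
I only need the links for postings marked "Easily apply" on the page.
Code so far: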
# get into the high-level table
apply1 = driver.find_element_by_id('resultsBody')
# get into the sub-table
apply2 = apply1.find_element_by_id('resultsCol')
# look for 'Easily apply'
find_easily_apply = apply2.find_element_by_class_name('iaLabel')
# find only links that have 'Easily apply'
elems = find_easily_apply.find_elements_by_xpath("//a[@href]")
lst = []
for elem in elems:
    lst.append(elem.get_attribute("href"))
    print(elem.get_attribute("href"))
HTML:
<div class="row result clickcard vjs-highlight" id="pj_317e5ad4b2a3dad6" data-jk="317e5ad4b2a3dad6" data-advn="616306049889393" data-tu="">
<a target="" id="sja1" data-tn-element="jobTitle" class="jobtitle turnstileLink visited" href="https://www.indeed.com/pagead/clk?mo=r&ad=-6NYlbfkN0B3pvgzIkgI8YWH4BDObvj5fJqf9Bp4LC-HGgoIDJkS64QHwWIROQ-F5tpR1sVNiIhZahbAYS-0EEASodsFYBosg3uud7xzcYENuPGkS0nmCSiRtYix8fzY-m7AiCEWJQr0An0Cv5tQpLo9czik4KHcPqgnWU0XxqhYfQUjfVj0vyetH1wQoWvZW754f5axVrOu4skVXeuIfaXsQWBf9mPeJwSF-v2jbZSEiStMDxTcYutg47tmB25mOBYDyp1i8ygbDxiuKTrDkoiccbwXFXPHhn9odEFIF6q01ROPJLZwxAJVW-SYdRcKXU0mmPfrbb8fO4j6xRiTdy584p9MrbVQWDTyCHF5gu76xPbQK8DzuCPKKQu7dUS8wIgf2hPcf3vjFM4eVpUEh4oiAfC7wbNR4dNx7cXxKC_Pt4FNljg3osMqSpZ3wlYG2RB_hsrpiTT3s1TfvOnFD2oxkMeApXlM-8Q0LKBCyzk=&vjs=3&p=1&sk=&fvj=1&tk=1cbpv78ab4o1pcdr&jsa=2811&sal=0&oc=1&sal=0" title="Quantitative Trading Algorithm Developer" rel="noopener nofollow" onmousedown="sjomd('sja1'); clk('sja1');" onclick="setRefineByCookie([]); sjoc('sja1',0); convCtr('SJ')">Quantitative Trading Algorithm <b>Developer</b></a>
<br>
<div class="sjcl">
<span class="company">
Pacific Block Tehcnology Corp.</span>
- <span class="location">Cambridge, MA</span>
</div>
<div class="paddedSummaryExperience">
<table cellpadding="0" cellspacing="0" border="0"><tbody><tr><td class="snip">
<span class="summary">
1, Have a degree in Mathematics, Finance, Physics, Engineer or Computer Science, with good quantitative analysis ability and skills; 2, Have programming...</span>
</td></tr></tbody></table>
<div class="experience">
<span class="experienceHeader">Desired Experience: </span><span class="experienceList">Azure, Google Cloud Platform, C/C++, Docker, Python, AWS</span>
</div>
</div>
<div class="sjCapt">
<div class="iaP">
<span class="iaLabel"> Easily apply</span>
</div>
<div class="result-link-bar-container">
<div class="result-link-bar"><span class=" sponsoredGray ">Sponsored</span> - <span id="tt_set_10" class="tt_set"><a id="sj_317e5ad4b2a3dad6" href="#" class="sl resultLink save-job-link " onclick="changeJobState('317e5ad4b2a3dad6', 'save', 'linkbar', true, ''); return false;" title="Save this job to my.indeed">save job</a></span><div id="editsaved2_317e5ad4b2a3dad6" class="edit_note_content" style="display:none;"></div><script>if (!window['sj_result_317e5ad4b2a3dad6']) {window['sj_result_317e5ad4b2a3dad6'] = {};}window['sj_result_317e5ad4b2a3dad6']['showSource'] = false; window['sj_result_317e5ad4b2a3dad6']['source'] = "Indeed"; window['sj_result_317e5ad4b2a3dad6']['loggedIn'] = false; window['sj_result_317e5ad4b2a3dad6']['showMyJobsLinks'] = false;window['sj_result_317e5ad4b2a3dad6']['undoAction'] = "unsave";window['sj_result_317e5ad4b2a3dad6']['jobKey'] = "317e5ad4b2a3dad6"; window['sj_result_317e5ad4b2a3dad6']['myIndeedAvailable'] = true; window['sj_result_317e5ad4b2a3dad6']['showMoreActionsLink'] = window['sj_result_317e5ad4b2a3dad6']['showMoreActionsLink'] || false; window['sj_result_317e5ad4b2a3dad6']['resultNumber'] = 10; window['sj_result_317e5ad4b2a3dad6']['jobStateChangedToSaved'] = false; window['sj_result_317e5ad4b2a3dad6']['searchState'] = "q=python developer&l=Massachusetts"; window['sj_result_317e5ad4b2a3dad6']['basicPermaLink'] = "https://www.indeed.com"; window['sj_result_317e5ad4b2a3dad6']['saveJobFailed'] = false; window['sj_result_317e5ad4b2a3dad6']['removeJobFailed'] = false; window['sj_result_317e5ad4b2a3dad6']['requestPending'] = false; window['sj_result_317e5ad4b2a3dad6']['notesEnabled'] = false; window['sj_result_317e5ad4b2a3dad6']['currentPage'] = "serp"; window['sj_result_317e5ad4b2a3dad6']['sponsored'] = true;window['sj_result_317e5ad4b2a3dad6']['showSponsor'] = true;window['sj_result_317e5ad4b2a3dad6']['reportJobButtonEnabled'] = false; window['sj_result_317e5ad4b2a3dad6']['showMyJobsHired'] = false; 
window['sj_result_317e5ad4b2a3dad6']['showSaveForSponsored'] = true; window['sj_result_317e5ad4b2a3dad6']['showJobAge'] = true;</script></div></div>
<div class="tab-container">
<div class="sign-in-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
</div>
</div>
</div>
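Thanks in advance.
Answer 0 (score: 1)
Try this. I'm sure there are better ways to do it, but in a quick run I was able to get all the "Easily apply" jobs with the following code: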
from selenium.common.exceptions import NoSuchElementException
result_table = driver.find_element_by_id('resultsCol')
results = result_table.find_elements_by_css_selector('div[class*="result clickcard"]')
easily_apply = []
for result in results:
    try:
        result.find_element_by_css_selector('div[class="iaP"]')
        easily_apply.append(result.find_element_by_css_selector('a[data-tn-element="jobTitle"]').get_attribute('href'))
    except NoSuchElementException:
        pass
print(easily_apply)
Output:
['Marketing Analyst', 'Data Scientist', 'Data Research / Entry (Part Time)', 'Business Analyst - Strategic Planning']
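A note on why the question's code returned every link: in Selenium, an XPath that starts with `//` is evaluated from the document root even when it is called on an element, so `find_easily_apply.find_elements_by_xpath("//a[@href]")` matches all anchors on the page. A leading `.` (as in `.//a[@href]`) scopes the search to that element. The per-card filter can be tried offline with a minimal sketch like the one below. It uses the standard library's `xml.etree.ElementTree` rather than Selenium, and the `SAMPLE` markup, the `example.com` URLs, and the `easily_apply_links` helper are all made up for illustration. `ElementTree` needs well-formed XML and supports only a subset of XPath, so this mirrors the logic only, not a real page.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for the results column: two job cards,
# only the first one carries the "Easily apply" label. URLs are invented.
SAMPLE = """
<td id="resultsCol">
  <div class="row result clickcard" id="pj_1">
    <a data-tn-element="jobTitle" href="https://example.com/job/1">Developer</a>
    <div class="sjCapt">
      <div class="iaP"><span class="iaLabel"> Easily apply</span></div>
    </div>
  </div>
  <div class="row result clickcard" id="pj_2">
    <a data-tn-element="jobTitle" href="https://example.com/job/2">Analyst</a>
    <div class="sjCapt"></div>
  </div>
</td>
"""


def easily_apply_links(results_col):
    """Return the job-title hrefs of cards that contain an iaLabel span."""
    links = []
    # Direct child <div> elements stand in for the individual job cards.
    for card in results_col.findall("./div"):
        # The leading "." keeps the search inside this card only.
        if card.find(".//span[@class='iaLabel']") is None:
            continue
        title = card.find(".//a[@data-tn-element='jobTitle']")
        if title is not None:
            links.append(title.get("href"))
    return links


root = ET.fromstring(SAMPLE)
# Searching from the root sees the anchors of every card.
print(len(root.findall(".//a")))  # prints 2
# Searching card by card keeps only the labelled one.
print(easily_apply_links(root))   # prints ['https://example.com/job/1']
```

With Selenium, the same scoped expressions (with the leading `.`) would go to the element-level find calls on each result card instead of `ElementTree`'s `find`.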