How do I get a span using BeautifulSoup find_all?

Time: 2018-11-18 07:02:56

Tags: python html python-3.x beautifulsoup

I am trying to get the following span from this website: https://www.indeed.com/jobs?q=data&l=New+York%2C+NY&explvl=entry_level

<span class="indeed-apply-widget indeed-apply-button-container js-IndeedApplyWidget indeed-apply-status-not-applied" aria-labelledby="indeed-apply-button-label" data-indeed-apply-jobtitle="Growth Associate" data-indeed-apply-apitoken="aa102235a5ccb18bd3668c0e14aa3ea7e2503cfac2a7a9bf3d6549899e125af4" data-indeed-apply-coverletter="optional" data-indeed-apply-resume="required" data-indeed-apply-jk="40da42b64688bda8" data-indeed-apply-jobid="19c5d6a1fff8d6ba9724" data-indeed-apply-joblocation="New York, NY" data-indeed-apply-jobcompanyname="Via" data-indeed-apply-joburl="https://www.indeed.com/viewjob?jk=40da42b64688bda8" data-indeed-apply-posturl="https://dradisindeedapply.sandbox.indeed.net/process-indeedapply" data-indeed-apply-jobmeta="{&quot;vtk&quot;:&quot;1csimi0m80g7f002&quot;, &quot;tk&quot;:&quot;&quot;}" data-indeed-apply-advnum="7404493598529036" data-indeed-apply-onapplied="indeedApplyHandleApply" data-indeed-apply-onclose="indeedApplyHandleModalClose" data-indeed-apply-onclick="indeedApplyHandleButtonClick" data-indeed-apply-oncontinueclick="indeedApplyHandleModalClose" data-indeed-apply-pingbackurl="https://gdc.indeed.com/conv/orgIndApp?trk.origin=unknown&amp;jk=40da42b64688bda8&amp;vjtk=1csimi0m80g7f002&amp;advn=7404493598529036&amp;co=US&amp;acct_key=899c31afcc98f5e9&amp;sj=0" data-indeed-apply-skipcontinue="false" data-acc-payload="1,2,22,1,144,1,552,1,3648,1,4392,1" style="padding: 0px !important; margin: 0px !important; text-indent: 0px !important; vertical-align: top !important; position: relative; zoom: 1 !important; display: inline-block;"><a class="indeed-apply-button" href="javascript:void(0);" id="indeed-ia-1542520898760-0"><span class="indeed-apply-button-inner" id="indeed-ia-1542520898760-0inner"><span class="indeed-apply-button-label" id="indeed-ia-1542520898760-0label">Apply Now</span><span class="indeed-apply-button-cm"><img src="https://d3fw5vlhllyvee.cloudfront.net/indeedapply/s/14096d1/check.png" style="border: 
0px;"></span></span></a></span>

I tried this code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://www.indeed.com/jobs?q=data&l=New+York%2C+NY&explvl=entry_level"
html = urlopen(url).read().decode('utf-8')
soup = BeautifulSoup(html, features='lxml')
soup.find_all("span", {"class": "indeed-apply-widget indeed-apply-button-container js-IndeedApplyWidget indeed-apply-status-not-applied",
                       "aria-labelledby": "indeed-apply-button-label"})

But the result is [].

1 answer:

Answer 0: (score: 0)

There is no such element at the URL you mentioned, but it does exist on the /viewjob?jk=.. pages.

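The full class string in your code is generated by JavaScript after the page loads. If you view the page source, the element's actual class is just indeed-apply-widget, and there is only one such element on the page, so a single find is enough: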

# https://www.indeed.com/viewjob?jk=0ee200c5fc30ce02&from=recjobs&vjtk=1csj1b3nmbi4v800
soup.find("span", {"class":"indeed-apply-widget"})
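To see why the original query returns an empty list, here is a small offline sketch. The RAW_HTML string is a made-up stand-in for the page source before any JavaScript runs (it is not the real Indeed markup), and html.parser is used only so the example does not depend on lxml being installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the raw page source, before JavaScript adds extra classes.
RAW_HTML = """
<div>
  <span class="indeed-apply-widget" data-indeed-apply-jobtitle="Growth Associate"></span>
  <span class="other">unrelated</span>
</div>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")

# The class string copied from the browser inspector includes classes that
# only exist after JavaScript has run, so it matches nothing in the raw source.
rendered_classes = ("indeed-apply-widget indeed-apply-button-container "
                    "js-IndeedApplyWidget indeed-apply-status-not-applied")
no_match = soup.find_all("span", {"class": rendered_classes})

# Searching for the one class that is present in the source does match.
matches = soup.find_all("span", class_="indeed-apply-widget")

# The data-* attributes can then be read from each matched tag.
job_titles = [span.get("data-indeed-apply-jobtitle") for span in matches]

print(no_match)
print(job_titles)
```

When a class string contains spaces, BeautifulSoup compares it against the element's exact class attribute value, so the query only succeeds if the source contains exactly those classes in that order. Matching on a single class that is present in the raw source is usually the safer choice.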