Here is my code:
import urllib2
from bs4 import BeautifulSoup
url = "https://www.seek.co.nz/jobs/in-new-zealand/#dateRange=999&workType=0&industry=&occupation=&graduateSearch=false&salaryFrom=0&salaryTo=999999&salaryType=annual&companyID=&advertiserID=&advertiserGroup=&keywords=&page=3&displaySuburb=&seoSuburb=&where=All+New+Zealand&whereId=3001&whereIsDirty=false&isAreaUnspecified=false&location=3001&area=&nation=3001&sortMode=ListedDate&searchFrom=quick&searchType="
response = urllib2.urlopen(url)
html = response.read()
soup = BeautifulSoup(html, "lxml")
#print soup.prettify()
job_title = soup("a", {"class": "job-title"})
print job_title
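I want to get all the job titles from the website.
When I run the code, the result is an empty list, [].
I have tried every variation of find_all(), but none of them worked.
I am sure the website contains the information I need.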
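Answer 0: (score: 0)
Try printing html to see whether it contains any tag with the class job-title. I tried this and did not find any. As Martijn Pieters suggested in a comment, the browser developer tools also show that the DOM is created dynamically by JavaScript, so the tags are not present in the raw response that urllib2 downloads.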
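The check described above can be sketched as follows. This is a minimal illustration that uses only the Python 3 standard library; the helper names `ClassCounter` and `count_links_with_class` and the two sample HTML strings are assumptions made for the example, not the real markup returned by the site.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count <a> tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_links_with_class(html, class_name="job-title"):
    """Return how many matching <a> tags appear in the given HTML text."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count


# Hypothetical stand-ins: a raw response before JavaScript runs,
# and the same page after the browser has built the DOM.
static_html = '<div data-bind="foreach: jobs"></div><script src="app.js"></script>'
rendered_html = '<div><a class="job-title" href="/job/1">Trade Assistant</a></div>'

print(count_links_with_class(static_html))    # 0
print(count_links_with_class(rendered_html))  # 1
```

If the count is 0 for the text returned by the HTTP request but the links are visible in the browser developer tools, the content is most likely being added by JavaScript after the page loads.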
Answer 1: (score: 0)
Try this:
import sys
from PyQt5.QtCore import QUrl
from PyQt5.QtWidgets import QApplication
from PyQt5.QtWebKitWidgets import QWebPage
from bs4 import BeautifulSoup
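class Render(QWebPage):
    # A QApplication must exist before any Qt page object is created.
    app = QApplication(sys.argv)

    def __init__(self, url):
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        # Block in the Qt event loop until _loadFinished() calls quit().
        self.app.exec_()

    def _loadFinished(self, result):
        # Keep the loaded frame so its HTML can be read afterwards.
        self.frame = self.mainFrame()
        self.app.quit()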
url = 'https://www.seek.co.nz/jobs/in-new-zealand/#dateRange=999&workType=0&industry=&occupation=&graduateSearch=false&salaryFrom=0&salaryTo=999999&salaryType=annual&companyID=&advertiserID=&advertiserGroup=&keywords=&page=3&displaySuburb=&seoSuburb=&where=All+New+Zealand&whereId=3001&whereIsDirty=false&isAreaUnspecified=false&location=3001&area=&nation=3001&sortMode=ListedDate&searchFrom=quick&searchType='
r = Render(url)
html = r.frame.toHtml()
soup = BeautifulSoup(html, "lxml")
job_title = soup.find("a", {"class": "job-title"})
print(job_title)
Output:
<a class="job-title" data-bind="storeJobInformation: { currentPage: $root.pagination.currentPage, jobsCount: $root.jobs.jobs().length },
html: name,
attr: {
target: !$root.onsiteSearch() ? '_self' : '_blank',
href: SEEK.searchResultsPage.jobDetailsActionUrl + '/' + id + '?pos=' + position + '&type=' + adType() + '&engineConfig=' + $root.jobs.engineConfig() + '&userqueryid=' + $root.jobs.userQueryId() + '&tier=' + (locationMatch === 'Exact' ? 'tier1' : (locationMatch === 'Nearby' ? 'tier2' : (locationMatch === 'Area' ? 'tier3' : 'no_tier'))) + '&whereid=' + ($root.jobs.location().whereId || '')
},
click: $root.jobs.handleJoraAdClick" href="/job/32120592?pos=1&type=promoted&engineConfig=&userqueryid=123949496807341226&tier=no_tier&whereid=3001" target="_self">Trade Assistant</a>
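
The call to find() returns only the first match, while the question asks for all job titles. A minimal sketch of collecting every title with find_all() is shown here, assuming the rendered page contains one a.job-title link per listing. The sample markup, the second job name, and the helper `extract_job_titles` are hypothetical; in the answer's code the input would be the string returned by r.frame.toHtml().

```python
from bs4 import BeautifulSoup

# Hypothetical rendered markup standing in for r.frame.toHtml().
rendered_html = """
<article><a class="job-title" href="/job/1">Trade Assistant</a></article>
<article><a class="job-title" href="/job/2"> Site Manager </a></article>
<article><a class="other" href="/about">About</a></article>
"""


def extract_job_titles(html):
    """Return the visible text of every <a class="job-title"> link."""
    # "html.parser" is the built-in parser; "lxml" also works if installed.
    soup = BeautifulSoup(html, "html.parser")
    links = soup.find_all("a", class_="job-title")
    return [link.get_text(strip=True) for link in links]


titles = extract_job_titles(rendered_html)
print(titles)  # ['Trade Assistant', 'Site Manager']
```

Using get_text(strip=True) returns just the title text, without the long data-bind attribute that appears when the whole tag is printed.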