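I am trying to extract the total number of jobs from an HTML page with BeautifulSoup using the code below, but I can't get the text out of the title attribute. The error I get is shown at the bottom.
Code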
import requests
from bs4 import BeautifulSoup

page = requests.get(URL)  # URL: the job-search page being scraped
soup = BeautifulSoup(page.text, 'html.parser')
print(soup.prettify())
html = soup.prettify("utf-8")
findJobs = soup.findAll('a', {'class': 'jobtitle turnstileLink', 'title': True})
for findJob in findJobs:
    print(findJob['title'])
HTML code
<a class="jobtitle turnstileLink" data-tn-element="jobTitle" href="/pagead/clk?mo=r&ad=44-==&vjs=3&p=6&sk=&fvj=1" id="sja6"
onclick="setRefineByCookie([]); sjoc('sja6',1); convCtr('SJ')"
onmousedown="sjomd('sja6'); clk('sja6');" rel="noopener nofollow"
target="_blank" title="Student Mentor">Student Mentor</a>
Error message
TypeError: list indices must be integers, not str
Answer (score: 0)
findJobs = soup.findAll('a', {'class': 'jobtitle turnstileLink', 'title': True})
for findJob in findJobs:
    print(findJob['title'])
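Search for the a tag instead of span. soup.findAll returns a list, so treat it as one: loop over it as above, or index it with an integer such as findJobs[0]['title'], rather than indexing the list itself with a string like 'title'.
Update (2018-09-11):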
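A minimal sketch of the point above, using an inline HTML string instead of a live request (the second, hypothetical Barista link exists only to show more than one result). find_all is the modern name for findAll; it returns a ResultSet, which is a subclass of list, so indexing the whole result with 'title' raises the same kind of TypeError, while an integer index or a loop works. Since the question is about the total number of jobs, len() on the result gives the count of matching links.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two job links in the same format as the question.
html = """
<a class="jobtitle turnstileLink" title="Student Mentor">Student Mentor</a>
<a class="jobtitle turnstileLink" title="Barista">Barista</a>
"""

soup = BeautifulSoup(html, "html.parser")
jobs = soup.find_all("a", {"class": "jobtitle turnstileLink", "title": True})

# find_all returns a ResultSet (a list subclass), not a single tag.
print(type(jobs).__name__, isinstance(jobs, list))

try:
    jobs["title"]  # indexing the list with a string reproduces the error
except TypeError as exc:
    print("TypeError:", exc)

# Index with an integer to get one tag, or loop to visit every tag.
print(jobs[0]["title"])
titles = [job["title"] for job in jobs]
print(titles)

# The total number of matching job links is simply the list length.
print(len(jobs))
```

The try/except is only there to show the failure mode; in real code you would just iterate or use an integer index.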
I tested it with the following:
from bs4 import BeautifulSoup
text = """
<a class="jobtitle turnstileLink" data-tn-element="jobTitle" href="/pagead/clk?mo=r&ad=44-==&vjs=3&p=6&sk=&fvj=1" id="sja6"
onclick="setRefineByCookie([]); sjoc('sja6',1); convCtr('SJ')"
onmousedown="sjomd('sja6'); clk('sja6');" rel="noopener nofollow"
target="_blank" title="Student Mentor">Student Mentor</a>
"""
# page = requests.get(URL)
soup = BeautifulSoup(text, 'html.parser') #.text
print(soup.prettify())
html = soup.prettify("utf-8")
findJobs = soup.findAll('a', {'class': 'jobtitle turnstileLink', 'title' :True})
for findJob in findJobs:
    print(findJob['title'])
# The loop prints: Student Mentor
It works as expected. Did you miss something somewhere?
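One related pitfall worth checking on the live page (an assumption; the answer did not test this): passing 'jobtitle turnstileLink' as the class value matches only that exact class string, so a link whose classes appear in a different order, or with an extra class, is silently skipped. A CSS selector through soup.select matches each class independently, and Tag.get returns None instead of raising KeyError when an attribute is missing. The reordered class list and the untitled link below are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same classes in a different order plus an extra one,
# and a second link that has no title attribute.
html = """
<a class="turnstileLink jobtitle extra" title="Student Mentor">Student Mentor</a>
<a class="jobtitle turnstileLink">Untitled</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Exact-string class matching misses the reordered class list,
# and the second link is excluded because it lacks a title.
exact = soup.find_all("a", {"class": "jobtitle turnstileLink", "title": True})
print(len(exact))

# The selector requires both classes (in any order) and a title attribute.
selected = soup.select("a.jobtitle.turnstileLink[title]")
print([a["title"] for a in selected])

# Without [title], Tag.get returns None for links that have no title.
for a in soup.select("a.jobtitle.turnstileLink"):
    print(a.get("title"), a.get_text(strip=True))
```

soup.select relies on the soupsieve package, which recent versions of beautifulsoup4 install as a dependency.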