Question

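I am trying to extract job titles from HTML with BeautifulSoup using the code below, but I cannot get the text out of the string. The error is shown at the bottom: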

Code

page = requests.get(URL)
soup = BeautifulSoup(page.text, 'html.parser')  #.text
print(soup.prettify())

html = soup.prettify("utf-8")

findJobs = soup.findAll('a', {'class': 'jobtitle turnstileLink', 'title' :True})
for findJob in findJobs:
    print (findJob['title'])

HTML

<a class="jobtitle turnstileLink" data-tn-element="jobTitle" href="/pagead/clk? 
mo=r&amp;ad=44-==&amp;vjs=3&amp;p=6&amp;sk=&amp;fvj=1" id="sja6" 
onclick="setRefineByCookie([]); sjoc('sja6',1); convCtr('SJ')" 
onmousedown="sjomd('sja6'); clk('sja6');" rel="noopener nofollow" 
target="_blank" title="Student Mentor">Student Mentor</a>

Error message

TypeError: list indices must be integers, not str

Answer 1

findJobs = soup.findAll('a', {'class': 'jobtitle turnstileLink', 'title' :True})
for findJob in findJobs:   
    print (findJob['title'])

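You need to look for the a tag rather than span.
soup.findAll returns a list, so you have to use it as a list: loop over it or index it with an integer. Indexing the list itself with a string such as 'title' raises the TypeError above; only the individual elements accept findJob['title'].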
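
To make the list-versus-element point concrete, here is a minimal sketch. It assumes a shortened version of the anchor from the question (the href is replaced with "#" for brevity), and the snake_case variable names are mine, not the asker's.

```python
from bs4 import BeautifulSoup

# Shortened copy of the anchor from the question (href replaced for brevity).
html = '<a class="jobtitle turnstileLink" title="Student Mentor" href="#">Student Mentor</a>'
soup = BeautifulSoup(html, "html.parser")

# find_all (findAll is the older alias) returns a ResultSet, which is a list subclass.
find_jobs = soup.find_all("a", {"class": "jobtitle turnstileLink", "title": True})

# Indexing the list itself with a string raises TypeError.
try:
    find_jobs["title"]
    error = None
except TypeError as exc:
    error = exc

# Indexing each Tag inside the list with a string returns the attribute value.
titles = [job["title"] for job in find_jobs]
print(titles)  # ['Student Mentor']
```

If your real code indexes findJobs directly anywhere (outside the loop, for instance), that would explain the error message.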

Update 2018-09-11:

I tested it with the following:

from bs4 import BeautifulSoup

text = """
<a class="jobtitle turnstileLink" data-tn-element="jobTitle" href="/pagead/clk? 
mo=r&amp;ad=44-==&amp;vjs=3&amp;p=6&amp;sk=&amp;fvj=1" id="sja6" 
onclick="setRefineByCookie([]); sjoc('sja6',1); convCtr('SJ')" 
onmousedown="sjomd('sja6'); clk('sja6');" rel="noopener nofollow" 
target="_blank" title="Student Mentor">Student Mentor</a>
"""

# page = requests.get(URL)
soup = BeautifulSoup(text, 'html.parser')  #.text
print(soup.prettify())

html = soup.prettify("utf-8")

findJobs = soup.findAll('a', {'class': 'jobtitle turnstileLink', 'title' :True})
for findJob in findJobs:   
     print (findJob['title']) 
# Output: Student Mentor

It runs correctly. Is something missing from what you posted?
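
I can't tell from the question whether the live page matches the snippet exactly. If it doesn't (classes in a different order, or no title attribute on some links), a more tolerant lookup may help. The following is only a sketch under that assumption: the second anchor is a hypothetical example I made up, and select with a CSS selector is an alternative to findAll that I am swapping in, not what the original code uses.

```python
from bs4 import BeautifulSoup

# First anchor mirrors the question; the second is a hypothetical variant
# with the classes reordered and no title attribute.
html = """
<a class="jobtitle turnstileLink" title="Student Mentor" href="#">Student Mentor</a>
<a class="turnstileLink jobtitle" href="#"> Lab Assistant </a>
"""
soup = BeautifulSoup(html, "html.parser")

jobs = []
# The CSS selector matches both classes regardless of their order.
for link in soup.select("a.jobtitle.turnstileLink"):
    # Prefer the title attribute; fall back to the link text if it is absent.
    jobs.append(link.get("title") or link.get_text(strip=True))

print(jobs)  # ['Student Mentor', 'Lab Assistant']
```

Tag.get returns None instead of raising when the attribute is missing, which is why the fallback to get_text works here.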
