Extracting an HTML keyword with BeautifulSoup (Indeed)

Date: 2018-09-09 14:02:20

Tags: html python-3.x beautifulsoup

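I am trying to use BeautifulSoup to extract job titles from HTML with the code below, but I cannot get the text out of the matched elements and I get the error shown at the bottom: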

Code

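import requests
from bs4 import BeautifulSoup

page = requests.get(URL)  # URL is defined elsewhere; it is not shown in the question
soup = BeautifulSoup(page.text, 'html.parser')  #.text
print(soup.prettify())

html = soup.prettify("utf-8")

# Match <a> tags with this exact class string and any title attribute
findJobs = soup.findAll('a', {'class': 'jobtitle turnstileLink', 'title': True})
for findJob in findJobs:
    print(findJob['title'])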

HTML

<a class="jobtitle turnstileLink" data-tn-element="jobTitle" href="/pagead/clk? 
mo=r&amp;ad=44-==&amp;vjs=3&amp;p=6&amp;sk=&amp;fvj=1" id="sja6" 
onclick="setRefineByCookie([]); sjoc('sja6',1); convCtr('SJ')" 
onmousedown="sjomd('sja6'); clk('sja6');" rel="noopener nofollow" 
target="_blank" title="Student Mentor">Student Mentor</a>

Error message

TypeError: list indices must be integers, not str

1 answer:

Answer 0 (score: 0)

findJobs = soup.findAll('a', {'class': 'jobtitle turnstileLink', 'title' :True})
for findJob in findJobs:   
    print (findJob['title'])
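  • You need to search for a, not span.
  • soup.findAll returns a list of tags. Iterate over it or index it with an integer; indexing the list itself with a string such as 'title' raises the TypeError above.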
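
The two points above can be illustrated with a minimal sketch. The markup here is a hypothetical sample modeled on the anchor tag from the question (the second job entry and the `href="#"` values are made up for illustration), and the names `SAMPLE_HTML`, `jobs`, `titles`, and `error_name` are not from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modeled on the <a> tag in the question.
SAMPLE_HTML = """
<a class="jobtitle turnstileLink" title="Student Mentor" href="#">Student Mentor</a>
<a class="jobtitle turnstileLink" title="Lab Assistant" href="#">Lab Assistant</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all (findAll is the older alias) returns a list-like ResultSet of Tag objects.
jobs = soup.find_all("a", {"class": "jobtitle turnstileLink", "title": True})

# Indexing the list itself with a string is what raises the TypeError.
try:
    jobs["title"]
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Indexing each Tag with a string reads that tag's attribute.
titles = [job["title"] for job in jobs]

print(error_name)
print(titles)
```

If the error in the question came from indexing the result of findAll directly, moving the `['title']` lookup onto each element inside the loop, as in the answer's code, should avoid it.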

Update 2018-09-11:

I tested it with the following:

from bs4 import BeautifulSoup

text = """
<a class="jobtitle turnstileLink" data-tn-element="jobTitle" href="/pagead/clk? 
mo=r&amp;ad=44-==&amp;vjs=3&amp;p=6&amp;sk=&amp;fvj=1" id="sja6" 
onclick="setRefineByCookie([]); sjoc('sja6',1); convCtr('SJ')" 
onmousedown="sjomd('sja6'); clk('sja6');" rel="noopener nofollow" 
target="_blank" title="Student Mentor">Student Mentor</a>
"""

# page = requests.get(URL)
soup = BeautifulSoup(text, 'html.parser')  #.text
print(soup.prettify())

html = soup.prettify("utf-8")

findJobs = soup.findAll('a', {'class': 'jobtitle turnstileLink', 'title' :True})
for findJob in findJobs:   
     print (findJob['title']) 
# Output: Student Mentor

It runs correctly. Is there something I missed?