The goal is to turn HTML like this into a dictionary of course names and their grades:
<tr>
<td class="course"><a href="/courses/1292/grades/5610">Modern Europe & the World - Dewey</a></td>
<td class="percent">
92%
</td>
<td style="display: none;"><a href="#" title="Send a Message to the Teacher" class="no-hover"><img alt="Email" src="/images/email.png?1395938788" /></a></td>
</tr>
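into this:
{Modern Europe & the World - Dewey: 92%, the next course name: grade, etc.}
I know how to find the percent tag, or just an href tag, but I'm not sure how to grab the text and compile it into a dictionary so that it's more useful. Thanks!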
Answer 0 (score: 1)
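Try this:
For each tr element, look for the child cells you need (the td elements with class course and class percent). If both exist, add an entry to the grades dictionary.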
>>> from bs4 import BeautifulSoup
>>> html = """
... <tr>
... <td class="course"><a href="/courses/1292/grades/5610">Modern Europe & the World - Dewey</a></td>
... <td class="percent">
... 92%
... </td>
... <td style="display: none;"><a href="#" title="Send a Message to the Teacher" class="no-hover"><img alt="Email" src="/images/email.png?1395938788" /></a></td>
... </tr>
... """
>>>
>>> soup = BeautifulSoup(html, "html.parser")
>>> grades = {}
>>> for tr in soup.find_all('tr'):
...     td_course = tr.find("td", {"class": "course"})
...     td_percent = tr.find("td", {"class": "percent"})
...     if td_course and td_percent:
...         grades[td_course.text.strip()] = td_percent.text.strip()
...
>>>
>>> grades
{'Modern Europe & the World - Dewey': '92%'}
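The same per-row check can also be written with CSS selectors. This is a minimal sketch, assuming BeautifulSoup 4 with the built-in html.parser. grades_from_html is a hypothetical helper name, and the header row and the "Example Course" row are made-up sample data added only to show a table with several rows. Rows that lack either cell, such as a header row of th cells, are skipped just as in the loop above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: the first data row comes from the question;
# the header row and the "Example Course" row are made-up illustration data.
html = """
<table>
<tr><th>Course</th><th>Grade</th></tr>
<tr>
<td class="course"><a href="/courses/1292/grades/5610">Modern Europe & the World - Dewey</a></td>
<td class="percent">
92%
</td>
</tr>
<tr>
<td class="course"><a href="#">Example Course</a></td>
<td class="percent">88%</td>
</tr>
</table>
"""


def grades_from_html(markup):
    """Map each course name to its percent text, skipping rows without both cells."""
    soup = BeautifulSoup(markup, "html.parser")
    grades = {}
    for tr in soup.find_all("tr"):
        # CSS selectors: the first <td> whose class list includes "course" / "percent"
        td_course = tr.select_one("td.course")
        td_percent = tr.select_one("td.percent")
        if td_course and td_percent:
            # get_text(strip=True) drops the newlines around "92%"
            grades[td_course.get_text(strip=True)] = td_percent.get_text(strip=True)
    return grades


print(grades_from_html(html))
```

Because the header row contains th cells rather than td cells, select_one returns None for it and the row is ignored.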
Answer 1 (score: 1)
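Since each tr contains a series of td elements that hold the information you need, you can simply collect them into a list with find_all() and then pull out what you want by position: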
from bs4 import BeautifulSoup
soup = BeautifulSoup("""
<tr>
<td class="course"><a href="/courses/1292/grades/5610">Modern Europe & the World - Dewey</a></td>
<td class="percent">
92%
</td>
<td style="display: none;"><a href="#" title="Send a Message to the Teacher" class="no-hover"><img alt="Email" src="/images/email.png?1395938788" /></a></td>
</tr>
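""", "html.parser")
grades = {}
for tr in soup.find_all("tr"):
    td_text = [td.text.strip() for td in tr.find_all("td")]
    # Skip rows (e.g. a header row of <th> cells) that lack at least two <td> cells
    if len(td_text) >= 2:
        grades[td_text[0]] = td_text[1]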
Result:
>>> grades
{'Modern Europe & the World - Dewey': '92%'}
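If you want to do arithmetic on the grades (averages, sorting), the percent strings can be converted to numbers once the dictionary is built. This is a minimal sketch, assuming every percent cell holds a plain number followed by a % sign. parse_percent is a hypothetical helper, and a cell such as "N/A" would make float() raise ValueError.

```python
def parse_percent(text):
    """Turn a string such as '92%' into the float 92.0 (hypothetical helper)."""
    return float(text.strip().rstrip("%"))


# The dictionary produced by either answer above
grades = {"Modern Europe & the World - Dewey": "92%"}

numeric_grades = {course: parse_percent(pct) for course, pct in grades.items()}
print(numeric_grades)  # {'Modern Europe & the World - Dewey': 92.0}
```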