Question

The goal is to output a dictionary of course names and their grades, going from this:

<tr>
<td class="course"><a href="/courses/1292/grades/5610">Modern Europe &amp; the World - Dewey</a></td>
<td class="percent">
    92%
</td>
<td style="display: none;"><a href="#" title="Send a Message to the Teacher" class="no-hover"><img alt="Email" src="/images/email.png?1395938788" /></a></td>
</tr>

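to this:

{Modern Europe &amp; the World - Dewey: 92%, the next course name: grade...etc}

I know how to find the percent tag or just an href tag, but I'm not sure how to get the text and compile it into a dictionary so that it is more useful. Thanks!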

Answer 1

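Try this:
For each tr element, look for the child cells you need (the td elements with class course and class percent). If both are present, add an entry to the grades dictionary.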

>>> from bs4 import BeautifulSoup
>>> html = """
... <tr>
... <td class="course"><a href="/courses/1292/grades/5610">Modern Europe &amp; the World - Dewey</a></td>
... <td class="percent">
...     92%
... </td>
... <td style="display: none;"><a href="#" title="Send a Message to the Teacher" class="no-hover"><img alt="Email" src="/images/email.png?1395938788" /></a></td>
... </tr>
... """
>>> 
>>> soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid the bs4 warning
>>> grades  = {}
>>> for tr in soup.find_all('tr'):
...     td_course  = tr.find("td", {"class" : "course"})
...     td_percent = tr.find("td", {"class" : "percent"})
...     if td_course and td_percent:
...         grades[td_course.text.strip()] = td_percent.text.strip()
... 
>>> 
>>> grades
{u'Modern Europe & the World - Dewey': u'92%'}
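
If it helps, the same idea can be wrapped in a small function and written with CSS selectors. This is only a sketch under a few assumptions: `parse_grades` is a name I made up, the header row and the second course row in the sample are invented to show that rows without both cells are skipped, and it targets Python 3 (so the keys print without the `u` prefix).

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the header row and the "Algebra II" row are
# made up for illustration; only the first course row is from the question.
SAMPLE = """
<table>
<tr><th>Course</th><th>Grade</th></tr>
<tr>
<td class="course"><a href="/courses/1292/grades/5610">Modern Europe &amp; the World - Dewey</a></td>
<td class="percent">
    92%
</td>
</tr>
<tr>
<td class="course"><a href="#">Algebra II - Example</a></td>
<td class="percent"> 88% </td>
</tr>
</table>
"""


def parse_grades(html):
    """Map course name -> percent string for every row that has both cells."""
    soup = BeautifulSoup(html, "html.parser")
    grades = {}
    for tr in soup.find_all("tr"):
        course = tr.select_one("td.course")
        percent = tr.select_one("td.percent")
        # Skip header rows or any row missing one of the two cells.
        if course is None or percent is None:
            continue
        # get_text(strip=True) drops the surrounding whitespace and newlines.
        grades[course.get_text(strip=True)] = percent.get_text(strip=True)
    return grades


grades = parse_grades(SAMPLE)
print(grades)
```

Matching on the class names rather than on cell position means the hidden email cell, or any extra column added later, does not affect the result.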

Answer 2

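Since each tr contains a series of td elements holding the information you need, you can simply use find_all() to collect them into a list and then pull out the pieces you want: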

from bs4 import BeautifulSoup

soup = BeautifulSoup("""
<tr>
<td class="course"><a href="/courses/1292/grades/5610">Modern Europe &amp; the World - Dewey</a></td>
<td class="percent">
    92%
</td>
<td style="display: none;"><a href="#" title="Send a Message to the Teacher" class="no-hover"><img alt="Email" src="/images/email.png?1395938788" /></a></td>
</tr>
""", "html.parser")

grades = {}

for tr in soup.find_all("tr"):
    td_text = [td.text.strip() for td in tr.find_all("td")]
    grades[td_text[0]] = td_text[1]

Result:

>>> grades
{u'Modern Europe & the World - Dewey': u'92%'}
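
One caveat with the positional approach: a row with fewer than two td cells (a header row, for instance) would raise an IndexError. Below is a hedged sketch of a more defensive variant that also converts the percentage to a number, which may be closer to the "more useful" dictionary the question asks for. The name `grades_by_position`, the header row, and the "N/A" row are my own assumptions for illustration, not part of the original page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the header row and the "Study Hall" row are invented
# to exercise the guards; the first course row comes from the question.
HTML = """
<table>
<tr><th>Course</th><th>Grade</th></tr>
<tr>
<td class="course"><a href="/courses/1292/grades/5610">Modern Europe &amp; the World - Dewey</a></td>
<td class="percent">
    92%
</td>
<td style="display: none;"><a href="#" class="no-hover"><img alt="Email" src="/images/email.png?1395938788" /></a></td>
</tr>
<tr>
<td class="course"><a href="#">Study Hall - Example</a></td>
<td class="percent">N/A</td>
</tr>
</table>
"""


def grades_by_position(html):
    """Map course name -> float percent, using the first two td cells of each row."""
    soup = BeautifulSoup(html, "html.parser")
    grades = {}
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) < 2:
            continue  # header rows contain th cells only, so no td is found
        name, percent = cells[0], cells[1]
        try:
            grades[name] = float(percent.rstrip("%"))
        except ValueError:
            continue  # non-numeric grade such as "N/A": skipped in this sketch
    return grades


numeric_grades = grades_by_position(HTML)
print(numeric_grades)
```

Whether to skip non-numeric grades or store them as None is a design choice; skipping just keeps the example short.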

Using Beautiful Soup to extract text from multiple tags
