Question

I am using BeautifulSoup to get book titles from a goodreads page.

Sample HTML:

<td class="field title"><a href="/book/show/12996.Othello" title="Othello">
  Othello
</a></td>

I want to get the text between the anchor tags. With the code below, I can get the <a> children of each td with class="field title" as a list.

for txt in soup.findAll('td',{'class':"field title"}):
    child = txt.findAll('a')

This gives the output:

[<a href="/book/show/12996.Othello" title="Othello">
  Othello
</a>]
...

How do I get just the 'Othello' part? This regular expression does not work:

for ch in child:
    match = re.search(r"([.]*)title=\"<name>\"([.]*)",str(ch))
    print(match.group('name'))

Answer 1

Just print the text of txt (thanks to @angurar for clarifying the OP's requirement):

for txt in soup.findAll('td',{'class':"field title"}):
    # .string still includes the whitespace around the title
    print(txt.string)

Or, if you are after the title attribute of the <a>:

for txt in soup.findAll('td',{'class':"field title"}):
    print([a.get('title') for a in txt.findAll('a')])

For each matching td, this prints a list of the title attributes of all its <a> tags.
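For reference, here is a minimal self-contained sketch that combines both approaches on the sample HTML from the question. It assumes the bs4 package (BeautifulSoup 4) and Python 3. The table/tr wrapper and the names html, cell and titles are only there for illustration, and get_text(strip=True) is used in place of .string so that the surrounding whitespace is dropped.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, wrapped in a table for illustration only.
html = """
<table><tr>
<td class="field title"><a href="/book/show/12996.Othello" title="Othello">
  Othello
</a></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

titles = []
for cell in soup.find_all("td", {"class": "field title"}):
    for a in cell.find_all("a"):
        # Pair the stripped link text with the value of the title attribute.
        titles.append((a.get_text(strip=True), a.get("title")))

print(titles)  # [('Othello', 'Othello')]
```

If only the visible text is needed, the first element of each pair is enough; a.get("title") returns None when a link has no title attribute.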