Question

soup = BeautifulSoup(html).findAll('div', 'thread')
for i in soup:
    print i

I am only including this part of the code, because that is where I am stuck.

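findAll returns a list (bound here to the name soup). I tried to use ''.join() to get a single string, but it did not work because join expects strings, not Tag objects. I assume that is what caused the error.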
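To illustrate the join problem described above, here is a minimal sketch. It assumes Python 3 and the bs4 package (BeautifulSoup 4, where find_all is the newer spelling of findAll) rather than the older setup used in the question, and the html string is made up for the example. Because str.join accepts only strings, each Tag is converted with str() first.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original question.
html = (
    '<div class="thread"><a href="/t/1">One</a></div>'
    '<div class="thread"><a href="/t/2">Two</a></div>'
)

tags = BeautifulSoup(html, "html.parser").find_all("div", "thread")

# str.join() accepts only strings, so convert each Tag with str() first.
joined = "\n".join(str(tag) for tag in tags)
print(joined)
```

If only the visible text is wanted rather than the markup, tag.get_text() can be used in place of str(tag).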

When I iterate over it, every item in the list is printed to the screen, without the commas.

But what I actually want is the href value of the links inside each div class="thread".

I have tried many things, such as:

soup = BeautifulSoup(html).findAll('div', 'thread')
for i in soup:
    print BeautifulSoup(i)('a')['href']

That last snippet gives me: 'NoneType' object is not callable.

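I have been trying lots of combinations, but I am really stuck and cannot get anything to work. After so many failed attempts I do not know what else to do. It is frustrating.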

Answer 1

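It should be something like this:

divs = BeautifulSoup(html).findAll('div', 'thread')
for div in divs:
    # Each div is already a Tag, so there is no need to wrap it in BeautifulSoup() again.
    # Index the <a> tag directly; in BeautifulSoup 3, dict(a.attrs)['href'] should also work.
    print div.find('a')['href']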
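For readers on a current setup, the same idea can be sketched as follows. This assumes Python 3 and the bs4 package (BeautifulSoup 4), not the BeautifulSoup 3 / Python 2 combination from the question; the sample html and the variable names are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original question.
html = """
<div class="thread"><a href="/thread/1">First</a></div>
<div class="thread"><span>no link here</span></div>
<div class="other"><a href="/ignored">Skip</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

hrefs = []
for div in soup.find_all("div", "thread"):
    link = div.find("a")  # find() returns None when the div has no <a>
    if link is not None and link.has_attr("href"):
        hrefs.append(link["href"])

print(hrefs)
```

Checking for None before indexing avoids a NoneType error when a matching div contains no link at all.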

Answer 2

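Have a look at the documentation for this module/class (http://www.crummy.com/software/BeautifulSoup/documentation.html). The second argument of findAll is normally a dictionary of attributes (a Python dict), not a string; a bare string is also accepted and is treated as a CSS class name. Have you tried this?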

BeautifulSoup(html).findAll('div', { 'class': 'thread' })
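As a quick check of the dictionary form, the sketch below compares it with the string form and with the class_ keyword. It assumes Python 3 and the bs4 package (BeautifulSoup 4); the sample html is hypothetical. As far as I know, all three calls select the same elements in bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original question.
html = (
    '<div class="thread"><a href="/a">A</a></div>'
    '<div class="post"><a href="/b">B</a></div>'
)
soup = BeautifulSoup(html, "html.parser")

by_dict = soup.find_all("div", {"class": "thread"})   # attribute dictionary
by_string = soup.find_all("div", "thread")            # string is treated as a CSS class
by_keyword = soup.find_all("div", class_="thread")    # bs4 keyword argument

same = by_dict == by_string == by_keyword
print(len(by_dict), same)
```

The dictionary form is the most general of the three, since it can filter on attributes other than class as well.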

Iterating in Python and BeautifulSoup
