Question

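I'm doing web scraping with Python 3.5, using requests and BeautifulSoup4. I'm trying to get the links to all the topics on the first page of a forum and add them to a list.

I have two problems:

1) I'm not sure how to get the links with BeautifulSoup. I can't get to the link itself, only the div.
2) BeautifulSoup seems to return only a few of the topics, not all of them.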

import requests
from bs4 import BeautifulSoup

def getTopics():
    topics = []
    url = 'http://forum.jogos.uol.com.br/pc_f_40'
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')

    for link in soup.select('[class="topicos"]'):
        a = link.find_all('a href')
        print(a)

getTopics()

Answer 1

First of all, your loop does in fact iterate over all 38 topics shown on the page.
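To check that count yourself, compare the number of matched containers with the number of topics you see in the browser. A minimal sketch, with a made-up snippet standing in for the forum page (the real markup is not shown here, so its structure is an assumption):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: it only mimics a list of topic containers.
SAMPLE_HTML = """
<div class="topicos"><a href="/topico-1">Topic 1</a></div>
<div class="topicos"><a href="/topico-2">Topic 2</a></div>
<div class="topicos"><a href="/topico-3">Topic 3</a></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# Same selector as in the question: elements whose class attribute is "topicos".
containers = soup.select('[class="topicos"]')
print(len(containers))  # 3
```

On the live page, pass requests.get(url).text instead of SAMPLE_HTML.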

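The actual problem is in how you extract the link from each topic: link.find_all('a href') finds nothing, because find_all() looks for a tag literally named "a href", and there is no such element on the page. Replace it with link.select('a[href]'), which finds all a elements that have an href attribute.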
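The difference between the two calls can be seen on a tiny hypothetical topic container (the markup is an assumption, not the forum's real HTML):

```python
from bs4 import BeautifulSoup

# Hypothetical container: one real link and one <a> without an href.
html = '<div class="topicos"><a href="/topico-1">Topic 1</a><a name="anchor">no link</a></div>'
div = BeautifulSoup(html, "html.parser").div

# find_all() treats its first argument as a tag name; no tag is named "a href".
wrong = div.find_all("a href")
# select() takes a CSS selector: <a> elements that have an href attribute.
right = div.select("a[href]")

print(wrong)  # []
# The link itself is read from the tag like a dictionary key.
print([a["href"] for a in right])  # ['/topico-1']
```

Note that a["href"] raises KeyError on an a tag without an href; selecting with a[href] avoids that case.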

In fact, you could even solve it with a single list comprehension.
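A minimal sketch of such a list comprehension, assuming the topic links sit inside the class="topicos" containers that the question's selector targets. The helper name extract_topic_links and the sample markup are hypothetical:

```python
from bs4 import BeautifulSoup

def extract_topic_links(html):
    """Hypothetical helper: return the href of every <a> inside a topic container."""
    soup = BeautifulSoup(html, "html.parser")
    # The descendant selector replaces the nested loop: <a href> elements
    # anywhere under an element whose class attribute is "topicos".
    return [a["href"] for a in soup.select('[class="topicos"] a[href]')]

# Made-up markup standing in for the forum page.
sample = """
<div class="topicos"><a href="/topico-1">Topic 1</a></div>
<div class="topicos"><a href="/topico-2">Topic 2</a></div>
<p><a href="/outside">not a topic</a></p>
"""
print(extract_topic_links(sample))  # ['/topico-1', '/topico-2']
```

With the question's setup this would be called as extract_topic_links(requests.get(url).text), and the result is already the list you wanted. If the forum uses relative links, urllib.parse.urljoin(url, href) turns each one into an absolute URL.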

