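I'm learning web scraping with BeautifulSoup and practicing on Stack Overflow's "interesting" questions tab (https://stackoverflow.com/?tab=interesting).
I want to extract the hyperlinks of the first 5 questions that are tagged 'java' and have at least one answer (an accepted answer is fine, but not required).
I've read the BeautifulSoup documentation, but I can't put the pieces together.
Thanks for your help!
Code: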
from bs4 import BeautifulSoup
from urllib.request import urlopen
html = urlopen("https://stackoverflow.com/?tab=interesting")
content = html.read()
# name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(content, "html.parser")
soup.find_all('a', {'class': 'question-hyperlink'}, href=True, limit=5)  # question link
soup.find_all('div', {'class': 'status answered'}, limit=5)  # question answer
soup.find_all('a', {'class': 'post-tag'}, rel='tag', string='java', limit=5)  # question user tag
Desired output (as hyperlinks):
https://stackoverflow.com/questions/number/first-question-to-meet-the-criteria
https://stackoverflow.com/questions/number/second-question-to-meet-the-criteria
https://stackoverflow.com/questions/number/third-question-to-meet-the-criteria
https://stackoverflow.com/questions/number/fourth-question-to-meet-the-criteria
https://stackoverflow.com/questions/number/fifth-question-to-meet-the-criteria
Answer 0 (score: 0)
Try this:
from bs4 import BeautifulSoup
import requests
html = requests.get("https://stackoverflow.com/?tab=interesting")
soup = BeautifulSoup(html.content, "html.parser")
# find and iterate over all parent divs of questions
for elem in soup.find_all('div', {'class': 'question-summary narrow'}):
    # get count of answers (assumes the first "mini-counts" div holds it)
    answer = elem.find("div", {"class": "mini-counts"})
    if answer is not None and answer.text.strip() != "0":
        # check if question is tagged with "java"
        tags = elem.find("div", {"class": "t-java"})
        if tags is not None:
            # print link
            print(elem.find("a")["href"])
If nothing is printed, try changing the tag class to t-python.
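To cover every criterion in the question at once (tagged java, at least one answer, stop after 5, absolute URLs), something along these lines might work. This is only a sketch under assumptions: the class names are the ones quoted in this thread rather than verified against the live page, it assumes an unanswered question has the class unanswered on its status div, and BASE_URL, SAMPLE_HTML and tagged_answered_links are hypothetical names made up for the example. It parses a small inline HTML string so it can be run without hitting the site.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://stackoverflow.com"  # used to turn relative hrefs into full links


def tagged_answered_links(html, tag="java", limit=5):
    """Return up to `limit` absolute links to answered questions carrying `tag`."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # class_ matches a single class name, so "question-summary narrow" also matches
    for summary in soup.find_all("div", class_="question-summary"):
        # assumption: the status div is marked "unanswered" when there are no answers
        status = summary.find("div", class_="status")
        if status is None or "unanswered" in status.get("class", []):
            continue
        # collect the text of every tag link inside this question block
        tags = {a.get_text(strip=True) for a in summary.find_all("a", class_="post-tag")}
        if tag not in tags:
            continue
        link = summary.find("a", class_="question-hyperlink", href=True)
        if link is None:
            continue
        links.append(urljoin(BASE_URL, link["href"]))
        if len(links) == limit:
            break
    return links


# Made-up markup that only mirrors the class names mentioned in this thread.
SAMPLE_HTML = """
<div class="question-summary narrow">
  <div class="status answered"><div class="mini-counts">2</div></div>
  <a class="question-hyperlink" href="/questions/1/first">First</a>
  <div class="tags t-java"><a class="post-tag" rel="tag">java</a></div>
</div>
<div class="question-summary narrow">
  <div class="status unanswered"><div class="mini-counts">0</div></div>
  <a class="question-hyperlink" href="/questions/2/second">Second</a>
  <div class="tags t-java"><a class="post-tag" rel="tag">java</a></div>
</div>
<div class="question-summary narrow">
  <div class="status answered-accepted"><div class="mini-counts">1</div></div>
  <a class="question-hyperlink" href="/questions/3/third">Third</a>
  <div class="tags t-python"><a class="post-tag" rel="tag">python</a></div>
</div>
"""

if __name__ == "__main__":
    for url in tagged_answered_links(SAMPLE_HTML):
        print(url)
```

To try it on the real page, pass requests.get("https://stackoverflow.com/?tab=interesting").content instead of SAMPLE_HTML. Filtering inside each question block, rather than running three separate find_all calls over the whole page, is what keeps the link, the answer status and the tag tied to the same question.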