I'm new to Python and web scraping.
I'm trying to use Python and BeautifulSoup
to get information from a quiz site.
I can scrape a question and an answer separately:
from bs4 import BeautifulSoup
import requests
url = 'https://rachacuca.com.br/quiz/18992/bleach-sagas/'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
perguntas = soup.find('ol').li.p.text
#print(perguntas)
respostas = soup.find('ol').find('div', class_='alternativa-texto').p.text
#print(respostas)
todos_elementos = soup.find_all('ol')
#print(todos_elementos)
for elemento in todos_elementos:
    perguntas = elemento.find('ol').find('li').p.text
    respostas = elemento.find('div', class_='alternativa-texto').p.text
    print(f'perguntas: {perguntas}')
    print(f'respostas: {respostas}')
    print('-'*70)
But when I loop over all the elements to print them, I get this error:
AttributeError Traceback (most recent call last)
<ipython-input-45-a6fe913d7109> in <module>()
1 for elemento in todos_elementos:
----> 2 perguntas = elemento.find('ol').find('li').p.text
3 respostas = elemento.find('div', class_='alternativa-texto').p.text
4
5 print(f'perguntas: {perguntas}')
AttributeError: 'NoneType' object has no attribute 'find'
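Answer 0 (score: 0)
I think iterating over all the answers fails because radio buttons and input boxes are mixed together. Make sure you handle the exception when an answer has a different structure. Try the following: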
from bs4 import BeautifulSoup
import requests
url = 'https://rachacuca.com.br/quiz/18992/bleach-sagas/'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
# Find the top-level list items (one per question)
elems = soup.find('ol').find_all('li', recursive=False)
questions = []
for elem in elems:
    # Question
    question = {'text': elem.find('p').text}
    print(question['text'])
    # Answer
    try:
        question['answer'] = elem.find('div', class_='alternativa-texto').p.text
        print(question['answer'])
    except AttributeError:
        # find() returned None: this item has no 'alternativa-texto' paragraph
        print("Error: element has no paragraph. Maybe an input box?")
    questions.append(question)
print(questions)
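
A note on the traceback in the question: the message says `find` was called on `None`, and the first call on that line is `elemento.find('ol')`. Each `elemento` is already an `ol`, so that lookup returns `None` unless another `ol` is nested inside it. Below is a minimal offline sketch of the same idea using explicit `None` checks instead of `try`/`except`. The `SAMPLE_HTML` markup and the `extract_questions` helper are hypothetical. They only imitate the structure the code above assumes and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one item with a text alternative, one with an input box.
SAMPLE_HTML = """
<ol>
  <li>
    <p>Question one?</p>
    <div class="alternativa-texto"><p>Answer A</p></div>
  </li>
  <li>
    <p>Question two?</p>
    <input type="text">
  </li>
</ol>
"""


def extract_questions(html):
    """Return one dict per top-level <li>; missing parts are stored as None."""
    soup = BeautifulSoup(html, 'html.parser')
    outer = soup.find('ol')
    if outer is None:
        return []
    questions = []
    # recursive=False keeps only the direct <li> children of the list
    for item in outer.find_all('li', recursive=False):
        paragraph = item.find('p')
        answer_div = item.find('div', class_='alternativa-texto')
        answer_p = answer_div.find('p') if answer_div is not None else None
        questions.append({
            'text': paragraph.get_text(strip=True) if paragraph is not None else None,
            'answer': answer_p.get_text(strip=True) if answer_p is not None else None,
        })
    return questions


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
outer = soup.find('ol')
# No <ol> is nested inside this <ol>, so the lookup yields None,
# which is what makes a chained .find('li') raise AttributeError.
nested = outer.find('ol')
print(nested)

questions = extract_questions(SAMPLE_HTML)
print(questions)
```

To try it on the live page, pass `r.content` to `extract_questions` in place of `SAMPLE_HTML`. Whether the real markup matches this shape is an assumption to verify.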