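I am scraping MCQs from this site, and I want the correct option to be printed last. All options share the same class='radio-button-click-target', but the correct option has an extra class at the end: 'radio-button-click-target correctquestions'. I tried the custom function from the solution to "BeautifulSoup webscraping find_all( ): finding exact match", but now no options show up at all.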
import requests
from bs4 import BeautifulSoup
address = 'https://www.ilmkidunya.com/online-test/5th-class-science-english-meduim-mcqs-with-answers?startfrom=0&last=92'
response = requests.get(address)
soup = BeautifulSoup(response.text, 'lxml')
ques_id = soup.find_all('div', class_='q-title')
ques_det = soup.find_all('div', class_='q-desc')
optn_det = soup.find_all('div', class_='choose-answer-block')
for i in range(0, len(ques_id)):
    print((ques_id[i].text))
    print(str(ques_det[i].text).strip())
    options = optn_det[i].find_all(lambda tag: tag.name == 'div' and tag.get('class') == ['radio-button-click-target correctquestions'])
    for opn in options:
        print(str(opn.text).strip())
    print('<----->')
Current output
Question # 1
The group which belong to invertebrates is.
amphibians
Worms
Reptiles
Mammals
<----->
Question # 2
The main cause of cholera is:
land polllution
noise pollution
air pollution
water pollution
<----->
Expected output
Question # 1
The group which belong to invertebrates is.
amphibians
Reptiles
Mammals
Worms
<----->
Question # 2
The main cause of cholera is:
land polllution
noise pollution
air pollution
water pollution
<----->
The correct option should be displayed last.
Answer 0 (score: 6)
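Everything you need is in the HTML, so you can rebuild a trivia database by collecting all the questions, the suggested answers, and the correct answers.
Here's how: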
import random
import time
import requests
from bs4 import BeautifulSoup
address = 'https://www.ilmkidunya.com/online-test/5th-class-science-english-meduim-mcqs-with-answers?startfrom=0&last=92'
soup = BeautifulSoup(requests.get(address).text, 'lxml')
# Text of every question on the page.
question = [
    q.getText(strip=True) for q
    in soup.select("div.single-question-answer-block div.q-desc")
]
# Text of every option label, in page order.
radio_buttons = [
    o.getText(strip=True) for o in
    soup.select("div.fancy-radio-box .radio-button:disabled + .radio-button-click-target")
]
# Labels that carry the extra "correctquestions" class are the correct answers.
# t.get("class", []) avoids a KeyError on labels that have no class attribute.
correct_answers = [
    a.getText(strip=True) for a in
    soup.find_all(lambda t: t.name == "label" and "correctquestions" in t.get("class", []))
]
# Split the flat option list into groups of four (assumes four options per question).
options = [radio_buttons[i:i + 4] for i in range(0, len(radio_buttons), 4)]
trivia_base = list(zip(question, correct_answers, options))
# Pick one random question, show its options, wait, then reveal the answer.
question, correct_answer, answers = random.choice(trivia_base)
time_to_answer_in_seconds = 15
print(question.title())
print("\n".join(f"-> {a.title()}" for a in answers))
print("-" * len(question))
time.sleep(time_to_answer_in_seconds)
print(f"Correct answer is: {correct_answer}.")
Sample output:
Which Of The Following Objects Emits Light?
-> Earth
-> Sun
-> Moon
-> Pluto
-------------------------------------------
Correct answer is: Sun.
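As a side note on why the lambda above tests membership rather than equality: BeautifulSoup parses the class attribute into a list of separate tokens, so comparing it against a single space-joined string never matches. The sketch below illustrates this on a small inline snippet. The markup is hypothetical, modeled only on the class names mentioned in the question, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption): only the class names come from the question.
html = """
<div class="choose-answer-block">
  <label class="radio-button-click-target">amphibians</label>
  <label class="radio-button-click-target correctquestions">Worms</label>
  <label class="radio-button-click-target">Reptiles</label>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# The class attribute is stored as a list with one entry per class name.
classes = soup.find("label", class_="correctquestions")["class"]

# Comparing against one space-joined string finds nothing.
exact_string = soup.find_all(
    lambda t: t.get("class") == ["radio-button-click-target correctquestions"]
)
# Comparing against the token list works, but depends on the class order.
exact_tokens = soup.find_all(
    lambda t: t.get("class") == ["radio-button-click-target", "correctquestions"]
)
# A membership test is order-independent and tolerates tags without a class.
membership = soup.find_all(lambda t: "correctquestions" in t.get("class", []))

print(classes)
print(len(exact_string), len(exact_tokens), len(membership))
```

The membership form is usually the safer choice, since it keeps working if the site adds or reorders class names.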
Edit:
If you want to print all the questions at once, use:
trivia_base = list(zip(question, correct_answers, options))
horizontal_line = max(len(q) for q in question)
for number, trivia in enumerate(trivia_base, start=1):
    question, correct_answer, answers = trivia
    print(f"{number}. {question.title()}")
    print("\n".join(f"-> {a.title()}" for a in answers))
    print(f"Correct answer is: {correct_answer}.")
    print("-" * horizontal_line)
Output:
1. Which Of The Following Objects Emits Light?
-> Earth
-> Sun
-> Moon
-> Pluto
Correct answer is: Sun.
----------------------------------------------------------------------------------------------------
2. The Group Which Belongs To Invertebrates Is:
-> Amphibians
-> Insects
-> Reptiles
-> Birds
Correct answer is: insects.
----------------------------------------------------------------------------------------------------
3. Number Of Petals In A Flower Of Dicot Plant May Be:
-> 3
-> 4
-> 6
-> 7
Correct answer is: 4.
----------------------------------------------------------------------------------------------------
and more...
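If the goal is the layout from the question, with the correct option printed last within each block, one possible approach is a stable sort keyed on whether a tag carries the correctquestions class. This is a minimal sketch, not the method used above: the helper name options_correct_last and the inline markup are assumptions made for illustration, and only the class names are taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption): only the class names come from the question.
html = """
<div class="choose-answer-block">
  <label class="radio-button-click-target">amphibians</label>
  <label class="radio-button-click-target correctquestions">Worms</label>
  <label class="radio-button-click-target">Reptiles</label>
  <label class="radio-button-click-target">Mammals</label>
</div>
<div class="choose-answer-block">
  <label class="radio-button-click-target">land polllution</label>
  <label class="radio-button-click-target">noise pollution</label>
  <label class="radio-button-click-target">air pollution</label>
  <label class="radio-button-click-target correctquestions">water pollution</label>
</div>
"""


def options_correct_last(block):
    """Return the option texts of one block with the correct option moved to the end."""
    labels = block.select(".radio-button-click-target")
    # sorted() is stable and False sorts before True, so wrong options keep
    # their page order and the correct one ends up last.
    ordered = sorted(
        labels, key=lambda tag: "correctquestions" in tag.get("class", [])
    )
    return [tag.get_text(strip=True) for tag in ordered]


soup = BeautifulSoup(html, "html.parser")
results = [options_correct_last(b) for b in soup.select("div.choose-answer-block")]

for opts in results:
    print("\n".join(opts))
    print("<----->")
```

Applied to the real page, the same helper could be called on each element of soup.find_all('div', class_='choose-answer-block'), provided the option tags there use the class names shown in the question.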