BeautifulSoup web scraping find_all(): custom function not working

Date: 2021-04-03 10:19:47

Tags: python beautifulsoup

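So I'm scraping MCQs from this site, and I want the correct option to be printed last. All options share the same class='radio-button-click-target', but the correct option additionally has correctquestions at the end, i.e. class='radio-button-click-target correctquestions'. I tried the custom-function solution from "BeautifulSoup webscraping find_all( ): finding exact match", but now no options show up at all.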

import requests
from bs4 import BeautifulSoup
address = 'https://www.ilmkidunya.com/online-test/5th-class-science-english-meduim-mcqs-with-answers?startfrom=0&last=92'
response = requests.get(address)
soup = BeautifulSoup(response.text, 'lxml')
ques_id = soup.find_all('div', class_='q-title')
ques_det = soup.find_all('div', class_='q-desc')
optn_det = soup.find_all('div', class_='choose-answer-block')
for i in range(0, len(ques_id)):
    print((ques_id[i].text))
    print(str(ques_det[i].text).strip())
    options = optn_det[i].find_all(lambda tag: tag.name == 'div' and tag.get('class') == ['radio-button-click-target correctquestions'])
    for opn in options:
        print(str(opn.text).strip())
    print('<----->')

Current output

Question #  1
The group which belong to invertebrates is.
amphibians
Worms
Reptiles
Mammals
<----->
Question #  2
The main cause of cholera is:
land polllution
noise pollution
air pollution
water pollution
<----->

Expected output

Question #  1
The group which belong to invertebrates is.
amphibians
Reptiles
Mammals
Worms
<----->
Question #  2
The main cause of cholera is:
land polllution
noise pollution
air pollution
water pollution
<----->

The correct option should be displayed last.
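A likely reason the custom function matches nothing: BeautifulSoup treats `class` as a multi-valued attribute, so `tag.get('class')` returns a list of separate class names, which never equals the one-element list `['radio-button-click-target correctquestions']`. The sketch below checks this on a tiny hypothetical HTML snippet (it only mimics the class names from the question, not the site's real markup) and uses the built-in `html.parser` so it does not need `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only mimics the class names described in the question.
html = """
<div class="choose-answer-block">
  <div class="radio-button-click-target">amphibians</div>
  <div class="radio-button-click-target correctquestions">Worms</div>
  <div class="radio-button-click-target">Reptiles</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# class is a multi-valued attribute, so BeautifulSoup splits it into a list
# of two separate class names.
worms = soup.select_one("div.correctquestions")
print(worms.get("class"))

# The lambda from the question: a list holding one space-joined string never
# equals the two-element list above, so nothing matches.
broken = soup.find_all(
    lambda tag: tag.name == "div"
    and tag.get("class") == ["radio-button-click-target correctquestions"]
)

# Membership test instead; .get("class", []) also copes with tags without a class.
fixed = soup.find_all(
    lambda tag: tag.name == "div" and "correctquestions" in tag.get("class", [])
)
print(len(broken), [t.get_text(strip=True) for t in fixed])  # 0 ['Worms']
```

BeautifulSoup's documented `class_` keyword does the same membership test without a custom function, e.g. `soup.find_all("div", class_="correctquestions")`.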

1 answer:

Answer 0 (score: 6)

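Everything you need is in the HTML, so you can rebuild the trivia database by scraping all the questions, the suggested answers, and the correct answers.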

Here's how:

import random
import time

import requests
from bs4 import BeautifulSoup

address = 'https://www.ilmkidunya.com/online-test/5th-class-science-english-meduim-mcqs-with-answers?startfrom=0&last=92'
soup = BeautifulSoup(requests.get(address).text, 'lxml')

question = [
    q.getText(strip=True) for q
    in soup.select("div.single-question-answer-block div.q-desc")
]

radio_buttons = [
    o.getText(strip=True) for o in
    soup.select("div.fancy-radio-box .radio-button:disabled + .radio-button-click-target")
]

correct_answers = [
    a.getText(strip=True) for a in
    soup.find_all(lambda t: t.name == "label" and "correctquestions" in t.get("class", []))
]

options = [radio_buttons[i:i + 4] for i in range(0, len(radio_buttons), 4)]

trivia_base = list(zip(question, correct_answers, options))

question, correct_answer, answers = random.choice(trivia_base)
time_to_answer_in_seconds = 15

print(question.title())
print("\n".join(f"-> {a.title()}" for a in answers))
print("-" * len(question))

time.sleep(time_to_answer_in_seconds)
print(f"Correct answer is: {correct_answer}.")

Sample output:

Which Of The Following Objects Emits Light?
-> Earth
-> Sun
-> Moon
-> Pluto
-------------------------------------------
Correct answer is: Sun.
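The `options` line in the code above slices `radio_buttons` into chunks of four, which assumes every question has exactly four suggested answers; if one question had three or five, every later question would be paired with the wrong options. A sketch that instead walks each question block, assuming (unconfirmed for the real page) that `div.q-desc`, the options, and the correct `label` all sit inside the same `div.single-question-answer-block`. The HTML is hypothetical and only modelled on the answer's selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the selectors in the answer, not the real page.
html = """
<div class="single-question-answer-block">
  <div class="q-desc">Which of the following objects emits light?</div>
  <div class="fancy-radio-box">
    <input class="radio-button" type="radio" disabled>
    <label class="radio-button-click-target">Earth</label>
    <input class="radio-button" type="radio" disabled>
    <label class="radio-button-click-target correctquestions">Sun</label>
    <input class="radio-button" type="radio" disabled>
    <label class="radio-button-click-target">Moon</label>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

trivia_base = []
for block in soup.select("div.single-question-answer-block"):
    question = block.select_one("div.q-desc").get_text(strip=True)
    # Only this block's options, however many there are (three here).
    answers = [o.get_text(strip=True) for o in block.select(".radio-button-click-target")]
    # None if a block has no option marked correct.
    correct = block.select_one("label.correctquestions")
    trivia_base.append((question, correct.get_text(strip=True) if correct else None, answers))

print(trivia_base)
```

Because each tuple is built from a single block, a question with an unusual number of options cannot shift the pairing for the questions after it.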

EDIT:

If you want to print all the questions at once, use:

trivia_base = list(zip(question, correct_answers, options))
horizontal_line = max(len(q) for q in question)

for number, trivia in enumerate(trivia_base, start=1):
    question, correct_answer, answers = trivia
    print(f"{number}. {question.title()}")
    print("\n".join(f"-> {a.title()}" for a in answers))
    print(f"Correct answer is: {correct_answer}.")
    print("-" * horizontal_line)

Output:

1. Which Of The Following Objects Emits Light?
-> Earth
-> Sun
-> Moon
-> Pluto
Correct answer is: Sun.
----------------------------------------------------------------------------------------------------
2. The Group Which Belongs To Invertebrates Is:
-> Amphibians
-> Insects
-> Reptiles
-> Birds
Correct answer is: insects.
----------------------------------------------------------------------------------------------------
3. Number Of Petals In A Flower Of Dicot Plant May Be:
-> 3
-> 4
-> 6
-> 7
Correct answer is: 4.
----------------------------------------------------------------------------------------------------

and more...
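The answer prints the correct answer on its own line. To produce the exact format the question asked for, with the correct option listed last, one option is to sort each block's options on whether they carry the `correctquestions` class. This is a sketch over hypothetical markup: only the class names come from the question, and since it is not confirmed whether the real options are `div` or `label` elements, the search matches on class alone.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class names come from the question.
html = """
<div class="choose-answer-block">
  <label class="radio-button-click-target">amphibians</label>
  <label class="radio-button-click-target correctquestions">Worms</label>
  <label class="radio-button-click-target">Reptiles</label>
  <label class="radio-button-click-target">Mammals</label>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

printed = []
for block in soup.find_all("div", class_="choose-answer-block"):
    # class_ matches every tag whose class list contains this name, whatever the tag.
    options = block.find_all(class_="radio-button-click-target")
    # sorted() is stable: wrong options (key False) keep their page order,
    # the correct one (key True) moves to the end.
    options = sorted(options, key=lambda tag: "correctquestions" in tag.get("class", []))
    for opt in options:
        text = opt.get_text(strip=True)
        printed.append(text)
        print(text)
    print("<----->")
```

For the markup above this prints amphibians, Reptiles, Mammals, Worms, matching the order in the question's expected output for question 1.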