Question

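I have an exam website with two classes of content that I need (the topic class and the choice class). In the choice class (which holds the answer choices), some questions use images as choices (all 4 choices are images), so I want to scrape the links of the 4 images into the same position of an array (one array position per question).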

A second question: if a question has no images, how do I leave that array position blank?

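In practice, I will pick a question number at random to fetch data from 4 arrays (topic, choice, topic images, choice images), and then use that data for display in another application.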

Here is an example question and its HTML code:

<div class="cssQue"><div class="cssExTopic">ข้อที่ 38 :  <ul><li>ภาพใดแสดง เกลียวน๊อตตามแบบมาตรฐาน</li></ul></div><div class="cssExChoice"><ul><li><input type="checkbox" name="a"> 1  : <img src="./drawing_files/S190T1P249_20060124_224403_A1.jpg"></li><li><input type="checkbox" name="a"> 2  : <img src="./drawing_files/S190T1P249_20060124_224403_A2.jpg"></li><li><input type="checkbox" name="a"> 3  : <img src="./drawing_files/S190T1P249_20060124_224403_A3.jpg"></li><li><input type="checkbox" name="a"> 4  : <img src="./drawing_files/S190T1P249_20060124_224403_A4.jpg"></li><li><br></li><li> คำตอบที่ถูกต้อง :<font color="white"> 3</font></li></ul></div></div>

The image URL I need should be:

https://my-project-1471674425170.firebaseapp.com/drawing/drawing_files/S190T1P249_20060124_224403_A1.jpg

Answer 1

If all of your examples are similar, the following approach should work. It also captures the text associated with each <li>:

from bs4 import BeautifulSoup

html = """
<div class="cssQue">
   <div class="cssExTopic">
      ข้อที่ 38 :  
      <ul>
         <li>ภาพใดแสดง เกลียวน๊อตตามแบบมาตรฐาน</li>
      </ul>
   </div>
   <div class="cssExChoice">
      <ul>
         <li><input type="checkbox" name="a"> 1  : <img src="./drawing_files/S190T1P249_20060124_224403_A1.jpg"></li>
         <li><input type="checkbox" name="a"> 2  : <img src="./drawing_files/S190T1P249_20060124_224403_A2.jpg"></li>
         <li><input type="checkbox" name="a"> 3  : <img src="./drawing_files/S190T1P249_20060124_224403_A3.jpg"></li>
         <li><input type="checkbox" name="a"> 4  : <img src="./drawing_files/S190T1P249_20060124_224403_A4.jpg"></li>
         <li><br></li>
         <li> คำตอบที่ถูกต้อง :<font color="white"> 3</font></li>
      </ul>
   </div>
</div>"""

base_url = "https://my-project-1471674425170.firebaseapp.com/"
soup = BeautifulSoup(html, "html.parser")
div = soup.find('div', class_="cssExChoice")
urls = []

for li in div.find_all('li'):
    img = li.find('img', src=True)

    # Was there an image present?
    if img:
        urls.append((li.get_text(strip=True), base_url + img['src'].lstrip('/.')))
    else:
        urls.append((li.get_text(strip=True), None))

# Display the results
for text, url in urls:
    print(f'"{text}" - {url}')

This gives you text and URL pairs as follows:

"1  :" - https://my-project-1471674425170.firebaseapp.com/drawing_files/S190T1P249_20060124_224403_A1.jpg
"2  :" - https://my-project-1471674425170.firebaseapp.com/drawing_files/S190T1P249_20060124_224403_A2.jpg
"3  :" - https://my-project-1471674425170.firebaseapp.com/drawing_files/S190T1P249_20060124_224403_A3.jpg
"4  :" - https://my-project-1471674425170.firebaseapp.com/drawing_files/S190T1P249_20060124_224403_A4.jpg
"" - None
"คำตอบที่ถูกต้อง :3" - None

In your example there are 6 <li> items, but only 4 of them contain an image. The last two entries in the returned list have None as their URL.
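Note that the URL requested in the question contains a /drawing/ path segment. One way to get it, sketched here, is to resolve each src against the page's own URL with urllib.parse.urljoin instead of concatenating strings. The value of page_url is an assumption (that the exam page is served from under /drawing/), and the shortened HTML snippet is only for illustration:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumption: the exam page is served from this directory on the site.
page_url = "https://my-project-1471674425170.firebaseapp.com/drawing/"

# Shortened illustration: one <li> with an image, one without.
html = (
    '<ul>'
    '<li>1 : <img src="./drawing_files/S190T1P249_20060124_224403_A1.jpg"></li>'
    '<li><br></li>'
    '</ul>'
)
soup = BeautifulSoup(html, "html.parser")

resolved = []
for li in soup.find_all("li"):
    img = li.find("img", src=True)
    # urljoin resolves the leading "./" relative to page_url;
    # None marks an <li> that has no image.
    resolved.append(urljoin(page_url, img["src"]) if img else None)

print(resolved)
```

Resolving against the page URL also copes with other relative forms such as "../" or a leading "/", which stripping characters with lstrip does not.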

The topic can also be extracted as follows:

div_topic = soup.find('div', class_="cssExTopic")
topic = ' - '.join(text.strip() for text in div_topic.strings if text.strip())
print(topic)

Giving:

ข้อที่ 38 : - ภาพใดแสดง เกลียวน๊อตตามแบบมาตรฐาน
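To get the four index-aligned arrays described in the question (topic, choice, topic images, choice images), the same idea can be applied to every cssQue block on the page. The following is only a sketch: the two sample questions, the PAGE_URL value, and the image_urls helper are made up for illustration, and an empty list is used as the "blank" value so that every question keeps its position:

```python
import random
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumption: the exam page is served from this directory on the site.
PAGE_URL = "https://my-project-1471674425170.firebaseapp.com/drawing/"

# Two made-up questions: the first has image choices, the second is text only.
html = """
<div class="cssQue">
  <div class="cssExTopic">Q1 : <ul><li>Which picture?</li></ul></div>
  <div class="cssExChoice"><ul>
    <li><input type="checkbox" name="a"> 1 : <img src="./drawing_files/q1_a1.jpg"></li>
    <li><input type="checkbox" name="a"> 2 : <img src="./drawing_files/q1_a2.jpg"></li>
  </ul></div>
</div>
<div class="cssQue">
  <div class="cssExTopic">Q2 : <ul><li>Which word?</li></ul></div>
  <div class="cssExChoice"><ul>
    <li><input type="checkbox" name="a"> 1 : left</li>
    <li><input type="checkbox" name="a"> 2 : right</li>
  </ul></div>
</div>
"""


def image_urls(tag):
    """Return the absolute URL of every <img src=...> inside tag (may be empty)."""
    return [urljoin(PAGE_URL, img["src"]) for img in tag.find_all("img", src=True)]


soup = BeautifulSoup(html, "html.parser")
topics, choices, topic_images, choice_images = [], [], [], []

for que in soup.find_all("div", class_="cssQue"):
    topic_div = que.find("div", class_="cssExTopic")
    choice_div = que.find("div", class_="cssExChoice")

    topics.append(" ".join(topic_div.stripped_strings))
    choices.append([li.get_text(strip=True) for li in choice_div.find_all("li")])
    # One entry is appended per question even when there are no images,
    # so index i refers to the same question in all four lists.
    topic_images.append(image_urls(topic_div))
    choice_images.append(image_urls(choice_div))

# Pick a random question and read its data from the four lists.
i = random.randrange(len(topics))
print(topics[i], choices[i], topic_images[i], choice_images[i])
```

Appending exactly one entry per question to each list is what keeps the positions aligned. If None is preferred over an empty list for "no image", `image_urls(...) or None` would do that.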

How to scrape the links of multiple images into the same position, leaving the position blank when there is no image
