Question

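I'm building a script that uses requests and BeautifulSoup to download .mp4 files from a given Gfycat page. I get an error when I try to access the 'src' attribute of the source tags. The HTML element I'm targeting is: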

<source src="https://giant.gfycat.com/PoshDearAsianporcupine.mp4" type="video/mp4">

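The code works when I replace the tag and attribute with 'a' and 'href' respectively, so I'm not sure why I can't access the 'src' attribute. The code is as follows: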

import requests
from bs4 import BeautifulSoup

gyfyUrl = 'https://gfycat.com/PoshDearAsianporcupine'

# creating a response object
r = requests.get(gyfyUrl)

# creating beautiful soup object
soup = BeautifulSoup(r.content, 'html5lib')

# finding source tags in page
sourceTags = soup.find_all('source')

# printing found tags for clarity
print(sourceTags)

# printing src attribute within source tags - Error
for tag in sourceTags:
    print(tag['src'])

Answer 1

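The problem is that not every source tag has a src attribute; in this case, the first one doesn't, so tag['src'] raises a KeyError on it. You can collect every src attribute that does exist with a conditional list comprehension: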

srcs = [tag["src"] for tag in sourceTags if "src" in tag.attrs]

Result:

['https://giant.gfycat.com/PoshDearAsianporcupine.webm', 'https://giant.gfycat.com/PoshDearAsianporcupine.mp4', 'https://thumbs.gfycat.com/PoshDearAsianporcupine-mobile.mp4']
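
As a variation on the same idea, here is a minimal sketch of two other ways to skip tags that lack the attribute: passing src=True to find_all, and using tag.get(), which returns a default instead of raising. The inline HTML string is a hypothetical stand-in for the downloaded page (so the snippet runs without a network request), and it uses the built-in "html.parser" rather than html5lib purely to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page;
# the first <source> deliberately has no src attribute.
html = """
<video>
  <source type="video/webm">
  <source src="https://giant.gfycat.com/PoshDearAsianporcupine.webm" type="video/webm">
  <source src="https://giant.gfycat.com/PoshDearAsianporcupine.mp4" type="video/mp4">
</video>
"""

soup = BeautifulSoup(html, "html.parser")

# src=True matches only the tags that actually carry a src attribute.
with_src = [tag["src"] for tag in soup.find_all("source", src=True)]

# tag.get() returns the given default instead of raising KeyError,
# so the filter is safe even for tags without src.
mp4_urls = [
    tag.get("src")
    for tag in soup.find_all("source")
    if tag.get("src", "").endswith(".mp4")
]

print(with_src)
print(mp4_urls)
```

Filtering on the ".mp4" suffix is only one possible way to pick the file to download; checking the tag's type attribute for "video/mp4" would likely work as well.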
