Trying to scrape URLs

Time: 2020-05-17 16:48:20

Tags: python beautifulsoup screen-scraping

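I'm trying to scrape all the URLs from a free-games page, but the result keeps coming back empty. I don't know what I'm doing wrong. The image below shows the path to the elements I'm after.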

import requests
from bs4 import BeautifulSoup

result = requests.get("https://steamdb.info/upcoming/free/")
src = result.content
soup = BeautifulSoup(src, 'lxml')

urls = []
for td_tag in soup.find_all('td'):
    a_tag = td_tag.find('a')
    urls.append(a_tag.attrs['href'])

print(urls)

[Image: screenshot of the element path]

1 answer:

Answer 0 (score: 4)

You have to send a User-Agent header, and it can't be just the short Mozilla/5.0; it has to be the full string from a real web browser.

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0",
}

result = requests.get("https://steamdb.info/upcoming/free/", headers=headers)
soup = BeautifulSoup(result.content, 'lxml')

# print(result.content)  # uncomment to inspect the raw response
urls = []
for td_tag in soup.find_all('td'):
    a_tag = td_tag.find('a')
    if a_tag:
        urls.append(a_tag.attrs['href'])

print(urls)
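
If the list still comes back empty, it may help to separate the network step from the parsing step. The following is only a sketch built on the code above: the names HEADERS, extract_td_links and fetch_links are made up for illustration, the 10-second timeout is an arbitrary choice, and the sample HTML is invented rather than taken from the site. It uses the built-in "html.parser" so the parsing part runs without lxml. Calling raise_for_status() should turn an HTTP error response into an exception instead of a silently empty list.

```python
import requests
from bs4 import BeautifulSoup

# Same User-Agent string as in the answer above.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0",
}


def extract_td_links(html):
    """Return the href of the first <a href=...> found in each <td>."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for td_tag in soup.find_all("td"):
        a_tag = td_tag.find("a", href=True)  # skip cells without a link
        if a_tag:
            urls.append(a_tag["href"])
    return urls


def fetch_links(url):
    """Fetch url with a browser-like User-Agent and parse its table links."""
    result = requests.get(url, headers=HEADERS, timeout=10)
    result.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx status
    return extract_td_links(result.content)


# Offline check of the parsing step on invented sample HTML.
sample = '<table><tr><td><a href="/app/1/">A</a></td><td>no link</td></tr></table>'
print(extract_td_links(sample))  # prints ['/app/1/']
```

With the parsing step confirmed on the sample, fetch_links("https://steamdb.info/upcoming/free/") can be tried against the real page. If it raises an exception, the request is the problem and the parsing is not.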