Web scraping with BeautifulSoup and Python not working

Asked: 2021-02-06 09:35:13

Tags: python html css web-scraping beautifulsoup

I am trying to get a list of website addresses from the following page: https://www.wer-zu-wem.de/dienstleister/filmstudios.html

My code:

import requests
from bs4 import BeautifulSoup
result = requests.get("https://www.wer-zu-wem.de/dienstleister/filmstudios.html")
src = result.content
soup = BeautifulSoup(src, 'lxml')
links = soup.find_all('a', {'class': 'col-md-4 col-lg-5 col-xl-4 text-center text-lg-right'})
print(links)

import requests
from bs4 import BeautifulSoup

webLinksList = []

result = requests.get(
    "https://www.wer-zu-wem.de/dienstleister/filmstudios.html")
src = result.content
soup = BeautifulSoup(src, 'lxml')


website_Links = soup.find_all(
    'div', class_='col-md-4 col-lg-5 col-xl-4 text-center text-lg-right')


if website_Links != "":
    print("List is empty")
for website_Link in website_Links:
    try:
        realLink = website_Link.find(
            "a", attrs={"class": "btn btn-primary external-link"})
        webLinksList.append(featured_challenge.attrs['href'])
    except:
        continue

for link in webLinksList:
    print(link)

"List is empty" is printed at the start, and nothing ever gets added to the list.
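For reference, one problem in the second snippet is the emptiness check: `website_Links != ""` compares a list against a string, which is always true, so "List is empty" prints no matter what (`find_all` returns an empty list, never `""`, when nothing matches). A second problem is that the loop appends `featured_challenge.attrs['href']`, a name that is never defined, and the bare `except` silently swallows the resulting `NameError`. A minimal offline sketch of the check:

```python
# find_all returns a list; comparing it to "" is always unequal.
results = []            # what find_all returns when nothing matches
always_true = results != ""   # True even for an empty result set
really_empty = not results    # True only when the list is actually empty
print(always_true, really_empty)
```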

2 Answers:

Answer 0 (score: 2):

Try this:

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) Gecko/20100101 Firefox/85.0",
}

result = requests.get("https://www.wer-zu-wem.de/dienstleister/filmstudios.html", headers=headers)
src = result.content
soup = BeautifulSoup(src, 'lxml')
links = soup.find('ul', {'class': 'wzwListeFirmen'}).findAll("a")
print(links)
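To go from those anchor tags to the plain list of URLs the question asks for, the same extraction logic can be sketched offline (the HTML fragment below is a made-up stand-in for the page's `wzwListeFirmen` list, and `html.parser` is used so no lxml install is needed):

```python
from bs4 import BeautifulSoup

# Stand-in markup mirroring the structure the selector above targets.
html = """
<ul class="wzwListeFirmen">
  <li><a href="https://example-studio-a.de">Studio A</a></li>
  <li><a href="https://example-studio-b.de">Studio B</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
# Pull the href attribute from every anchor inside the list.
links = [a["href"] for a in soup.find("ul", {"class": "wzwListeFirmen"}).find_all("a")]
print(links)
```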

Answer 1 (score: 2):

Try the following to get all of the links that point to external websites. (A sketch, assuming the `btn btn-primary external-link` anchor class shown in the question's own code; a User-Agent header is sent since the site appears to reject the default requests one.)

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) Gecko/20100101 Firefox/85.0",
}

result = requests.get(
    "https://www.wer-zu-wem.de/dienstleister/filmstudios.html", headers=headers)
soup = BeautifulSoup(result.content, 'lxml')

# Each external company website is linked from an <a> carrying these classes.
for link in soup.select("a.btn.btn-primary.external-link"):
    print(link.get("href"))