"HTTP Error 500: Internal Server Error" - Web scraping

Time: 2019-11-18 19:35:27

Tags: python web-scraping

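Solved using QHarr's answer!

I am trying to extract some information from a website (starting with the title). The code below works with http://google.com, but not with the link (url) I actually need.

Error code: "HTTP Error 500: Internal Server Error"

Am I doing something wrong? Is there another way to do this?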

from urllib.request import urlopen
import urllib.error
import bs4
import time

url = "http://st.atb.no/New/minskjerm/FST.aspx?visMode=1&cTit=&c1=1&s1=16011301&sv1=&cn1=&template=2&cmhb=FF6600&cmhc=00FF00&cshb=3366FF&cshc=FFFFFF&arb=000000&rows=1&period=&"

html = None
for i in range(5):  # Try up to 5 times to reach the page
    try:
        html = urlopen(url)
    except urllib.error.HTTPError as exc:
        print('Error code: ', exc)
        time.sleep(1)  # Wait 1 second, then make the HTTP request again
        continue
    else:
        print('Success')
        break

# Stop here if every attempt failed, instead of hitting a NameError below
if html is None:
    raise SystemExit('Could not reach the page after 5 attempts')

soup = bs4.BeautifulSoup(html, 'lxml')
title = soup.find('title')
print(title.getText())


2 Answers:

Answer 0: (score: 0)

Hey jacobara, I think something is wrong with the website. You can still read the content, though.
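
One possible reading of this suggestion: `urllib.error.HTTPError` is itself a file-like response object, so the body the server sent along with the 500 status can still be read. The sketch below assumes that is what was meant. The helper name `fetch_body` and its `opener` parameter are hypothetical, added only so the function can be exercised without touching the network.

```python
import urllib.error
from urllib.request import urlopen


def fetch_body(url, opener=urlopen, timeout=10):
    """Return (status_code, body_bytes), even when the server answers with an error.

    `fetch_body` and `opener` are illustrative names, not part of the original post.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as exc:
        # HTTPError doubles as a response object: it carries the status code
        # and whatever body the server sent with the error.
        return exc.code, exc.read()
```

The returned bytes can then be passed to `bs4.BeautifulSoup(body, 'lxml')` as in the question, although an error page may not contain the data the normal page would.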


Answer 1: (score: 0)

The page makes a POST request, which you can mimic directly:

import requests
from bs4 import BeautifulSoup as bs

# Form fields sent by the page's own POST request
body = {"terminal": "1,16011301,,", "rows": 1, "visMode": 1}
r = requests.post('http://st.atb.no/New/minskjerm/DataHandler.ashx?type=departureTimes&lang=no', data=body)
soup = bs(r.content, 'lxml')
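
The `terminal` value in the body above appears to line up with the `c1`, `s1`, `sv1` and `cn1` query parameters of the FST.aspx URL from the question, joined by commas. That mapping is an assumption inferred from comparing the two snippets; the answer does not state it. Under that assumption, a minimal sketch for deriving the POST body from the page URL could look like this (`build_post_body` is a hypothetical helper name):

```python
from urllib.parse import urlsplit, parse_qs

PAGE_URL = (
    "http://st.atb.no/New/minskjerm/FST.aspx?visMode=1&cTit=&c1=1&s1=16011301"
    "&sv1=&cn1=&template=2&cmhb=FF6600&cmhc=00FF00&cshb=3366FF&cshc=FFFFFF"
    "&arb=000000&rows=1&period=&"
)


def build_post_body(page_url):
    """Build the form body for the POST request from the page's query string.

    Assumption (not confirmed by the answer): "terminal" is the comma-joined
    values of c1, s1, sv1 and cn1.
    """
    # keep_blank_values=True keeps empty parameters such as "sv1=" in the result
    query = parse_qs(urlsplit(page_url).query, keep_blank_values=True)

    def first(name, default=""):
        return query.get(name, [default])[0]

    terminal = ",".join(first(name) for name in ("c1", "s1", "sv1", "cn1"))
    return {
        "terminal": terminal,
        "rows": first("rows", "1"),
        "visMode": first("visMode", "1"),
    }


body = build_post_body(PAGE_URL)
```

The resulting dictionary can be passed as `data=body` to `requests.post` in the same way as the hand-written one. The values are strings rather than integers, which makes no difference once they are form-encoded.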