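I want to prepend an extra string to each URL before requesting it. The URLs I scrape from the page are relative rather than complete, so they cannot be opened as they are, and I need to extract data from the pages they point to.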
import urllib.request
from bs4 import BeautifulSoup
import re
import sqlite3

def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

soup = make_soup("https://saturn.etat.lu/tapes/tapes_fr_lst_pdt.jsp?sel=_")
allrecords = soup.findAll('tr')
recordsLength = len(allrecords)
for index in range(3, recordsLength):
    record = allrecords[index].find_all('a')
    agri = [record[1].get('href')]
    for url in agri:
        agripage = urllib.request.urlopen(url)
        soup1 = BeautifulSoup(agripage, "html.parser")
I get the following error:
unknown url type: 'tapes_fr_nfo_lap.jsp?pdt=1838&lmz=0'
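The error message suggests that the `href` values are relative to the listing page, so `urlopen` has no scheme or host to work with. One possible way to handle this, sketched below, is to resolve each `href` against the URL of the page it came from using `urllib.parse.urljoin` from the standard library, rather than concatenating strings by hand. The names `BASE_URL` and `to_absolute` are hypothetical helpers introduced only for this illustration, and the sketch assumes the links really are relative to the listing page.

```python
from urllib.parse import urljoin

# Assumption: the scraped hrefs are relative to the listing page's URL.
BASE_URL = "https://saturn.etat.lu/tapes/tapes_fr_lst_pdt.jsp?sel=_"


def to_absolute(base_url, href):
    """Resolve a possibly relative href against the page it was found on."""
    return urljoin(base_url, href)


# The href from the error message, resolved against the listing page.
print(to_absolute(BASE_URL, "tapes_fr_nfo_lap.jsp?pdt=1838&lmz=0"))
# prints https://saturn.etat.lu/tapes/tapes_fr_nfo_lap.jsp?pdt=1838&lmz=0
```

In the loop from the question, the resolved value would be what gets passed to `urllib.request.urlopen` instead of the raw `href`. `urljoin` leaves an `href` that is already absolute unchanged, so the same call should be safe if the page mixes both kinds of link.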