Prepending an extra string to a URL

Time: 2018-05-31 13:33:00

Tags: python-3.x beautifulsoup

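I want to prepend an extra string to the URLs I scrape. The scraped URLs are not valid on their own, and I need to fetch the pages they point to and extract data from them.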

import urllib.request
from bs4 import BeautifulSoup
import re
import sqlite3


def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata


soup = make_soup("https://saturn.etat.lu/tapes/tapes_fr_lst_pdt.jsp?sel=_")
allrecords = soup.find_all('tr')
recordsLength = len(allrecords)
for index in range(3, recordsLength):
    record = allrecords[index].find_all('a')
    agri = [record[1].get('href')]
    for url in agri:
        agripage = urllib.request.urlopen(url)
        soup1 = BeautifulSoup(agripage, "html.parser")

I get the following error:

unknown url type: 'tapes_fr_nfo_lap.jsp?pdt=1838&lmz=0'        
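
The value in the error message has no scheme or host, which suggests the href attributes on the listing page are relative links. One possible approach, assuming they are meant to be resolved against the listing page's own URL, is to join each href with that URL using urllib.parse.urljoin from the standard library rather than concatenating strings by hand. The names BASE_URL and absolute_url below are hypothetical helpers introduced only for this sketch:

```python
from urllib.parse import urljoin

# URL of the listing page, taken from the question.
BASE_URL = "https://saturn.etat.lu/tapes/tapes_fr_lst_pdt.jsp?sel=_"


def absolute_url(href, base=BASE_URL):
    """Resolve a possibly relative href against the page it was scraped from."""
    return urljoin(base, href)


# The href shown in the error message above.
href = "tapes_fr_nfo_lap.jsp?pdt=1838&lmz=0"
full_url = absolute_url(href)
print(full_url)
```

Inside the loop, the inner call would then become urllib.request.urlopen(absolute_url(url)). An href that is already absolute passes through urljoin unchanged, so the helper should be safe to apply to every link.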

0 answers:

No answers