Question

I am a Python beginner. I want to retrieve data from a range of pages, but my script only outputs the data for one page.

import json
import urllib2

import requests 
from bs4 import BeautifulSoup    

for i in range(12000,12200):   
    page="https://www.qualibat.com/resultat-de-la-recherche/mcp-peinture-{}".format(i)
    html = requests.get(page)
    soup = BeautifulSoup(html.text, 'html.parser')

data = json.loads(soup.find('script', type='application/ld+json').text)
print data

Answer 1

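I assume you want to iterate over the URLs in the range and extract the data from each one. All you are missing is that the json.loads and print lines are not inside the for loop, so they run only once, after the last page has been fetched.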

import json
import requests 
from bs4 import BeautifulSoup    

for i in range(12000,12002):   
    page="https://www.qualibat.com/resultat-de-la-recherche/mcp-peinture-{}".format(i)
    html = requests.get(page)
    soup = BeautifulSoup(html.text, 'html.parser')
    data = json.loads(soup.find('script', type='application/ld+json').text)
    print(data)
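
To see why the original script printed only one record, here is a minimal sketch that needs no network access. The shortened page names and the variables seen_in_loop and after_loop are made up for illustration; only the indentation pattern mirrors the question.

```python
# Hypothetical stand-in for the scraping loop (no requests are made).
seen_in_loop = []

for i in range(12000, 12003):
    page = "mcp-peinture-{}".format(i)
    seen_in_loop.append(page)  # indented: runs once per iteration

# Not indented, so this runs once, after the loop has finished.
# `page` still holds the value assigned in the final iteration.
after_loop = page

print(seen_in_loop)  # ['mcp-peinture-12000', 'mcp-peinture-12001', 'mcp-peinture-12002']
print(after_loop)    # mcp-peinture-12002
```

Anything that should happen for every page has to be indented under the for statement, exactly like the append call here.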

Update: with an exception handler, the code now looks like this:

import json
import requests 
from bs4 import BeautifulSoup    

for i in range(12000,12002):
    page="https://www.qualibat.com/resultat-de-la-recherche/mcp-peinture-{}".format(i)
    try:
        html = requests.get(page)
        soup = BeautifulSoup(html.text, 'html.parser')
        data = json.loads(soup.find('script', type='application/ld+json').text)
        print(data)
    except Exception:
        # Report the page that failed and carry on with the next one
        print("Error in scraping: " + page)

Output:

{'@context': 'http://schema.org', '@type': 'HomeAndConstructionBusiness', 'name': 'ZANELLO ETS', 'email': 'contact@zanello.fr', 'telephone': '02 33 77 11 22', 'numberOfEmployees': '165', 'foundingDate': '01/01/1925', 'address': {'@type': 'PostalAddress', 'streetAddress': "RUE DE L'ANCIENNE GARE BP 26", 'addressLocality': 'TESSY SUR VIRE', 'postalCode': '50420', 'addressCountry': 'FR'}}
{'@context': 'http://schema.org', '@type': 'HomeAndConstructionBusiness', 'name': 'ZENONE G. CONSTRUCTIONS', 'email': 'info@zenone.fr', 'telephone': '02 33 77 29 00', 'numberOfEmployees': '23', 'foundingDate': '14/01/1985', 'address': {'@type': 'PostalAddress', 'streetAddress': 'ZI DE LA CHEVALERIE BP 253', 'addressLocality': 'SAINT LO CEDEX', 'postalCode': '50003', 'addressCountry': 'FR'}}

N.B.: I have reduced the upper bound of the range from 12200 to 12002. Adjust it to suit your needs.
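
If you want to know why a page failed rather than catching everything in one block, one possible refinement is to separate parsing from fetching. This is only a sketch: the names extract_json_ld and scrape and the 10-second timeout are my own assumptions, not part of the code above, and I have not checked how the site responds for IDs that do not exist.

```python
import json

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.qualibat.com/resultat-de-la-recherche/mcp-peinture-{}"


def extract_json_ld(html_text):
    """Return the first JSON-LD block in html_text as a dict, or None."""
    soup = BeautifulSoup(html_text, "html.parser")
    tag = soup.find("script", type="application/ld+json")
    if tag is None:
        # soup.find returns None when no matching tag exists
        return None
    try:
        return json.loads(tag.get_text())
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError
        return None


def scrape(ids, timeout=10):
    """Fetch each page and collect the records that could be parsed."""
    records = []
    for i in ids:
        page = BASE_URL.format(i)
        try:
            response = requests.get(page, timeout=timeout)
            response.raise_for_status()  # raise on 4xx/5xx responses
        except requests.RequestException as exc:
            print("Request failed for {}: {}".format(page, exc))
            continue
        data = extract_json_ld(response.text)
        if data is None:
            print("No JSON-LD data found on " + page)
        else:
            records.append(data)
    return records
```

Calling scrape(range(12000, 12002)) would then return a list of dictionaries instead of printing them one by one, which makes it easier to save the results afterwards.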

BeautifulSoup multi-page error
