Python: scraping links that contain a keyword

Time: 2017-10-16 16:32:56

Tags: python scrape

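I'm new to Python and need help scraping all links that contain a certain keyword. The problem is that I get the following error:

if "air-max" in link["href"]:
^
IndentationError: expected an indented block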

Here is my code:

import requests
import time
from bs4 import BeautifulSoup

headers = {
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36",
}

for i in range(0,4):   
url = "https://www.aw-lab.com/shop/uomo/scarpe?p={}".format(i)
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

all_links = soup.find_all("a")
for link in all_links:
if link.has_key('href'):
if "air-max" in link["href"]:
    print(link["href"])

2 answers:

Answer 0 (score: 0)

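You need another level of indentation after if link.has_key('href'):. Also, be consistent: always use spaces (preferred) or always use tabs. It isn't a universal rule, but as a rule of thumb, if a line ends with a colon (:), the next line should be indented one more level.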
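The colon rule can be checked without running any of the scraping code: the built-in compile() parses source text and raises IndentationError, or its subclass TabError when tabs and spaces are mixed inconsistently. A minimal sketch; check_source and the three snippets are illustrative and not part of the original answer:

```python
def check_source(src):
    """Compile src without executing it; return None or (error type, message)."""
    try:
        compile(src, "<snippet>", "exec")
    except IndentationError as exc:  # TabError is a subclass of IndentationError
        return type(exc).__name__, exc.msg
    return None

# Missing indent after a line that ends with a colon (the question's bug).
missing = (
    "if True:\n"
    "if 'air-max' in 'x':\n"
    "    print('found')\n"
)

# One extra level after each colon: compiles cleanly.
fixed = (
    "if True:\n"
    "    if 'air-max' in 'x':\n"
    "        print('found')\n"
)

# A tab on one line and eight spaces on the next: inconsistent indentation.
mixed = (
    "if True:\n"
    "\tx = 1\n"
    "        y = 2\n"
)

for name, src in [("missing", missing), ("fixed", fixed), ("mixed", mixed)]:
    print(name, check_source(src))
```

The exact wording of the IndentationError message varies between Python versions (newer versions append which statement on which line needed the block), but it always starts with "expected an indented block".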

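import requests
from bs4 import BeautifulSoup

for i in range(0,4):
    url = "https://www.aw-lab.com/shop/uomo/scarpe?p={}".format(i)
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")

    all_links = soup.find_all("a")
    for link in all_links:
        # has_attr() replaces the deprecated has_key() in BeautifulSoup 4
        if link.has_attr('href'):
            # one more indentation level after the colon above
            if "air-max" in link["href"]:
                print(link["href"])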
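If BeautifulSoup is not installed, the same filter (keep <a> tags whose href contains "air-max") can be sketched with the standard library's html.parser instead. This swaps in a different parser from the BeautifulSoup one used in the question; KeywordLinkParser, find_keyword_links, and the sample HTML are hypothetical:

```python
from html.parser import HTMLParser


class KeywordLinkParser(HTMLParser):
    """Collect the href values of <a> tags whose href contains a keyword."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        href = dict(attrs).get("href")
        if href and self.keyword in href:
            self.links.append(href)


def find_keyword_links(html, keyword):
    parser = KeywordLinkParser(keyword)
    parser.feed(html)
    parser.close()
    return parser.links


sample = """
<a href="/nike-air-max-90.html">Air Max 90</a>
<a name="top">no href</a>
<a href="/adidas-superstar.html">Superstar</a>
<a href="/nike-air-max-270.html">Air Max 270</a>
"""

print(find_keyword_links(sample, "air-max"))
# ['/nike-air-max-90.html', '/nike-air-max-270.html']
```

With BeautifulSoup itself, soup.find_all("a", href=True) returns only anchors that have an href attribute, which makes the separate has_key()/has_attr() check unnecessary.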

Answer 1 (score: -1)

Please use an IDE such as Spyder, or Jupyter Notebook, for development.

import requests
from bs4 import BeautifulSoup

# Browser-like headers (defined here but, as in the question, not passed to requests.get)
headers = {
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36",
}

for i in range(0,4):
    url = "https://www.aw-lab.com/shop/uomo/scarpe?p={}".format(i)
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")

    # Indented under the page loop so that every page's links are checked
    all_links = soup.find_all("a")
    for link in all_links:
        if link.has_attr('href'):
            if "air-max" in link["href"]:
                print(link["href"])
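Indentation also decides which statements repeat. Whether the all_links block is indented under for i in range(0,4): determines whether it runs for every page or only once, on whatever soup the last iteration left behind. A minimal sketch with placeholder strings standing in for the fetched pages (pages, processed_inside, and processed_after are illustrative names):

```python
# Stand-ins for the pages fetched in the loop (no network access needed).
pages = ["<page 0>", "<page 1>", "<page 2>", "<page 3>"]

processed_inside = []
for i in range(0, 4):
    soup = pages[i]                # stands in for BeautifulSoup(r.content, ...)
    processed_inside.append(soup)  # indented: runs once per page

processed_after = []
processed_after.append(soup)       # dedented: runs once, after the loop ends,
                                   # so only the last page's soup is left

print(processed_inside)  # ['<page 0>', '<page 1>', '<page 2>', '<page 3>']
print(processed_after)   # ['<page 3>']
```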