Scraping multiple websites with BeautifulSoup

Time: 2019-02-21 16:20:39

Tags: python web-scraping beautifulsoup

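I'm trying to use BeautifulSoup to get quotes from multiple websites. I tried the following code with a loop, but when I run it, the output only contains the quote from one website: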

url = ['https://finance.yahoo.com/quote/AAPL/key-statistics/', 'https://finance.yahoo.com/quote/BOX/key-statistics/']

for pg in url: 
    page = requests.get(pg)

soup = BeautifulSoup(page.content, "html.parser")

ticker = soup.find("h1", attrs={"data-reactid":"7"}).text

ticker

Output:

Out[147]: 'BOX - Box, Inc.'

Then I tried using append:

data = [ ]
data.append(ticker)

But it still gives me only one result. What is wrong here?

2 Answers:

Answer 0: (score: 1)

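Your code isn't indented correctly: only the requests.get call sits inside the loop, so everything after it runs once, on the last page fetched. When I run this code: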

from bs4 import BeautifulSoup
import requests

url = ['https://finance.yahoo.com/quote/AAPL/key-statistics/', 'https://finance.yahoo.com/quote/BOX/key-statistics/']

data = []
for pg in url:
    page = requests.get(pg)
    soup = BeautifulSoup(page.content, "html.parser")
    ticker = soup.find("h1", attrs={"data-reactid":"7"}).text
    data.append(ticker)

print(data)

I get:

['AAPL - Apple Inc.', 'BOX - Box, Inc.']
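
To see why the indentation matters, here is a minimal offline sketch of the two layouts. It makes no network calls: `fake_get` is a hypothetical stand-in for `requests.get`, and the short strings in `urls` are placeholders, not the real Yahoo URLs.

```python
# Hypothetical stand-in for requests.get(); returns a fake page body.
def fake_get(url):
    return "page for " + url

urls = ["AAPL", "BOX"]  # placeholder values, not real URLs

# Layout from the question: only the fetch is inside the loop.
for pg in urls:
    page = fake_get(pg)   # `page` is overwritten on every iteration
ticker = page             # runs once, after the loop, on the last page
outside = [ticker]

# Layout from this answer: fetch, extract and append all inside the loop.
inside = []
for pg in urls:
    page = fake_get(pg)
    inside.append(page)   # one entry per URL

print(outside)  # ['page for BOX']
print(inside)   # ['page for AAPL', 'page for BOX']
```

The same reasoning applies to the append attempt in the question: append was called once, after the loop had already finished, so the list could only ever receive the last ticker.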

Answer 1: (score: -1)

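Your code is almost right. The only problem is that you left soup outside the for loop, so it only picks up the last URL instead of all of the URLs. Try this instead.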

from bs4 import BeautifulSoup
import requests

url = ['https://finance.yahoo.com/quote/AAPL/key-statistics/', 'https://finance.yahoo.com/quote/BOX/key-statistics/']

for pg in url:
    page = requests.get(pg)
    soup = BeautifulSoup(page.content, "html.parser")
    ticker = soup.find("h1", attrs={"data-reactid":"7"}).text
    print("Output :- " + ticker)

Output:

Output :- AAPL - Apple Inc.
Output :- BOX - Box, Inc.
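
One caveat with the loop above: `soup.find` returns `None` when no tag matches, so calling `.text` on the result raises `AttributeError` if a page has no `h1` with that `data-reactid` value. A minimal sketch of a guard follows, run on inline sample HTML rather than the live pages. The `extract_ticker` helper and both sample strings are hypothetical, made up for illustration.

```python
from bs4 import BeautifulSoup

def extract_ticker(html):
    """Return the matching <h1> text, or None if no such tag exists."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("h1", attrs={"data-reactid": "7"})
    # find() returns None when nothing matches, so check before using .text
    return tag.text if tag is not None else None

# Hypothetical sample markup, not the real Yahoo page source.
sample_ok = '<h1 data-reactid="7">BOX - Box, Inc.</h1>'
sample_missing = '<h1>BOX - Box, Inc.</h1>'

print(extract_ticker(sample_ok))       # BOX - Box, Inc.
print(extract_ticker(sample_missing))  # None
```

Inside the real loop, the same check would let you skip or log a page whose markup differs instead of stopping the whole run.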