Python script extracts only the last item instead of all of them - Beautiful Soup

Date: 2017-10-23 19:08:48

Tags: python web-scraping beautifulsoup

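The code below outputs the stock name and price for only the second URL, not for both.
I looked through existing posts to see whether anyone else had run into this, but nobody had asked about it. The code is written for Python 2.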

quote_page = ['http://www.bloomberg.com/quote/SPX:IND','http://www.bloomberg.com/quote/CCMP:IND']

data = []

for pg in quote_page:
    page = urllib2.urlopen(pg)

soup = BeautifulSoup(page, 'html.parser')

name_box = soup.find('h1', attrs = {'class':'name'})
name = name_box.text.strip() 

price_box = soup.find('div', attrs = {'class':'price'})
price = price_box.text

data.append((name, price))

with open('output/stock.csv','a') as csv_file:
    writer = csv.writer(csv_file)
    for name, price in data:
        writer.writerow([name, price, datetime.now()])

2 Answers:

Answer 0 (score: 0)

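This is caused by faulty indentation: only the `urlopen` call sits inside the loop, so everything after it runs once, on the last page fetched. Give the version below a go; it solves the issue. I have also shortened it a bit by getting rid of redundant parts. It is written for Python 3, so adjust the parts that differ (such as the `urllib.request` import) if you need to run it under Python 2.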

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

quote_page = ['http://www.bloomberg.com/quote/SPX:IND', 'http://www.bloomberg.com/quote/CCMP:IND']

for pg in quote_page:
    page = urlopen(pg).read()
    soup = BeautifulSoup(page,'lxml')
    name = soup.find(class_='name').text.strip()
    price = soup.find(class_='price').text
    print(name,price)
    with open('stock.csv','a') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow([name, price])
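
To see why the original script reports only the second URL, here is a minimal sketch that needs no network access. The `fetch` function is a hypothetical stand-in for `urlopen`, and the page text it returns is made up; the two loops differ only in indentation.

```python
# Hypothetical stand-in for urlopen(); returns fake page text for a URL.
def fetch(url):
    return "page for " + url

urls = ["SPX:IND", "CCMP:IND"]

# Buggy layout: only the fetch is inside the loop.
buggy = []
for url in urls:
    page = fetch(url)
# This line runs once, after the loop, so `page` holds only the last result.
buggy.append(page)

# Fixed layout: the processing step is indented into the loop body.
fixed = []
for url in urls:
    page = fetch(url)
    fixed.append(page)

print(buggy)
print(fixed)
```

In the buggy layout the loop variable is overwritten on every pass and nothing is done with it until the loop has finished, which matches the behaviour described in the question.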

Answer 1 (score: 0)

Bloomberg

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup
from datetime import datetime

quote_page = ['http://www.bloomberg.com/quote/SPX:IND', 'http://www.bloomberg.com/quote/CCMP:IND']

data = []

for pg in quote_page:
    page = urlopen(pg)

    soup = BeautifulSoup(page, 'html.parser')

    name_box = soup.find('h1', attrs={'class': 'name'})
    name = name_box.text.strip()

    price_box = soup.find('div', attrs={'class': 'price'})
    price = price_box.text

    data.append((name, price))

# Write once, after the loop, so each collected row is appended a single time.
with open('stock2.csv', 'a') as csv_file:
    writer = csv.writer(csv_file)
    for name, price in data:
        writer.writerow([name, price, datetime.now()])
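
A sketch of the collect-then-write pattern, under some assumptions: the names and prices are made-up values standing in for scraped ones, and an in-memory `io.StringIO` buffer replaces the CSV file so the example has no side effects. Writing the accumulated `data` list from inside the scraping loop would repeat earlier rows on every pass; writing once after the loop gives one row per page.

```python
import csv
import io
from datetime import datetime

# Made-up values standing in for scraped (name, price) pairs.
scraped = [("Index A", "100.00"), ("Index B", "200.00")]

# Collect one tuple per page inside the loop.
data = []
for name, price in scraped:
    data.append((name, price))

# In-memory buffer used here instead of open('stock2.csv', 'a').
buffer = io.StringIO()
writer = csv.writer(buffer)

# Write once, after all pages have been collected.
for name, price in data:
    writer.writerow([name, price, datetime.now()])

rows = buffer.getvalue().splitlines()
print(len(rows))
```

Opening the file once also avoids reopening it in append mode for every URL, though either approach works for a list this short.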