How to loop through URLs and import TD elements from multiple links

Date: 2018-02-23 04:17:04

Tags: python-3.x

I am trying to import data from the URLs below and write each dataset to a CSV file.

Here are some example URLs I want to pull basic data from:

https://finviz.com/quote.ashx?t=sbuc
https://finviz.com/quote.ashx?t=msft
https://finviz.com/quote.ashx?t=aapl

How can I import the data as the 'index' (the ticker) at the end of each URL changes?
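Put differently, the ticker is just the value of the t= query parameter, so the task reduces to building one URL per ticker. A minimal sketch of the idea:

url_base = "https://finviz.com/quote.ashx?t="
urls = [url_base + t for t in ['SBUX', 'MSFT', 'AAPL']]  # one URL per ticker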


I think the script should basically look like this:

import csv
import urllib.request
from bs4 import BeautifulSoup


soup = BeautifulSoup("html.parser")

url_base = "https://finviz.com/quote.ashx?t="
tckr = ['SBUX','MSFT','AAPL']
for stocks in tckr:
    url_list = [url_base + tckr]

with open('C:/Users/Excel/Desktop/today.csv', 'a', newline='') as file:
    writer = csv.writer(file)

    for url in url_list:
        try:
            fpage = urllib.request.urlopen(url)
            fsoup = BeautifulSoup(fpage, 'html.parser')

            # write header row
            writer.writerow(map(lambda e : e.text, fsoup.find_all('td', {'class':'snapshot-td2-cp'})))

            # write body row
            writer.writerow(map(lambda e : e.text, fsoup.find_all('td', {'class':'snapshot-td2'})))            
        except urllib.error.HTTPError:
            print("{} - not found".format(url))

Except when I run it, I get this error message: SyntaxError: EOL while scanning string literal
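For reference, that error means Python reached the end of a line while still inside a string literal, i.e. a quote was never closed. The code as posted does not actually contain such a string, so the stray quote was likely introduced while copying; a minimal line that reproduces the error:

# Raises "SyntaxError: EOL while scanning string literal"
# because the closing quote is missing:
path = 'C:/Users/Excel/Desktop/today.csv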

1 Answer:

Answer 0 (score: 2):

import csv
import requests
from requests.exceptions import HTTPError
from bs4 import BeautifulSoup

url_base = "https://finviz.com/quote.ashx?t="
tckr = ['SBUX', 'MSFT', 'AAPL']
url_list = [url_base + s for s in tckr]

with open('../Python/SOtest.csv', 'a', newline='') as f:
    writer = csv.writer(f)

    for url in url_list:
        try:
            fpage = requests.get(url)
            fpage.raise_for_status()  # raise HTTPError for 4xx/5xx responses
            fsoup = BeautifulSoup(fpage.content, 'html.parser')

            # write header row (the field labels in the snapshot table)
            writer.writerow(map(lambda e: e.text, fsoup.find_all('td', {'class': 'snapshot-td2-cp'})))

            # write body row (the corresponding values)
            writer.writerow(map(lambda e: e.text, fsoup.find_all('td', {'class': 'snapshot-td2'})))
        except HTTPError:
            print("{} - not found".format(url))

I used requests, so there are some differences. But it does work, so you can pull the code from there if needed.
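One caveat worth adding (an assumption on my part, not something the original answer covers): some sites reject requests that lack a browser-like User-Agent header and answer with a 403, which the code above would report as "not found". If that happens for every ticker, passing explicit headers is a minimal sketch of a workaround; the header string below is just an illustrative placeholder:

import requests
from requests.exceptions import HTTPError

headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder browser-like UA string

url = "https://finviz.com/quote.ashx?t=SBUX"
try:
    fpage = requests.get(url, headers=headers)
    fpage.raise_for_status()   # turn 4xx/5xx responses into HTTPError
    print(fpage.status_code)   # 200 means the request was accepted
except HTTPError as err:
    print("{} - request rejected: {}".format(url, err))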