Scraping the Nasdaq website

Time: 2018-07-25 01:21:26

Tags: python selenium operating-system screen-scraping

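I'm running the following script to scrape the Nasdaq website for a list of companies over a specific time range. The script should download each file into the Downloads folder, rename it with the company name, and move it to the destination folder. Finally, it should delete the originally downloaded file and continue with the loop.

Everything seems to work: the first file is downloaded, renamed, and moved to the destination. However, when it gets to the second download, it returns this error:

FileNotFoundError: File b'C:\Users\Filippo Sebastio\Downloads\HistoricalQuotes.csv' does not exist

Any idea why?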

from selenium import webdriver
import os
import pandas as pd
import time
import glob

def pull_nasdaq_data(tickers, save_path):

    driver = webdriver.Chrome(executable_path=r'C:\Users\Filippo Sebastio\Desktop\chromedriver.exe')

    for ticker in tickers:
        site = 'http://www.nasdaq.com/symbol/' + ticker + '/historical'
        driver.get(site)
        # Choose 18 months of data from the drop-down
        data_range = driver.find_element_by_name('ddlTimeFrame')
        for option in data_range.find_elements_by_tag_name('option'):
            if option.text == '18 months':
                option.click()
                break
        time.sleep(5)

        driver.find_element_by_id('lnkDownLoad').click()
        time.sleep(5)
        data = pd.read_csv(r'C:\Users\Filippo Sebastio\Downloads\HistoricalQuotes.csv')
        data['company'] = ticker

        file_loc = save_path + ticker + '.csv'
        data.to_csv(file_loc, index=False)

        os.chdir(r'C:\Users\Filippo Sebastio\Downloads')
        for f in glob.glob("Historical*.csv"):
            os.remove(f)

        print("Downloaded:  ", ticker)
        time.sleep(5)



save_path = r'C:\Users\Filippo Sebastio\Desktop\Stock'
tickers = ['mmm', 'tesla',  'pcb']

pull_nasdaq_data(tickers, save_path)
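A separate detail worth checking in the script above: save_path is r'C:\Users\Filippo Sebastio\Desktop\Stock' with no trailing backslash, so save_path + ticker + '.csv' glues the folder and file names together and the output lands on the Desktop as Stockmmm.csv instead of inside the Stock folder. A minimal sketch comparing the two ways of building the path; it uses ntpath (the Windows flavour of os.path) only so the result is the same on any OS, while in the real script os.path.join is the natural choice:

```python
import ntpath

save_path = r'C:\Users\Filippo Sebastio\Desktop\Stock'
ticker = 'mmm'

# Plain string concatenation: no separator is inserted between folder and file
concatenated = save_path + ticker + '.csv'

# ntpath.join adds the Windows separator because save_path doesn't end with one
joined = ntpath.join(save_path, ticker + '.csv')

print(concatenated)  # C:\Users\Filippo Sebastio\Desktop\Stockmmm.csv
print(joined)        # C:\Users\Filippo Sebastio\Desktop\Stock\mmm.csv
```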

1 answer

Answer 0 (score: 0)

As mentioned above, the tickers are a problem: for example, Tesla's symbol is tsla, not tesla. When a ticker doesn't download, no new HistoricalQuotes.csv appears in your download directory. The file from the previous ticker has already been deleted, so there is nothing for pd.read_csv to open and it throws the FileNotFoundError. I've also made the download directory a parameter, which I think might help.
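Rather than sleeping a fixed number of seconds and assuming the file is there, one option is to poll the download folder until a matching CSV appears, and fail with a clear message if none does. A minimal sketch, assuming Chrome saves the export into download_dir under a name matching HistoricalQuotes*.csv; wait_for_download is a hypothetical helper, not part of Selenium or pandas:

```python
import glob
import os
import time


def wait_for_download(download_dir, pattern='HistoricalQuotes*.csv',
                      timeout=30.0, poll=0.5):
    """Return the newest file in download_dir matching pattern.

    Polls every `poll` seconds; raises FileNotFoundError if nothing
    matching appears within `timeout` seconds. Hypothetical helper.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Chrome normally keeps unfinished downloads under a *.crdownload
        # name, so only completed files match a '*.csv' pattern
        matches = glob.glob(os.path.join(download_dir, pattern))
        if matches:
            # Pick the most recently modified match, e.g.
            # 'HistoricalQuotes (1).csv' if an older copy was left behind
            return max(matches, key=os.path.getmtime)
        if time.monotonic() >= deadline:
            raise FileNotFoundError(
                f'no file matching {pattern!r} appeared in '
                f'{download_dir!r} within {timeout} s')
        time.sleep(poll)
```

Inside the ticker loop, calling it in a try/except FileNotFoundError and continuing to the next ticker would report a bad symbol such as 'tesla' and move on, instead of stopping the whole run.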

from selenium import webdriver
import os
import time

import pandas as pd


def pull_nasdaq_data(tickers, save_path, download_dir):

    driver = webdriver.Chrome()

    for ticker in tickers:
        site = 'http://www.nasdaq.com/symbol/' + ticker + '/historical'
        driver.get(site)
        # Choose 18 months of data from the drop-down
        data_range = driver.find_element_by_name('ddlTimeFrame')
        for option in data_range.find_elements_by_tag_name('option'):
            if option.text == '18 months':
                option.click()
                break
        time.sleep(5)

        driver.find_element_by_id('lnkDownLoad').click()
        time.sleep(1)
        csv_path = os.path.join(download_dir, 'HistoricalQuotes.csv')
        data = pd.read_csv(csv_path)
        data['company'] = ticker

        file_loc = os.path.join(save_path, ticker + '.csv')
        data.to_csv(file_loc, index=False)

        # Remove the original download so the next ticker's file gets the same name
        os.remove(csv_path)

        print("Downloaded:  ", ticker)
        time.sleep(5)

    driver.quit()



save_path = '/Users/tetracycline/'
download_dir = '/Users/tetracycline/Downloads/'
tickers = ['mmm', 'tsla']

pull_nasdaq_data(tickers, save_path, download_dir)
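Passing download_dir into the function only tells the script where to look; Chrome still saves files into its own default download folder unless it is configured otherwise (here the two happen to coincide). With Selenium's Chrome driver this is commonly done through the download.default_directory preference, set via ChromeOptions.add_experimental_option('prefs', ...). A minimal sketch of building that preferences dict; chrome_download_prefs is a hypothetical helper name:

```python
import os


def chrome_download_prefs(download_dir):
    """Build Chrome preferences that send downloads to download_dir without a prompt."""
    return {
        # Use an absolute path to be safe; relative paths are not reliable here
        'download.default_directory': os.path.abspath(download_dir),
        # Skip the "Save as" dialog so each file downloads automatically
        'download.prompt_for_download': False,
    }


# Usage with Selenium (not executed here):
#   options = webdriver.ChromeOptions()
#   options.add_experimental_option('prefs', chrome_download_prefs(download_dir))
#   driver = webdriver.Chrome(options=options)
```

Creating the driver this way inside pull_nasdaq_data would keep the folder Chrome writes to and the folder the script reads from in sync.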