Appending dataframes together in a for loop

Time: 2018-09-12 01:13:19

Tags: python pandas loops dataframe append

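I feel like this should be simple, but I'm still fairly new to Python and am struggling to work out how to do it. I'm scraping historical stock data and want to put all of it into one Excel spreadsheet. At the moment only the data for the last stock gets written out.

I understand that the dataframe is essentially being overwritten on every pass through the loop, but I'm not sure how to fix it so that the dataframes are appended, or so that each one is written to the end of the Excel sheet when the loop reaches that point. Any help would be greatly appreciated.

Here is my code: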

import numpy as np
from bs4 import BeautifulSoup
import requests
import pandas as pd
import time

symbols = ['WYNN', 'FL', 'TTWO']
myColumnHeaders = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume']

for c in range(len(symbols)):
    url = 'https://www.nasdaq.com/symbol/'+symbols[c]+'/historical'
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    historicaldata = soup.find('div', {'id': 'quotes_content_left_pnlAJAX'})
    data_rows = historicaldata.findAll('tr')[2:]
    stock_data = [[td.getText().strip() for td in data_rows[a].findAll('td')]
                 for a in range(len(data_rows))]
    df = pd.DataFrame(stock_data, columns=myColumnHeaders)
    df.set_index('Date')

    df['Volume'].str.replace(',','').astype(int)
    for i in range(5):
        if i == 0:
            df[myColumnHeaders[i]] = pd.to_datetime(df[myColumnHeaders[i]], 'coerce')
        else:
            df[myColumnHeaders[i]] = pd.to_numeric(df[myColumnHeaders[i]], errors='coerce')

df.to_excel('stock data.xlsx',index=False) 

2 Answers:

Answer 0 (score: 1)

I have updated your code so that all the data ends up in a single DataFrame.

import numpy as np
from bs4 import BeautifulSoup
import requests
import pandas as pd
import time

symbols = ['WYNN', 'FL', 'TTWO']
myColumnHeaders = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume']

dfs = []

for c in range(len(symbols)):
    url = 'https://www.nasdaq.com/symbol/'+symbols[c]+'/historical'
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    historicaldata = soup.find('div', {'id': 'quotes_content_left_pnlAJAX'})
    data_rows = historicaldata.findAll('tr')[2:]
    stock_data = [[td.getText().strip() for td in data_rows[a].findAll('td')]
                 for a in range(len(data_rows))]
    df = pd.DataFrame(stock_data, columns=myColumnHeaders)
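    # set_index returns a new DataFrame, so the bare call had no effect; it is
    # dropped because the index is replaced with the symbol further down.
    df['Volume'] = df['Volume'].str.replace(',', '').astype(int)  # assign the result back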
    for i in range(5):
        if i == 0:
            df[myColumnHeaders[i]] = pd.to_datetime(df[myColumnHeaders[i]], errors='coerce')
        else:
            df[myColumnHeaders[i]] = pd.to_numeric(df[myColumnHeaders[i]], errors='coerce')
    df.index = [symbols[c]]*len(df)
    dfs.append(df)

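# DataFrame.append was removed in pandas 2.0; pd.concat joins the whole list at once
df = pd.concat(dfs).reset_index()
# the context manager saves the file on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('output.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='data', index=False)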

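Answer 1 (score: 1)

Do not use pd.DataFrame.append in a loop

It is inefficient because the data is copied again on every call. A better idea is to build a list of dataframes and then concatenate them in a single final step outside the loop. Here is some pseudocode: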

symbols = ['WYNN', 'FL', 'TTWO']
cols = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume']

dfs = []  # empty list which will hold your dataframes

for c in range(len(symbols)):
    # some code

    df = pd.DataFrame(stock_data, columns=cols)
    df['Volume'] = df['Volume'].str.replace(',', '').astype(int)

    df[cols[0]] = pd.to_datetime(df[cols[0]], errors='coerce')
    df[cols[1:5]] = df[cols[1:5]].apply(pd.to_numeric, errors='coerce')  # prices are numbers, not dates

    df = df.set_index('Date')  # done last, because the conversions above need the Date column

    dfs.append(df)  # append dataframe to list

res = pd.concat(dfs)  # concatenate list of dataframes, keeping the Date index
res.to_excel('stock data.xlsx')  # the Date index is written as the first column
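
Because the pseudocode above leaves out the scraping step, here is a minimal runnable sketch of the same list-then-concat pattern. The `fake_rows` helper and its values are made up purely for illustration and stand in for whatever the scraper returns; the `Symbol` column is also an addition, assuming you want to know which stock each row came from.

```python
import pandas as pd

symbols = ['WYNN', 'FL', 'TTWO']
cols = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume']


def fake_rows(symbol):
    # Hypothetical stand-in for the scraping step: two made-up rows of strings.
    return [
        ['09/10/2018', '1.0', '2.0', '0.5', '1.5', '1,000'],
        ['09/11/2018', '1.5', '2.5', '1.0', '2.0', '2,500'],
    ]


dfs = []  # one dataframe per symbol
for symbol in symbols:
    df = pd.DataFrame(fake_rows(symbol), columns=cols)
    df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y', errors='coerce')
    df['Volume'] = df['Volume'].str.replace(',', '').astype(int)
    df[cols[1:5]] = df[cols[1:5]].apply(pd.to_numeric, errors='coerce')
    df.insert(0, 'Symbol', symbol)  # tag every row with its stock symbol
    dfs.append(df)

# A single concatenation outside the loop, instead of appending frame by frame.
res = pd.concat(dfs, ignore_index=True)
print(res.shape)
```

From here, `res.to_excel('stock data.xlsx', index=False)` would write the combined table to one sheet, as in the original question.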

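Note that you are performing many operations, such as set_index, as if they worked in place by default. They do not. You should assign the result back to a variable, e.g. df = df.set_index('Date')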
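
A small sketch of that point, using a made-up two-row frame: calling `set_index` or `str.replace` without assigning the result leaves `df` untouched, while assigning back applies the change.

```python
import pandas as pd

# Made-up sample data, only for illustration.
df = pd.DataFrame({'Date': ['2018-09-10', '2018-09-11'],
                   'Volume': ['1,000', '2,500']})

df.set_index('Date')               # result discarded: df still has a Date column
df['Volume'].str.replace(',', '')  # result discarded: the commas are still there
unchanged = (list(df.columns) == ['Date', 'Volume']
             and df['Volume'].iloc[0] == '1,000')

# Assigning the results back is what actually changes df.
df['Volume'] = df['Volume'].str.replace(',', '').astype(int)
df = df.set_index('Date')

print(unchanged, df.index.name, df['Volume'].tolist())
```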