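I feel like this should be simple, but I'm still fairly new to Python and am struggling to work out how to do it. I am scraping historical stock data and want to put all of it into one Excel spreadsheet. At the moment only the data for the last stock is written out.
I know the DataFrame is essentially being overwritten on every pass through the loop, but I'm not sure how to fix it so that the DataFrames are appended, or so that each one is written to the end of the Excel sheet when that point is reached. Any help would be greatly appreciated.
Here is my code: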
import numpy as np
from bs4 import BeautifulSoup
import requests
import pandas as pd
import time

symbols = ['WYNN', 'FL', 'TTWO']
myColumnHeaders = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume']

for c in range(len(symbols)):
    url = 'https://www.nasdaq.com/symbol/'+symbols[c]+'/historical'
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    historicaldata = soup.find('div', {'id': 'quotes_content_left_pnlAJAX'})
    data_rows = historicaldata.findAll('tr')[2:]
    stock_data = [[td.getText().strip() for td in data_rows[a].findAll('td')]
                  for a in range(len(data_rows))]
    df = pd.DataFrame(stock_data, columns=myColumnHeaders)
    df.set_index('Date')
    df['Volume'].str.replace(',','').astype(int)
    for i in range(5):
        if i == 0:
            df[myColumnHeaders[i]] = pd.to_datetime(df[myColumnHeaders[i]], 'coerce')
        else:
            df[myColumnHeaders[i]] = pd.to_numeric(df[myColumnHeaders[i]], errors='coerce')
    df.to_excel('stock data.xlsx',index=False)
Answer 0: (score: 1)
I have updated your code so that all the data ends up in a single DataFrame.
import numpy as np
from bs4 import BeautifulSoup
import requests
import pandas as pd
import time

symbols = ['WYNN', 'FL', 'TTWO']
myColumnHeaders = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume']
dfs = []  # one DataFrame per symbol

for c in range(len(symbols)):
    url = 'https://www.nasdaq.com/symbol/'+symbols[c]+'/historical'
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    historicaldata = soup.find('div', {'id': 'quotes_content_left_pnlAJAX'})
    data_rows = historicaldata.findAll('tr')[2:]
    stock_data = [[td.getText().strip() for td in data_rows[a].findAll('td')]
                  for a in range(len(data_rows))]
    df = pd.DataFrame(stock_data, columns=myColumnHeaders)
    # str.replace and astype return a new Series, so assign the result back
    df['Volume'] = df['Volume'].str.replace(',', '').astype(int)
    for i in range(5):
        if i == 0:
            df[myColumnHeaders[i]] = pd.to_datetime(df[myColumnHeaders[i]], errors='coerce')
        else:
            df[myColumnHeaders[i]] = pd.to_numeric(df[myColumnHeaders[i]], errors='coerce')
    # Label every row with its symbol; 'Date' stays an ordinary column
    df.index = [symbols[c]]*len(df)
    dfs.append(df)

# DataFrame.append was removed in pandas 2.0; on newer versions use pd.concat(dfs).reset_index()
df = dfs[0].append(dfs[1]).append(dfs[2]).reset_index()
# The with-block saves and closes the workbook on exit
with pd.ExcelWriter('output.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='data', index=False)
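Answer 1: (score: 1)
pd.DataFrame.append is inefficient because it involves repeatedly copying the data. A better idea is to build a list of dataframes and then concatenate them in a single final step outside the loop. Here is some pseudocode: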
import pandas as pd

symbols = ['WYNN', 'FL', 'TTWO']
cols = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume']
dfs = []  # empty list which will hold your dataframes

for c in range(len(symbols)):
    # some code
    df = pd.DataFrame(stock_data, columns=cols)
    df['Volume'] = df['Volume'].str.replace(',', '').astype(int)
    df[cols[0]] = pd.to_datetime(df[cols[0]], errors='coerce')
    df[cols[1:5]] = df[cols[1:5]].apply(pd.to_numeric, errors='coerce')
    df = df.set_index('Date')  # set the index last, once 'Date' has been converted
    dfs.append(df)  # append dataframe to list

res = pd.concat(dfs)  # concatenate list of dataframes, keeping the Date index
res.to_excel('stock data.xlsx')  # the Date index is written as the first column
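To try the list-then-concatenate pattern without touching the network, here is a minimal sketch that runs on its own. The rows in fake_tables are made up purely for illustration (they are not real quotes), the 'AAA' and 'BBB' symbols are placeholders, and the extra 'Symbol' column is my own assumption about how you might want to tell the stocks apart in the combined sheet.

```python
import pandas as pd

# Made-up rows standing in for the scraped tables (not real market data).
fake_tables = {
    'AAA': [['01/02/2019', '10.0', '11.0', '9.5', '10.5', '1,200'],
            ['01/03/2019', '10.5', '12.0', '10.0', '11.5', '2,300']],
    'BBB': [['01/02/2019', '20.0', '21.0', '19.0', '20.5', '4,000']],
}
cols = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume']

dfs = []  # one cleaned DataFrame per symbol
for symbol, rows in fake_tables.items():
    df = pd.DataFrame(rows, columns=cols)
    df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y', errors='coerce')
    df[cols[1:5]] = df[cols[1:5]].apply(pd.to_numeric, errors='coerce')
    df['Volume'] = df['Volume'].str.replace(',', '').astype(int)
    df.insert(0, 'Symbol', symbol)  # hypothetical column identifying the stock
    dfs.append(df)

# A single concatenation outside the loop, instead of repeated appends.
res = pd.concat(dfs, ignore_index=True)
print(res)
```

From here, res.to_excel('stock data.xlsx', index=False) should write all the rows to one sheet, provided an Excel engine such as openpyxl or xlsxwriter is installed.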
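Note that you are performing many operations, such as set_index, as if they worked in place by default. They do not. You should assign the result back to a variable, e.g. df = df.set_index('Date').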
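A small sketch of the difference, using a tiny made-up frame (the values are illustrative only): calling the method on its own leaves df untouched, while assigning the result back changes it.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2019-01-02', '2019-01-03'],
                   'Volume': ['1,200', '2,300']})

# Both calls return new objects that are immediately discarded.
df.set_index('Date')
df['Volume'].str.replace(',', '')
unchanged = list(df.columns)  # 'Date' is still an ordinary column here

# Assigning the results back is what actually changes df.
df['Volume'] = df['Volume'].str.replace(',', '').astype(int)
df = df.set_index('Date')
```

Many pandas methods also accept inplace=True, but assigning the result back tends to be the clearer habit.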