Import data headers from an Excel document, search the web with pandas, then export to the same Excel document

Time: 2017-11-30 01:33:23

Tags: python excel pandas dataframe

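I'm trying to figure out how to import data from a specific Excel worksheet, search Yahoo Finance for information based on that data, and then write the data received from Yahoo (via pandas' web.DataReader) into specific rows/columns of that same Excel file.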

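Here is what I have so far, but it doesn't do what I'm after. It looks up information based on inputs hard-coded in the script rather than read from the Excel sheet, and it exports the concatenated DataFrame to a newly created spreadsheet instead of to specific rows and columns of the pre-existing one.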

import datetime as dt
import pandas as pd
import pandas_datareader.data as web

# Date range: 11 Aug 2017 up to today
start = dt.datetime.strptime("8/11/2017", "%m/%d/%Y")
end = dt.datetime.today()
# One CSV header label per output column, in concatenation order
headerlist = ('stock1 Open', 'stock1 Close', 'stock2 Open', 'stock2 Close',
              'stock3 Open', 'stock3 Close', 'stock4 Open', 'stock4 Close')

# Tickers are hard-coded; 'stock1'...'stock4' stand in for real symbols
df1 = web.DataReader('stock1', 'yahoo', start, end)[['Open','Close']]
df2 = web.DataReader('stock2', 'yahoo', start, end)[['Open','Close']]
df3 = web.DataReader('stock3', 'yahoo', start, end)[['Open','Close']]
df4 = web.DataReader('stock4', 'yahoo', start, end)[['Open','Close']]

# Put the four frames side by side (aligned on date) and write a new CSV file
resultingdf = pd.concat([df1, df2, df3, df4], axis=1)
resultingdf.to_csv('Portfolio.csv', header=headerlist)
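
A note on the hard-coded part: `pd.concat` can generate the column labels itself when it is given a dict of frames, which makes `headerlist` unnecessary and lets the frames be built in a loop, e.g. `{t: web.DataReader(t, 'yahoo', start, end)[['Open', 'Close']] for t in tickers}`. The sketch below uses small made-up DataFrames and the hypothetical tickers `AAA`/`BBB` in place of real DataReader results so it runs offline; `combine_prices` is a hypothetical helper name.

```python
import pandas as pd


def combine_prices(frames):
    """Concatenate per-ticker Open/Close frames side by side.

    `frames` maps a ticker symbol to a DataFrame with 'Open' and 'Close'
    columns. The dict keys become the outer column level, so no manual
    header list is needed.
    """
    return pd.concat(frames, axis=1)


# Hypothetical stand-in data; real code would call web.DataReader per ticker.
idx = pd.to_datetime(["2017-08-11", "2017-08-14"])
frames = {
    "AAA": pd.DataFrame({"Open": [1.0, 2.0], "Close": [1.5, 2.5]}, index=idx),
    "BBB": pd.DataFrame({"Open": [3.0, 4.0], "Close": [3.5, 4.5]}, index=idx),
}
combined = combine_prices(frames)
# Flatten ("AAA", "Open") pairs into single "AAA Open" labels, like headerlist.
combined.columns = [f"{ticker} {field}" for ticker, field in combined.columns]
print(combined.columns.tolist())
# ['AAA Open', 'AAA Close', 'BBB Open', 'BBB Close']
```

Passing the dict keys through `concat` also keeps the labels in sync with the data if tickers are added or removed.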

Any help or pointers would be greatly appreciated.

Edit:

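The code above works fine; it just doesn't achieve what I have in mind, because it isn't automated at all and requires a lot of input in the code itself. Here is a general breakdown of what I'm trying to accomplish: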

import datetime as dt
import pandas as pd
import pandas_datareader.data as web
#import any other modules I may need

#establish timeframe
start = dt.datetime.strptime("8/11/2017", "%m/%d/%Y")
end = dt.datetime.today()

#search a pre-existing Excel sheet's column for stock tickers, laid out like:
#   Name
#   AAA
#   BBB
#   CCC
#   DDD

#use pandas' DataReader to fetch the data for those tickers from Yahoo, Google, etc.

#concatenate the DataFrames

#export the combined DataFrame to a specific row and column of the same Excel sheet, with identifying headers, like:
#   Open   High   Low   Close,   Open   High   Low   Close

Hopefully that explains it better.
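
The first step of that outline (reading the tickers listed under a `Name` header) can be done with `pd.read_excel`. A minimal sketch, assuming the tickers sit in a column headed `Name` on `Sheet1`; the helper `read_tickers` and the temporary workbook are hypothetical and exist only to make the example self-contained (writing `.xlsx` files this way needs openpyxl installed). The resulting list can then be passed to `web.DataReader` or looped over.

```python
import os
import tempfile

import pandas as pd


def read_tickers(path, sheet_name="Sheet1", column="Name"):
    """Return the non-empty ticker symbols listed in one column of a sheet."""
    sheet = pd.read_excel(path, sheet_name=sheet_name)
    # dropna() skips blank cells; str.strip() removes stray spaces around symbols.
    return sheet[column].dropna().astype(str).str.strip().tolist()


# Build a throwaway workbook shaped like the layout in the question.
path = os.path.join(tempfile.mkdtemp(), "Stocks.xlsx")
pd.DataFrame({"Name": ["AAA", "BBB", "CCC", "DDD"]}).to_excel(
    path, sheet_name="Sheet1", index=False
)

tickers = read_tickers(path)
print(tickers)  # ['AAA', 'BBB', 'CCC', 'DDD']
```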

1 Answer:

Answer 0 (score: 2)

This should do it:

import datetime as dt
import pandas as pd
import pandas_datareader.data as web
from openpyxl import load_workbook

start = dt.datetime.strptime("8/11/2017", "%m/%d/%Y")
end = dt.datetime.today()

# Read the first sheet; header=0 makes its first row the column labels
data_file = pd.ExcelFile('Stocks.xlsx').parse('Sheet1', header=0)
#print(data_file)

# data_file.columns is the Excel header row holding stock 1, stock 2, etc., e.g. FB, AAPL
# (with the pandas_datareader of 2017, a list of symbols returns a Panel)
stocks = web.DataReader(list(data_file.columns), 'yahoo', start, end)[['Open','Close']]
#print(stocks)

# Writes to the same workbook that holds the tickers, but on different tabs: each field (Open, Close) gets its own tab.
# You can't "append" data to an existing sheet, only overwrite it in full including headers, so this can't be reused the next day.
book = load_workbook('Stocks.xlsx')
writer = pd.ExcelWriter('Stocks.xlsx', engine='openpyxl')
# Attach the already-loaded workbook so its existing sheets are kept rather than replaced
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
stocks.to_excel(writer)
writer.save()
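
On current pandas, assigning `writer.book` and `writer.sheets` no longer works (setting them was deprecated in pandas 1.5 and removed in 2.0), and `writer.save()` has been replaced by `writer.close()` or a `with` block. Since pandas 0.24, `ExcelWriter` can open an existing workbook directly with `mode='a'`. A minimal sketch of the same idea, assuming openpyxl is installed and pandas 1.3 or newer for `if_sheet_exists`; `fetch_prices` is a hypothetical offline stand-in for the `web.DataReader` call, and `add_price_sheet` and the sheet name `Prices` are illustrative. In the script above, `tickers` would be `list(data_file.columns)`.

```python
import os
import tempfile

import pandas as pd


def fetch_prices(ticker, start, end):
    """Hypothetical stand-in for web.DataReader(ticker, 'yahoo', start, end).

    Returns constant synthetic Open/Close values on business days so the
    sketch runs offline; swap in the real DataReader call in practice.
    """
    idx = pd.bdate_range(start, end)
    base = float(len(ticker))
    return pd.DataFrame({"Open": base, "Close": base + 0.5}, index=idx)


def add_price_sheet(path, tickers, start, end, sheet_name="Prices"):
    """Fetch prices for `tickers` and write them to a sheet of an existing workbook."""
    frames = {t: fetch_prices(t, start, end)[["Open", "Close"]] for t in tickers}
    prices = pd.concat(frames, axis=1)
    # mode="a" opens the existing workbook instead of recreating it, so the
    # other sheets survive; if_sheet_exists="replace" (pandas >= 1.3) rewrites
    # only this one sheet when the script is rerun.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        prices.to_excel(writer, sheet_name=sheet_name)
    return prices


# Demo on a throwaway workbook shaped like Stocks.xlsx.
path = os.path.join(tempfile.mkdtemp(), "Stocks.xlsx")
pd.DataFrame({"Name": ["AAA", "BBB"]}).to_excel(path, sheet_name="Sheet1", index=False)
prices = add_price_sheet(path, ["AAA", "BBB"], "2017-08-11", "2017-08-18")
print(pd.ExcelFile(path).sheet_names)  # ['Sheet1', 'Prices']
```

Using `"replace"` rather than the default `"error"` means the script can run daily without failing on the second run.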

Alternatively, you can create a new Excel file with the results, where each column (Open, Close) is in a different tab:

writer = pd.ExcelWriter('Stock Data.xlsx', engine="xlsxwriter")
stocks.to_excel(writer)  # pass index=False to leave out the date index
writer.save()
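
`pd.Panel` has since been removed from pandas (deprecated in 0.20, removed in 0.25), and recent pandas_datareader releases return a DataFrame with two-level (field, symbol) columns when given several symbols. To reproduce the one-tab-per-field output with such a DataFrame, each top-level group can be written to its own sheet. A minimal sketch with made-up data and hypothetical tickers; `write_fields_to_sheets` is a hypothetical helper, and it uses the openpyxl engine, though xlsxwriter works the same way for a new file.

```python
import os
import tempfile

import pandas as pd


def write_fields_to_sheets(prices, path, fields=("Open", "Close")):
    """Write each top-level column group (e.g. 'Open', 'Close') to its own sheet.

    `prices` is assumed to have two-level columns (field, ticker), the shape
    multi-symbol DataReader calls produce in place of the old Panel.
    """
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for field in fields:
            # prices[field] drops the outer level, leaving one column per ticker.
            prices[field].to_excel(writer, sheet_name=field)


# Hypothetical data in the (field, ticker) layout.
idx = pd.to_datetime(["2017-08-11", "2017-08-14"])
columns = pd.MultiIndex.from_product([["Open", "Close"], ["AAA", "BBB"]])
prices = pd.DataFrame(
    [[1.0, 3.0, 1.5, 3.5], [2.0, 4.0, 2.5, 4.5]], index=idx, columns=columns
)

path = os.path.join(tempfile.mkdtemp(), "Stock Data.xlsx")
write_fields_to_sheets(prices, path)
print(pd.ExcelFile(path).sheet_names)  # ['Open', 'Close']
```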

You can even create a new file with fresh data every day by passing name + '.xlsx' as the file name to pd.ExcelWriter:
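now = dt.datetime.now()
name = 'Opportunistic_leads_' + str(now)[:10]  # date part only, e.g. 'Opportunistic_leads_2017-11-30'
writer = pd.ExcelWriter(name + '.xlsx', engine="xlsxwriter")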
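
Wrapping that into a small function makes the date an explicit argument, which is handy for testing or for re-running a given day. A sketch; the helper name `daily_filename` is hypothetical, and `strftime`-style formatting yields the same `YYYY-MM-DD` text as `str(now)[:10]`.

```python
import datetime as dt


def daily_filename(prefix="Opportunistic_leads_", day=None):
    """Build a per-day workbook name such as 'Opportunistic_leads_2017-11-30.xlsx'."""
    if day is None:
        day = dt.date.today()
    # Same YYYY-MM-DD text as str(now)[:10], without relying on str() layout.
    return f"{prefix}{day:%Y-%m-%d}.xlsx"


print(daily_filename(day=dt.date(2017, 11, 30)))  # Opportunistic_leads_2017-11-30.xlsx
```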

See here: iteratively calling pandas datareader, for some ways to transform the output of stocks. They still apply when exporting the Panel to Excel.
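
One such transformation gives the layout asked for in the question (`Open High Low Close` repeated per ticker). With (field, ticker) columns, swapping the two column levels groups the fields under each ticker. A sketch on made-up data; `group_by_ticker` is a hypothetical helper name.

```python
import pandas as pd


def group_by_ticker(prices):
    """Reorder (field, ticker) columns into (ticker, field) blocks.

    Produces the 'Open High Low Close, Open High Low Close' layout from the
    question: one block of fields per ticker, fields in their original order.
    """
    fields = list(dict.fromkeys(prices.columns.get_level_values(0)))
    tickers = list(dict.fromkeys(prices.columns.get_level_values(1)))
    swapped = prices.swaplevel(axis=1)
    # Reindex explicitly: sort_index would put the fields in alphabetical order.
    order = pd.MultiIndex.from_product([tickers, fields], names=swapped.columns.names)
    return swapped.reindex(columns=order)


idx = pd.to_datetime(["2017-08-11"])
cols = pd.MultiIndex.from_product([["Open", "High", "Low", "Close"], ["AAA", "BBB"]])
prices = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7, 8]], index=idx, columns=cols)
grouped = group_by_ticker(prices)
print(grouped.iloc[0].tolist())  # [1, 3, 5, 7, 2, 4, 6, 8]
```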

Note that I couldn't find a way to append data to an existing Excel sheet without overwriting it, but this is as close as I got to what I wanted.
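
For what it's worth, newer pandas versions (1.4 and later) can write into an existing sheet without clearing it: `mode='a'` with `if_sheet_exists='overlay'` keeps the current cells, and `startrow`/`startcol` pick the target position. A minimal sketch, assuming openpyxl is installed, that the sheet already has a header row, and that new rows should go directly below the last used row; `append_rows` and the sample data are hypothetical.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook


def append_rows(path, df, sheet_name="Sheet1", startcol=0):
    """Write df below the last used row of an existing sheet, keeping its contents.

    Requires pandas >= 1.4 for if_sheet_exists="overlay".
    """
    # openpyxl's max_row is 1-based, to_excel's startrow is 0-based,
    # so max_row is exactly the 0-based index of the first empty row.
    startrow = load_workbook(path)[sheet_name].max_row
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    startcol=startcol, index=False, header=False)


path = os.path.join(tempfile.mkdtemp(), "Stocks.xlsx")
existing = pd.DataFrame({"Date": ["2017-11-28", "2017-11-29"], "Close": [1.0, 2.0]})
existing.to_excel(path, sheet_name="Sheet1", index=False)

append_rows(path, pd.DataFrame({"Date": ["2017-11-30"], "Close": [3.0]}))
print(pd.read_excel(path, sheet_name="Sheet1")["Close"].tolist())  # [1.0, 2.0, 3.0]
```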

FYI, when you export a Panel (as produced by DataReader) to Excel, each "column" goes into its own tab, according to https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Panel.to_excel.html: