I am developing the following code to scrape financial data from a specific website.
import requests
import pandas as pd

urls = ['https://www.marketwatch.com/investing/stock/aapl/financials/cash-flow',
        'https://www.marketwatch.com/investing/stock/aapl/financials/cash-flow/quarter',
        'https://www.marketwatch.com/investing/stock/MSFT/financials/cash-flow',
        'https://www.marketwatch.com/investing/stock/MSFT/financials/cash-flow/quarter']


def main(urls):
    with requests.Session() as req:
        goal = []
        for url in urls:
            r = req.get(url)
            df = pd.read_html(
                r.content, match="Cash Dividends Paid - Total")[0].iloc[[0], 3:6]
            goal.append(df)
        new = pd.concat(goal)
        print(new)


main(urls)
It returns the information I need:
       2017      2018      2019 30-Sep-2019 31-Dec-2019 31-Mar-2020
0  (12.77B)  (13.71B)  (14.12B)         NaN         NaN         NaN
0       NaN       NaN       NaN     (3.48B)     (3.54B)     (3.38B)
0  (11.85B)   (12.7B)  (13.81B)         NaN         NaN         NaN
0       NaN       NaN       NaN     (3.51B)     (3.89B)     (3.88B)
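I need to scrape at least 20 companies (all from the same source). The URLs are identical except for one element, which I will call index:
'https://www.marketwatch.com/investing/stock/' + index + '/financials/cash-flow'
Is it possible to add a variable named Index and iterate over it? Something like: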
import requests
import pandas as pd
Index = 'MSFT, AAPL'
and
urls = ['https://www.marketwatch.com/investing/stock/' + Index + '/financials/cash-flow',
        'https://www.marketwatch.com/investing/stock/' + Index + '/financials/cash-flow/quarter']
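Answer 0 (score: 1)
A very simple solution: use a nested loop (tickers in the outer loop, URL templates in the inner loop) and string formatting to construct the required URLs.
For example: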
import requests
import pandas as pd

indexes = 'aapl', 'MSFT', 'F'


def main(indexes):
    # URL templates; {index} is replaced with each ticker symbol.
    urls = ['https://www.marketwatch.com/investing/stock/{index}/financials/cash-flow',
            'https://www.marketwatch.com/investing/stock/{index}/financials/cash-flow/quarter']
    goal = []
    with requests.Session() as req:
        for index in indexes:
            for url in urls:
                url = url.format(index=index)
                print('Processing url', url)
                r = req.get(url)
                # Take the first matching table; keep row 0 and the columns at positions 3 to 5.
                df = pd.read_html(
                    r.content, match="Cash Dividends Paid - Total")[0].iloc[[0], 3:6]
                goal.append(df)
    new = pd.concat(goal)
    print(new)


main(indexes)
Prints:
Processing url https://www.marketwatch.com/investing/stock/aapl/financials/cash-flow
Processing url https://www.marketwatch.com/investing/stock/aapl/financials/cash-flow/quarter
Processing url https://www.marketwatch.com/investing/stock/MSFT/financials/cash-flow
Processing url https://www.marketwatch.com/investing/stock/MSFT/financials/cash-flow/quarter
Processing url https://www.marketwatch.com/investing/stock/F/financials/cash-flow
Processing url https://www.marketwatch.com/investing/stock/F/financials/cash-flow/quarter
       2017      2018      2019 30-Sep-2019 31-Dec-2019 31-Mar-2020
0  (12.77B)  (13.71B)  (14.12B)         NaN         NaN         NaN
0       NaN       NaN       NaN     (3.48B)     (3.54B)     (3.38B)
0  (11.85B)   (12.7B)  (13.81B)         NaN         NaN         NaN
0       NaN       NaN       NaN     (3.51B)     (3.89B)     (3.88B)
0   (2.58B)   (2.91B)   (2.39B)         NaN         NaN         NaN
0       NaN       NaN       NaN      (598M)      (595M)      (596M)
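Note that every row in the output above has index 0, so with 20 companies it becomes hard to tell which row belongs to which ticker. One possible way to handle that is to build the URL list up front and keep the ticker next to each URL. The following is only a sketch: `URL_TEMPLATES` and `build_urls` are names made up for this illustration, and `itertools.product` is swapped in for the nested loop (it yields the same ticker-major order).

```python
from itertools import product

# URL templates copied from the answer's code; {index} is the ticker slot.
# URL_TEMPLATES and build_urls are hypothetical names used only for this sketch.
URL_TEMPLATES = (
    "https://www.marketwatch.com/investing/stock/{index}/financials/cash-flow",
    "https://www.marketwatch.com/investing/stock/{index}/financials/cash-flow/quarter",
)


def build_urls(indexes, templates=URL_TEMPLATES):
    """Return (index, url) pairs, with tickers varying slowest, like the nested loop."""
    return [
        (index, template.format(index=index))
        for index, template in product(indexes, templates)
    ]


if __name__ == "__main__":
    # No network access happens here; this only prints the URLs that would be fetched.
    for index, url in build_urls(["aapl", "MSFT", "F"]):
        print(index, url)
```

Each `url` can then be passed to `req.get(url)` exactly as in the code above, and the accompanying `index` can be used to label the resulting row, for example by collecting the tickers in a list and passing it as `keys=` to `pd.concat`.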