I want to build a data frame by scraping web pages. My code so far:
import json
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

symbol = ['FMC','VMC','APD']
short_name = ['fmc','vulcan-materials','air-products-and-chemicals']
url_key_financial_ratio = 'https://www.macrotrends.net/stocks/charts/{}/{}/financial-ratios'
records = []
for i, j in zip(symbol, short_name):
    r = requests.get(url_key_financial_ratio.format(i, j))
    soup = BeautifulSoup(r.text, 'html.parser')
    pattern = re.compile(r' var originalData = (.*?);\r\n\r\n\r', re.DOTALL)
    data = json.loads(pattern.findall(r.text)[0])
    records.append({
        'symbol' : i,
        'date' : data[0]['field_name']
    })
I would like the output to look like this:
date ticker current_ratio long_term_debt
2019-09-30 FMC 2.53630 0.22080
2018-09-30 FMC 2.17350 0.23070
2017-09-30 FMC 2.36110 0.25040
2016-09-30 FMC 1.31500 0.35150
2015-09-30 FMC 0.76650 0.34850
2014-09-30 FMC 1.11200 0.39080
2013-09-30 FMC 1.06550 0.41260
2012-09-30 FMC 1.26990 0.40900
2011-09-30 FMC 1.36200 0.39810
2010-09-30 FMC 1.35190 0.39110
2009-09-30 FMC 1.19740 0.42980
2008-09-30 FMC 1.28760 0.41130
2007-09-30 FMC 1.17980 0.35120
2006-09-30 FMC 1.12450 0.31650
2005-09-30 FMC 1.24260 0.31050
I am new to web scraping and cannot work out how to use append to build this data frame. Can anyone help?
Answer 0 (score: 1)
This script loads the current_ratio and long_term_debt columns into a data frame:
import json
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

symbol = ['FMC','VMC','APD']
short_name = ['fmc','vulcan-materials','air-products-and-chemicals']
url_key_financial_ratio = 'https://www.macrotrends.net/stocks/charts/{}/{}/financial-ratios'
records = []
for i, j in zip(symbol, short_name):
    r = requests.get(url_key_financial_ratio.format(i, j))
    soup = BeautifulSoup(r.text, 'html.parser')
    # The table data is embedded in the page as a JavaScript array literal.
    pattern = re.compile(r' var originalData = (.*?);\r\n\r\n\r', re.DOTALL)
    data = json.loads(pattern.findall(r.text)[0])
    # Each element of data is one table row; pick the two rows of interest.
    current_ratio = next(d for d in data if 'Current Ratio' in d['field_name'])
    long_term_debt = next(d for d in data if 'Long-term Debt / Capital' in d['field_name'])
    # Walk both rows in parallel; every key other than the two below is a date.
    for (k1, v1), (_, v2) in zip(current_ratio.items(), long_term_debt.items()):
        if k1 in ('field_name', 'popup_icon'):
            continue
        records.append({
            'date' : k1,
            'ticker': i,
            'current_ratio': v1,
            'long_term_debt': v2})
df = pd.DataFrame(records)
print(df)
Prints:
date ticker current_ratio long_term_debt
0 2019-12-31 FMC 1.49590 0.54200
1 2018-12-31 FMC 1.34640 0.40050
2 2017-12-31 FMC 1.65330 0.52510
3 2016-12-31 FMC 1.98110 0.47440
4 2015-12-31 FMC 2.04490 0.51620
5 2014-12-31 FMC 1.53600 0.42140
6 2013-12-31 FMC 1.48240 0.42330
7 2012-12-31 FMC 1.92150 0.36890
8 2011-12-31 FMC 2.03220 0.37400
9 2010-12-31 FMC 1.70870 0.29720
10 2009-12-31 FMC 2.09770 0.34160
11 2008-12-31 FMC 1.88750 0.38020
12 2007-12-31 FMC 1.58920 0.28280
13 2006-12-31 FMC 1.48820 0.34130
14 2005-12-31 FMC 1.61440 0.40010
15 2019-12-31 VMC 2.57550 0.33120
16 2018-12-31 VMC 1.79100 0.34820
17 2017-12-31 VMC 2.66470 0.36150
18 2016-12-31 VMC 3.05490 0.30250
19 2015-12-31 VMC 3.06830 0.30780
20 2014-12-31 VMC 2.03700 0.30520
21 2013-12-31 VMC 3.18080 0.39040
22 2012-12-31 VMC 2.25700 0.40180
23 2011-12-31 VMC 2.12450 0.41420
24 2010-12-31 VMC 1.32870 0.38030
25 2009-12-31 VMC 0.85550 0.34390
26 2008-12-31 VMC 0.53750 0.37730
27 2007-12-31 VMC 0.45770 0.28920
28 2006-12-31 VMC 1.49990 0.13800
29 2005-12-31 VMC 2.04040 0.13160
30 2019-09-30 APD 2.53630 0.22080
31 2018-09-30 APD 2.17350 0.23070
32 2017-09-30 APD 2.36110 0.25040
33 2016-09-30 APD 1.31500 0.35150
34 2015-09-30 APD 0.76650 0.34850
35 2014-09-30 APD 1.11200 0.39080
36 2013-09-30 APD 1.06550 0.41260
37 2012-09-30 APD 1.26990 0.40900
38 2011-09-30 APD 1.36200 0.39810
39 2010-09-30 APD 1.35190 0.39110
40 2009-09-30 APD 1.19740 0.42980
41 2008-09-30 APD 1.28760 0.41130
42 2007-09-30 APD 1.17980 0.35120
43 2006-09-30 APD 1.12450 0.31650
44 2005-09-30 APD 1.24260 0.31050
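The extraction and reshaping steps can also be tried offline, without hitting the site. The following is only a minimal sketch under stated assumptions: SAMPLE_HTML is a made-up, heavily shortened stand-in for a downloaded page, the helper names (extract_original_data, find_row, build_records) are hypothetical, and the regular expression is a simplified variant of the one in the script, not the same pattern. It looks up the second row by date key instead of relying on both rows listing their dates in the same order, and it converts the values to float, which the script above does not do.

```python
import json
import re

import pandas as pd

# Hypothetical, shortened stand-in for r.text; the real page is much larger.
# The second row lists its dates in a different order on purpose.
SAMPLE_HTML = """
<script>
var originalData = [{"field_name": "<a href='#'>Current Ratio</a>", "popup_icon": "", "2019-12-31": "1.49590", "2018-12-31": "1.34640"}, {"field_name": "<a href='#'>Long-term Debt / Capital</a>", "popup_icon": "", "2018-12-31": "0.40050", "2019-12-31": "0.54200"}];
</script>
"""

# Simplified pattern (an assumption): capture the array literal up to the first "];".
ORIGINAL_DATA = re.compile(r"var originalData = (\[.*?\]);", re.DOTALL)


def extract_original_data(html):
    """Return the embedded originalData array as a list of dicts."""
    match = ORIGINAL_DATA.search(html)
    if match is None:
        raise ValueError("originalData not found in page")
    return json.loads(match.group(1))


def find_row(data, label):
    """Return the first row whose field_name contains the given label."""
    return next(d for d in data if label in d["field_name"])


def build_records(ticker, data):
    """Build one record per date, pairing the two rows by date key."""
    current_ratio = find_row(data, "Current Ratio")
    long_term_debt = find_row(data, "Long-term Debt / Capital")
    records = []
    for key, value in current_ratio.items():
        if key in ("field_name", "popup_icon"):
            continue  # metadata columns, not dates
        records.append({
            "date": key,
            "ticker": ticker,
            "current_ratio": float(value),
            "long_term_debt": float(long_term_debt[key]),
        })
    return records


df = pd.DataFrame(build_records("FMC", extract_original_data(SAMPLE_HTML)))
print(df)
```

Pairing by key raises a KeyError if a date is missing from the second row, which may be preferable to silently misaligned values. If the real page contains empty strings for some dates, the float conversion would need a guard.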