Problem creating a DataFrame from dates

Posted: 2020-10-19 22:50:04

Tags: pandas dataframe web-scraping

I want to build a DataFrame from web-scraped data. Here is my code:

import json
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

symbol = ['FMC','VMC','APD']
short_name = ['fmc','vulcan-materials','air-products-and-chemicals']

url_key_financial_ratio = 'https://www.macrotrends.net/stocks/charts/{}/{}/financial-ratios'


records = []

for i,j in zip(symbol, short_name):
    r = requests.get(url_key_financial_ratio.format(i, j))
    soup = BeautifulSoup(r.text, 'html.parser')
    pattern = re.compile(r' var originalData = (.*?);\r\n\r\n\r',re.DOTALL)
    data = json.loads(pattern.findall(r.text)[0])

    records.append({
        'symbol' : i,
        'date' : data[0]['field_name']})

I would like the output to look like this:

date         ticker    current_ratio    long_term_debt
2019-09-30   FMC       2.53630          0.22080
2018-09-30   FMC       2.17350          0.23070
2017-09-30   FMC       2.36110          0.25040
2016-09-30   FMC       1.31500          0.35150
2015-09-30   FMC       0.76650          0.34850
2014-09-30   FMC       1.11200          0.39080
2013-09-30   FMC       1.06550          0.41260
2012-09-30   FMC       1.26990          0.40900
2011-09-30   FMC       1.36200          0.39810
2010-09-30   FMC       1.35190          0.39110
2009-09-30   FMC       1.19740          0.42980
2008-09-30   FMC       1.28760          0.41130
2007-09-30   FMC       1.17980          0.35120
2006-09-30   FMC       1.12450          0.31650
2005-09-30   FMC       1.24260          0.31050

I'm new to web scraping and I can't figure out how to use the append method to build this DataFrame. Can anyone help?

1 Answer

Answer 0 (score: 1)

This script loads the current_ratio and long_term_debt columns into a DataFrame:

import json
import re

import pandas as pd
import requests

symbol = ['FMC','VMC','APD']
short_name = ['fmc','vulcan-materials','air-products-and-chemicals']

url_key_financial_ratio = 'https://www.macrotrends.net/stocks/charts/{}/{}/financial-ratios'

# The page embeds its table as a JavaScript array: var originalData = [...];
pattern = re.compile(r' var originalData = (.*?);\r\n\r\n\r', re.DOTALL)

records = []
for i,j in zip(symbol, short_name):
    r = requests.get(url_key_financial_ratio.format(i, j))
    data = json.loads(pattern.findall(r.text)[0])

    # Each element of data is one table row: field_name, popup_icon, then one key per date
    current_ratio = next(d for d in data if 'Current Ratio' in d['field_name'])
    long_term_debt = next(d for d in data if 'Long-term Debt / Capital' in d['field_name'])

    # Look up each date by key instead of relying on both dicts having the same key order
    for date, value in current_ratio.items():
        if date in ('field_name', 'popup_icon'):
            continue

        records.append({
          'date' : date,
          'ticker': i,
          'current_ratio': value,
          'long_term_debt': long_term_debt.get(date)})

# Build the DataFrame once, after all rows have been collected
df = pd.DataFrame(records)
print(df)

Prints:

          date ticker current_ratio long_term_debt
0   2019-12-31    FMC       1.49590        0.54200
1   2018-12-31    FMC       1.34640        0.40050
2   2017-12-31    FMC       1.65330        0.52510
3   2016-12-31    FMC       1.98110        0.47440
4   2015-12-31    FMC       2.04490        0.51620
5   2014-12-31    FMC       1.53600        0.42140
6   2013-12-31    FMC       1.48240        0.42330
7   2012-12-31    FMC       1.92150        0.36890
8   2011-12-31    FMC       2.03220        0.37400
9   2010-12-31    FMC       1.70870        0.29720
10  2009-12-31    FMC       2.09770        0.34160
11  2008-12-31    FMC       1.88750        0.38020
12  2007-12-31    FMC       1.58920        0.28280
13  2006-12-31    FMC       1.48820        0.34130
14  2005-12-31    FMC       1.61440        0.40010
15  2019-12-31    VMC       2.57550        0.33120
16  2018-12-31    VMC       1.79100        0.34820
17  2017-12-31    VMC       2.66470        0.36150
18  2016-12-31    VMC       3.05490        0.30250
19  2015-12-31    VMC       3.06830        0.30780
20  2014-12-31    VMC       2.03700        0.30520
21  2013-12-31    VMC       3.18080        0.39040
22  2012-12-31    VMC       2.25700        0.40180
23  2011-12-31    VMC       2.12450        0.41420
24  2010-12-31    VMC       1.32870        0.38030
25  2009-12-31    VMC       0.85550        0.34390
26  2008-12-31    VMC       0.53750        0.37730
27  2007-12-31    VMC       0.45770        0.28920
28  2006-12-31    VMC       1.49990        0.13800
29  2005-12-31    VMC       2.04040        0.13160
30  2019-09-30    APD       2.53630        0.22080
31  2018-09-30    APD       2.17350        0.23070
32  2017-09-30    APD       2.36110        0.25040
33  2016-09-30    APD       1.31500        0.35150
34  2015-09-30    APD       0.76650        0.34850
35  2014-09-30    APD       1.11200        0.39080
36  2013-09-30    APD       1.06550        0.41260
37  2012-09-30    APD       1.26990        0.40900
38  2011-09-30    APD       1.36200        0.39810
39  2010-09-30    APD       1.35190        0.39110
40  2009-09-30    APD       1.19740        0.42980
41  2008-09-30    APD       1.28760        0.41130
42  2007-09-30    APD       1.17980        0.35120
43  2006-09-30    APD       1.12450        0.31650
44  2005-09-30    APD       1.24260        0.31050
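To check the parsing logic without hitting macrotrends.net, here is a minimal offline sketch. It assumes the page embeds a var originalData = [...] array of row dicts, which is what the answer's regex and its field_name/popup_icon keys imply. SAMPLE_PAGE, extract_original_data and rows_for_ticker are hypothetical names, and the sample contains only two FMC dates whose values are copied from the output above. It also converts the columns to real dtypes: the printed values keep trailing zeros (1.49590), which suggests they are scraped as strings.

```python
import json
import re

import pandas as pd

# Hypothetical page fragment shaped the way the answer's regex expects:
# " var originalData = [...];" followed by \r\n\r\n\r. Values copied from the FMC rows above.
SAMPLE_PAGE = (
    '<script> var originalData = ['
    '{"field_name": "Current Ratio", "popup_icon": "", '
    '"2019-12-31": "1.49590", "2018-12-31": "1.34640"}, '
    '{"field_name": "Long-term Debt / Capital", "popup_icon": "", '
    '"2019-12-31": "0.54200", "2018-12-31": "0.40050"}'
    '];\r\n\r\n\r</script>'
)


def extract_original_data(page_text):
    """Pull the JSON array assigned to originalData out of the page source."""
    pattern = re.compile(r' var originalData = (.*?);\r\n\r\n\r', re.DOTALL)
    match = pattern.search(page_text)
    if match is None:
        raise ValueError('originalData not found in page')
    return json.loads(match.group(1))


def rows_for_ticker(data, ticker):
    """Turn the two ratio rows into one record (dict) per date."""
    current_ratio = next(d for d in data if 'Current Ratio' in d['field_name'])
    long_term_debt = next(d for d in data if 'Long-term Debt / Capital' in d['field_name'])
    records = []
    for date, value in current_ratio.items():
        if date in ('field_name', 'popup_icon'):
            continue  # metadata keys, not dates
        records.append({
            'date': date,
            'ticker': ticker,
            'current_ratio': value,
            'long_term_debt': long_term_debt.get(date),
        })
    return records


records = rows_for_ticker(extract_original_data(SAMPLE_PAGE), 'FMC')
df = pd.DataFrame(records)

# Convert the string columns so dates sort correctly and ratios support arithmetic
df['date'] = pd.to_datetime(df['date'])
num_cols = ['current_ratio', 'long_term_debt']
df[num_cols] = df[num_cols].apply(pd.to_numeric)
print(df.dtypes)
print(df)
```

With real pages you would pass requests.get(url).text in place of SAMPLE_PAGE, extend one records list per ticker, and call pd.DataFrame a single time at the end. Collecting dicts in a plain list this way is also the usual replacement for DataFrame.append, which pandas deprecated in 1.4 and removed in 2.0.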