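Sorry, I'm a beginner. I have been trying to fetch metadata from the SEC website. This is the link: https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001403161&type=10&dateb=&owner=exclude&count=40
For now, I just want to get the filing dates. I am trying XPath, but it throws an IndexError. I checked the fetched HTML, and it does seem to contain the data.
My code: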
from lxml import html
import requests
page = requests.get('https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001403161&type=10&dateb=&owner=exclude&count=40')
tree = html.fromstring(page.content)
date = tree.xpath('//*[@id="seriesDiv"]/table/tbody/tr[2]/td[4]')[0].text
print(date)
How can I make this work?
Any help would be appreciated.
Thanks!
Answer 0 (score: 0)
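I'm not sure what is wrong with the XPath, since that is how I would have written it too. However, if you don't specifically have to use XPath, I would go the pandas route: parse the whole table, then pick out individual cells as needed.
pd.read_html() returns a list of DataFrames, one for each <table> tag in the HTML. You just need to select the table you want, which in this case is at index position 2 (the last of the 3 DataFrames):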
import pandas as pd
url = 'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001403161&type=10&dateb=&owner=exclude&count=40'
dfs = pd.read_html(url)
df = dfs[-1]
Output:
print(df.to_string())
Filings Format Description Filing Date File/Film Number
0 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2019-07-26 001-3397719978181
1 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2019-04-26 001-3397719771802
2 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2019-01-31 001-3397719556097
3 10-K Documents Interactive Data Annual report [Section 13 and 15(d), not S-K I... 2018-11-16 001-33977181189947
4 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2018-07-27 001-3397718974910
5 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2018-04-27 001-3397718783872
6 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2018-02-01 001-3397718567042
7 10-K Documents Interactive Data Annual report [Section 13 and 15(d), not S-K I... 2017-11-17 001-33977171209440
8 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2017-07-20 001-3397717974492
9 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2017-04-21 001-3397717774258
10 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2017-02-02 001-3397717568413
11 10-K Documents Interactive Data Annual report [Section 13 and 15(d), not S-K I... 2016-11-15 001-33977162000223
12 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2016-07-25 001-33977161782265
13 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2016-04-25 001-33977161589237
14 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2016-01-28 001-33977161369122
15 10-K Documents Interactive Data Annual report [Section 13 and 15(d), not S-K I... 2015-11-20 001-33977151244628
16 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2015-07-23 001-33977151002526
17 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2015-04-30 001-3397715819049
18 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2015-01-29 001-3397715559143
19 10-K Documents Interactive Data Annual report [Section 13 and 15(d), not S-K I... 2014-11-21 001-33977141240400
20 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2014-07-24 001-3397714991576
21 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2014-04-24 001-3397714781985
22 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2014-01-30 001-3397714558846
23 10-K Documents Interactive Data Annual report [Section 13 and 15(d), not S-K I... 2013-11-22 001-33977131236561
24 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2013-07-24 001-3397713983884
25 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2013-05-01 001-3397713803519
26 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2013-02-06 001-3397713578037
27 10-K Documents Interactive Data Annual report [Section 13 and 15(d), not S-K I... 2012-11-16 001-33977121209935
28 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2012-07-27 001-3397712990778
29 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2012-05-02 001-3397712805918
30 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2012-02-08 001-3397712582250
31 10-K Documents Interactive Data Annual report [Section 13 and 15(d), not S-K I... 2011-11-18 001-33977111214519
32 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2011-07-29 001-3397711996223
33 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2011-05-05 001-3397711815087
34 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2011-02-02 001-3397711566916
35 10-K Documents Interactive Data Annual report [Section 13 and 15(d), not S-K I... 2010-11-19 001-33977101205707
36 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2010-08-02 001-3397710982428
37 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2010-05-03 001-3397710789509
38 10-Q Documents Interactive Data Quarterly report [Sections 13 or 15(d)]Acc-no:... 2010-02-03 001-3397710571090
39 10-K Documents Interactive Data Annual report [Section 13 and 15(d), not S-K I... 2009-11-20 001-33977091198831
To print a single cell by row label and column name:
print (df.loc[0,'Filing Date'])
2019-07-26
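As a small offline illustration of the pd.read_html() approach above, here is a sketch that parses an inline HTML string instead of the live SEC page. The SAMPLE_HTML markup is made up for this example and only mimics the shape of the filings table, and the sketch assumes pandas has an HTML parser such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical markup: two tables, the last one shaped like the filings table.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><td>Layout</td><td>Table</td></tr>
  <tr><td>not</td><td>wanted</td></tr>
</table>
<table class="tableFile2">
  <tr><th>Filings</th><th>Filing Date</th></tr>
  <tr><td>10-Q</td><td>2019-07-26</td></tr>
  <tr><td>10-Q</td><td>2019-04-26</td></tr>
</table>
</body></html>
"""

# read_html returns one DataFrame per <table>; header=0 uses the first row as column names.
dfs = pd.read_html(StringIO(SAMPLE_HTML), header=0)

# The filings table is the last table in the document.
df = dfs[-1]

# Look up one cell by row label and column name.
first_date = df.loc[0, "Filing Date"]
print(first_date)
```

Selecting the table with dfs[-1] depends on the page layout staying the same; read_html also accepts a match argument if you would rather select tables by the text they contain.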
Answer 1 (score: 0)
This approach returns the entire Filing Date column as a list:
from lxml import html
import requests

page = requests.get('https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001403161&type=10&dateb=&owner=exclude&count=40')
tree = html.fromstring(page.content)
# Filing date of the first data row (tr[1] is the header row)
Firstdate = tree.xpath('//table[@class="tableFile2"]//tr[2]/td[4]/text()')
print(Firstdate)
# Filing dates of all rows
Alldates = tree.xpath('//table[@class="tableFile2"]//tr/td[4]/text()')
print(Alldates)
Output: ['2019-07-26', '2019-04-26', '2019-01-31', '2018-11-16', '2018-07-27', '2018-04-27', '2018-02-01', '2017-11-17', '2017-07-20', '2017-04-21', '2017-02-02', '2016-11-15', '2016-07-25', '2016-04-25', '2016-01-28', '2015-11-20', '2015-07-23', '2015-04-30', '2015-01-29', '2014-11-21', '2014-07-24', '2014-04-24', '2014-01-30', '2013-11-22', '2013-07-24', '2013-05-01', '2013-02-06', '2012-11-16', '2012-07-27', '2012-05-02', '2012-02-08', '2011-11-18', '2011-07-29', '2011-05-05', '2011-02-02', '2010-11-19', '2010-08-02', '2010-05-03', '2010-02-03', '2009-11-20']
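A likely reason the XPath in the question raises IndexError while this one works is the tbody step. Browsers add a <tbody> element to tables in the DOM, so an XPath copied from the browser's developer tools contains it, but lxml parses the HTML as served and does not add one. The sketch below shows the difference on a made-up inline snippet (SAMPLE_HTML is hypothetical and only mimics the page structure), so it runs without a network request.

```python
from lxml import html

# Hypothetical markup shaped like the filings table; note there is no <tbody>.
SAMPLE_HTML = """
<html><body>
<div id="seriesDiv">
  <table class="tableFile2">
    <tr><th>Filings</th><th>Format</th><th>Description</th><th>Filing Date</th></tr>
    <tr><td>10-Q</td><td>Documents</td><td>Quarterly report</td><td>2019-07-26</td></tr>
    <tr><td>10-Q</td><td>Documents</td><td>Quarterly report</td><td>2019-04-26</td></tr>
  </table>
</div>
</body></html>
"""

tree = html.fromstring(SAMPLE_HTML)

# XPath as copied from a browser: the tbody step matches nothing here,
# so indexing the result with [0] would raise IndexError.
with_tbody = tree.xpath('//*[@id="seriesDiv"]/table/tbody/tr[2]/td[4]')

# Same path without tbody: matches the first data row.
first_date = tree.xpath('//*[@id="seriesDiv"]/table/tr[2]/td[4]/text()')

# Every data row: header cells are <th>, so td[4] skips the header row.
all_dates = tree.xpath('//table[@class="tableFile2"]//tr/td[4]/text()')

print(with_tbody)
print(first_date)
print(all_dates)
```

Using //tr instead of /tbody/tr, as this answer does, matches the rows whether or not the served HTML includes a tbody element.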