Question

I'm trying to fetch some data from the EDGAR database with the following Python code:

import requests
from bs4 import BeautifulSoup

html1 = 'https://www.sec.gov/Archives/edgar/data/320193/000032019317000070/aapl-20170930.xml'
xbrl_resp = requests.get(html1)
xbrl_str = xbrl_resp.text
soup1 = BeautifulSoup(xbrl_str, 'lxml')
mytag = soup1.find('us-gaap:StockholdersEquity',{'contextRef':'FI2017Q4'})
print(mytag)

Even though the tag exists in the XML file, this prints None instead of the element. Any suggestions would be appreciated.

Answer 1

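You have a few problems. First, pass the response's content (the raw bytes) instead of its text. Second, use the xml parser instead of the lxml parser. Finally, your search for the 'us-gaap:StockholdersEquity' tag was written incorrectly.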

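import requests
from bs4 import BeautifulSoup

html1 = 'https://www.sec.gov/Archives/edgar/data/320193/000032019317000070/aapl-20170930.xml'
# SEC asks automated clients to declare a User-Agent; placeholder value, use your own
xbrl_resp = requests.get(html1, headers={'User-Agent': 'Your Name your.email@example.com'})
xbrl_str = xbrl_resp.content  # raw bytes rather than decoded text
soup1 = BeautifulSoup(xbrl_str, 'xml')  # XML parser (needs lxml); keeps tag/attribute case
mytag = soup1.find('us-gaap:StockholdersEquity', contextRef='FI2017Q4')
print(mytag)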
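A minimal offline sketch of the same fix, assuming lxml is installed (BeautifulSoup's 'xml' parser depends on it). Instead of downloading the filing, it parses a tiny made-up XBRL fragment passed as bytes; the namespace URIs are illustrative, the numbers are placeholders rather than Apple's reported figures, and find_equity is a hypothetical helper name. With the 'xml' parser the original case of StockholdersEquity and contextRef is kept, so the prefixed name from the question matches:

```python
from bs4 import BeautifulSoup  # the 'xml' parser below also requires lxml

# Made-up XBRL fragment for illustration: namespace URIs are illustrative and
# the numbers are placeholders, not figures from Apple's filing.
SAMPLE_XBRL = b"""<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:us-gaap="http://fasb.org/us-gaap/2017-01-31">
  <us-gaap:StockholdersEquity contextRef="FI2017Q4" decimals="-6">1000</us-gaap:StockholdersEquity>
  <us-gaap:StockholdersEquity contextRef="FI2016Q4" decimals="-6">900</us-gaap:StockholdersEquity>
</xbrli:xbrl>"""


def find_equity(xml_bytes, context_ref):
    """Hypothetical helper: text of the StockholdersEquity fact for context_ref, or None."""
    # 'xml' keeps the original case of tag and attribute names.
    soup = BeautifulSoup(xml_bytes, "xml")
    # The prefixed name matches because BeautifulSoup also compares "prefix:name".
    tag = soup.find("us-gaap:StockholdersEquity", contextRef=context_ref)
    return tag.text if tag is not None else None


print(find_equity(SAMPLE_XBRL, "FI2017Q4"))  # 1000
```

Passing bytes mirrors the advice to use xbrl_resp.content; for the real filing you would pass that instead of SAMPLE_XBRL.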

Answer 2

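The lxml HTML parser (the 'lxml' option used in the question) converts tag and attribute names to lowercase; see https://www.crummy.com/software/BeautifulSoup/bs4/doc/#parsing-xml. So if you keep that parser, you need to search with lowercase names, for example: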

     mytag = soup1.find('us-gaap:stockholdersequity',contextref='FI2017Q4')
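A small offline check of this, assuming lxml is installed and using a made-up XBRL fragment with a placeholder value (find_both_ways is a hypothetical helper). With the 'lxml' HTML parser, the mixed-case search from the question finds nothing, while the all-lowercase tag and attribute names match:

```python
from bs4 import BeautifulSoup  # 'lxml' below selects lxml's HTML parser

# Made-up XBRL fragment; the value is a placeholder.
SAMPLE_XBRL = b"""<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:us-gaap="http://fasb.org/us-gaap/2017-01-31">
  <us-gaap:StockholdersEquity contextRef="FI2017Q4">1000</us-gaap:StockholdersEquity>
</xbrli:xbrl>"""


def find_both_ways(xml_bytes):
    """Hypothetical helper: search once with mixed case and once with lowercase."""
    # The HTML parser lowercases tag and attribute names while building the tree.
    soup = BeautifulSoup(xml_bytes, "lxml")
    mixed = soup.find("us-gaap:StockholdersEquity", {"contextRef": "FI2017Q4"})
    lower = soup.find("us-gaap:stockholdersequity", contextref="FI2017Q4")
    return mixed, lower


mixed, lower = find_both_ways(SAMPLE_XBRL)
print(mixed)
print(lower.name, lower.text)
```

Switching to the 'xml' parser avoids the lowercasing altogether; searching in lowercase is the workaround when you stay with the HTML parser.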

Answer 3

I had the same problem: soup.find('table') returned None. It happened in an environment where the lxml package was version 3.4.4.

In another environment with lxml 3.7.3, the same code worked fine.

So I went back to the "bad" environment and upgraded the lxml package:

pip install lxml --upgrade

After that, soup.find('table') started working.
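To compare the "bad" and working environments, a short script like this sketch prints the installed versions. parser_versions is a hypothetical helper; bs4.__version__, lxml.etree.__version__ and lxml.etree.LIBXML_VERSION are the version attributes those packages expose, and libxml2 is included because lxml's parsing is built on it:

```python
# Print the versions that decide how BeautifulSoup's 'lxml' parser behaves.
import bs4
from lxml import etree


def parser_versions():
    """Hypothetical helper: (beautifulsoup4, lxml, libxml2) version information."""
    return bs4.__version__, etree.__version__, etree.LIBXML_VERSION


if __name__ == "__main__":
    bs4_version, lxml_version, libxml2_version = parser_versions()
    print("beautifulsoup4:", bs4_version)
    print("lxml:", lxml_version)
    # LIBXML_VERSION is a (major, minor, patch) tuple.
    print("libxml2:", ".".join(str(part) for part in libxml2_version))
```

Run it in both environments; if the lxml versions differ, upgrading the older one is the step that fixed it here.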

Hope this helps!

Ram

BeautifulSoup returns NoneType
