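I'm having a lot of trouble parsing XML in Python 3.
I just want to get the author name, but even after hours of searching I can't figure it out. Can you help me?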
from urllib.request import urlopen
import xml.etree.ElementTree as ET
filing_url = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001326801&type=&dateb=&owner=include&start=0&count=40&output=atom"
tree = ET.parse('countries.xml')
root = tree.getroot()
for child in root.findall('author'):
    print(child.tag, child.attrib)
XML content:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<feed xmlns="http://www.w3.org/2005/Atom">
<author>
<email>webmaster@sec.gov</email>
<name>Webmaster</name>
</author>
<company-info><state-location>CA</state-location>
<state-location-href>http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;State=CA&amp;owner=include&amp;count=40</state-location-href>
<state-of-incorporation>DE</state-of-incorporation>
</company-info>
<entry>
<category label="form type" scheme="http://www.sec.gov/" term="4" />
<content type="text/xml">
<accession-nunber>0001127602-18-034767</accession-nunber>
<filing-date>2018-11-29</filing-date>
<filing-href>http://www.sec.gov/Archives/edgar/data/1326801/000112760218034767/0001127602-18-034767-index.htm</filing-href>
<filing-type>4</filing-type>
<form-name>Statement of changes in beneficial ownership of securities</form-name>
<size>4 KB</size>
</content>
<id>urn:tag:sec.gov,2008:accession-number=0001127602-18-034767</id>
<link href="http://www.sec.gov/Archives/edgar/data/1326801/000112760218034767/0001127602-18-034767-index.htm" rel="alternate" type="text/html" />
<summary type="html"> <b>Filed:</b> 2018-11-29 <b>AccNo:</b> 0001127602-18-034767 <b>Size:</b> 4 KB</summary>
<title>4 - Statement of changes in beneficial ownership of securities</title>
<updated>2018-11-29T18:46:54-05:00</updated>
</entry>
<entry>
<category label="form type" scheme="http://www.sec.gov/" term="4" />
<content type="text/xml">
<accession-nunber>0001127602-18-034766</accession-nunber>
<filing-date>2018-11-29</filing-date>
<filing-href>http://www.sec.gov/Archives/edgar/data/1326801/000112760218034766/0001127602-18-034766-index.htm</filing-href>
<filing-type>4</filing-type>
<form-name>Statement of changes in beneficial ownership of securities</form-name>
<size>19 KB</size>
</content>
<id>urn:tag:sec.gov,2008:accession-number=0001127602-18-034766</id>
<link href="http://www.sec.gov/Archives/edgar/data/1326801/000112760218034766/0001127602-18-034766-index.htm" rel="alternate" type="text/html" />
<summary type="html"> <b>Filed:</b> 2018-11-29 <b>AccNo:</b> 0001127602-18-034766 <b>Size:</b> 19 KB</summary>
<title>4 - Statement of changes in beneficial ownership of securities</title>
<updated>2018-11-29T18:44:39-05:00</updated>
</entry>
</feed>
Answer 0 (score: 0)
I'm not 100% sure what your problem is, but if you are able to use it, I'd recommend BeautifulSoup.
For example:
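from bs4 import BeautifulSoup

# Read the saved feed; the with-block closes the file afterwards.
with open("myxml.xml", "r") as infile:
    contents = infile.read()

# html.parser treats xmlns as an ordinary attribute, so the plain
# tag name "author" matches despite the Atom default namespace.
soup = BeautifulSoup(contents, "html.parser")
authors = soup.find_all("author")
for author in authors:
    print(author)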
#Output--
#<author>
#<email>webmaster@sec.gov</email>
#<name>Webmaster</name>
#</author>
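For comparison, the standard-library ElementTree approach from the question can also be made to work. The feed declares a default namespace (xmlns="http://www.w3.org/2005/Atom"), so the root's children are really named {http://www.w3.org/2005/Atom}author and so on, and root.findall('author') matches nothing. Also, the author name is the text of the <name> child rather than an attribute, so child.attrib would be empty anyway. The sketch below passes a prefix map to findall(). The helper names author_names and fetch_feed are hypothetical, and sending an explicit User-Agent header is an assumption based on SEC's stated policy for automated requests; the contact string is a placeholder you must replace.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen

# The feed uses a default namespace, so every tag is qualified, e.g.
# "{http://www.w3.org/2005/Atom}author". This prefix map lets
# find()/findall() match those qualified names via "atom:...".
NS = {"atom": "http://www.w3.org/2005/Atom"}


def author_names(xml_bytes):
    """Hypothetical helper: return the text of each <author>/<name> under <feed>."""
    root = ET.fromstring(xml_bytes)
    return [name.text for name in root.findall("atom:author/atom:name", NS)]


def fetch_feed(url, user_agent):
    """Hypothetical helper: download the raw feed bytes.

    Assumption: sec.gov expects a descriptive User-Agent, so one is sent
    explicitly; user_agent is a placeholder contact string.
    """
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    sample = b"""<?xml version="1.0" encoding="ISO-8859-1" ?>
<feed xmlns="http://www.w3.org/2005/Atom">
<author>
<email>webmaster@sec.gov</email>
<name>Webmaster</name>
</author>
</feed>"""
    print(author_names(sample))  # ['Webmaster']
```

With filing_url from the question, usage would look like author_names(fetch_feed(filing_url, "Your Name you@example.com")). If the feed is already saved to disk, ET.parse(path).getroot() followed by the same namespaced findall() call works too.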