Question

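The function below fetches the XML from this URL: https://www.sec.gov/Archives/edgar/monthly/xbrlrss-2018-12.xml

Note that the XML contains many occurrences of 'edgar:'.

What is the simplest way to find every "edgar:" in the whole XML file and replace it with "edgar_"?

Thanks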

import urllib.request as urllib2
import xml.etree.ElementTree as ET

def quarter_filing_urls(year, month):
    # Build the monthly feed URL, e.g. .../xbrlrss-2018-12.xml
    url = "https://www.sec.gov/Archives/edgar/monthly/xbrlrss-" + str(year) + "-" + str(month) + ".xml"
    # Download and parse the feed, then return its root element
    tree = ET.parse(urllib2.urlopen(url))
    root = tree.getroot()
    return root
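
For reference, a minimal sketch of the literal find-and-replace being asked about: read the document as raw bytes, replace "edgar:" with "edgar_", and only then parse it. The helper name `parse_with_flat_prefix` and the sample document (including its namespace URI) are made up for illustration; they are not taken from the SEC feed.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the feed; the namespace URI here is a placeholder, not the real one.
SAMPLE = b"""<rss xmlns:edgar="http://example.com/edgar" version="2.0">
  <channel>
    <item>
      <edgar:xbrlFiling>
        <edgar:companyName>EXAMPLE CO</edgar:companyName>
      </edgar:xbrlFiling>
    </item>
  </channel>
</rss>"""


def parse_with_flat_prefix(xml_bytes, old=b"edgar:", new=b"edgar_"):
    """Replace every occurrence of `old` in the raw XML, then parse it."""
    # Note: this also rewrites "edgar:" inside text or attribute values, if any.
    return ET.fromstring(xml_bytes.replace(old, new))


root = parse_with_flat_prefix(SAMPLE)
# The elements are now plain, un-namespaced tags such as edgar_xbrlFiling.
names = [el.text for el in root.findall("./channel/item/edgar_xbrlFiling/edgar_companyName")]
```

With the real feed, the bytes returned by `urllib2.urlopen(url).read()` could be passed in place of `SAMPLE`. The replacement is purely textual, so it is only safe as long as "edgar:" never appears in content that should stay unchanged.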

Update

One option is to use namespaces, as shown below. But when I try it, I get: AttributeError: 'set' object has no attribute 'items'

def quarter_filing_urls(year, month):

    url = "https://www.sec.gov/Archives/edgar/monthly/xbrlrss-" + str(year) + "-" + str(month) + ".xml"
    tree = ET.parse(urllib2.urlopen(url))
    root = tree.getroot()

    filings = []
    namespaces = {"edgar:xbrlFiling", 'rss'}
    for item in root.findall("./channel/item/edgar:xbrlFiling/", namespaces):
        filing = dict(item.attrib)
        filings.append(filing)

    return filings
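
The AttributeError comes from `namespaces` being written as a set literal: `findall` expects a dict that maps each prefix to its namespace URI and calls `.items()` on it. A minimal sketch of the dict form follows, assuming the goal is one dict per `edgar:xbrlFiling` element built from its child tags. `EDGAR_NS` is a placeholder here; the real value has to be copied from the `xmlns:edgar` attribute on the feed's root element. The helper name `extract_filings` and the sample document are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder URI: replace with the value of xmlns:edgar in the actual feed.
EDGAR_NS = "http://example.com/edgar"

SAMPLE = f"""<rss xmlns:edgar="{EDGAR_NS}" version="2.0">
  <channel>
    <item>
      <edgar:xbrlFiling>
        <edgar:companyName>EXAMPLE CO</edgar:companyName>
        <edgar:formType>10-K</edgar:formType>
      </edgar:xbrlFiling>
    </item>
  </channel>
</rss>"""


def extract_filings(root, namespaces):
    """Return one dict per edgar:xbrlFiling, keyed by child local name."""
    filings = []
    for filing in root.findall("./channel/item/edgar:xbrlFiling", namespaces):
        record = {}
        for child in filing:
            # Parsed tags look like "{uri}companyName"; keep only the local name.
            local_name = child.tag.rsplit("}", 1)[-1]
            record[local_name] = (child.text or "").strip()
        filings.append(record)
    return filings


# namespaces must be a dict (prefix -> URI), not a set.
filings = extract_filings(ET.fromstring(SAMPLE), {"edgar": EDGAR_NS})
```

With a proper namespace mapping, the "edgar:" prefix can stay in the XML untouched, so no textual replacement is needed.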

Find and replace in Python XML

0 answers: