BeautifulSoup: tag with an embedded id="x" attribute (Python)

Time: 2018-04-14 21:09:41

Tags: python beautifulsoup

I am trying to extract values from XML that looks like this:

 <transactionCoding>
            <transactionFormType>4</transactionFormType>
            <transactionCode>S</transactionCode>
            <equitySwapInvolved>0</equitySwapInvolved>
            <footnoteId id="F1"/>

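I can get the value of every tag except footnoteId, which returns nothing. I have tried footnoteid.string, gettext, getattr, and similar functions, but none of them work. I need to get the value F1 from the tag, but I can't figure out how.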

Here is the code:

import os
from bs4 import BeautifulSoup


def parse_xml_string(fls):
        temp = fls.find_next("value")
        t = temp.text
        print (t)
        return (t)

if __name__ == '__main__':

    nonderivtxn = soup.find_all("nonderivativetransaction")

    nd =  [[] for _ in range(len(nonderivtxn))]


    for index in range(len(nonderivtxn)):
            coding = nonderivtxn[index]. find("transactioncoding")
            tformtype = coding.transactionformtype.text
            tcode = coding.transactioncode.text
            swapinvolved = coding.equityswapinvolved.text
            footnote= coding.footnoteid.gettext()
            print (tcode,swapinvolved,footnote.content,tformtype)

1 answer:

Answer 0 (score: 0)

If your XML looks like this:

<footnotes><footnote id="F1">See Exhibit 99.1 for text of footnote (1).</footnote><footnote id="F2">See Exhibit 99.1 for text of footnote (2).</footnote><footnote id="F3">See Exhibit 99.1 for text of footnote (3).</footnote><footnote id="F4">See Exhibit 99.1 for text of footnote (4).</footnote><footnote id="F5">See Exhibit 99.1 for text of footnote (5).</footnote></footnotes>

Use the find_all function to load the footnotes into an array. You can then look up the values with the following code:

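def get_footnote_list(ids, footnote_array):
    # ids: list of footnote id strings, e.g. ["F1", "F3"]
    # footnote_array: the <footnote> tags returned by find_all
    fl = []

    for wanted in ids:
        for footnote in footnote_array:
            # The id is an attribute of the tag, so read it with .get("id")
            if footnote.get("id") == wanted:
                fl.append(footnote.decode_contents())

    return fl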
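
To tie this back to the question: F1 is not text inside footnoteId, it is the value of the tag's id attribute, so it has to be read with .get("id") (or ["id"]) rather than .string or get_text(). The following is a minimal sketch, assuming the document is parsed with html.parser (which lowercases tag names, matching the lowercase lookups in the question). The SAMPLE string is a hypothetical document assembled from the fragments above, not the real filing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample assembled from the fragments in the question and answer.
SAMPLE = """
<nonDerivativeTransaction>
  <transactionCoding>
    <transactionFormType>4</transactionFormType>
    <transactionCode>S</transactionCode>
    <equitySwapInvolved>0</equitySwapInvolved>
    <footnoteId id="F1"/>
  </transactionCoding>
</nonDerivativeTransaction>
<footnotes>
  <footnote id="F1">See Exhibit 99.1 for text of footnote (1).</footnote>
  <footnote id="F2">See Exhibit 99.1 for text of footnote (2).</footnote>
</footnotes>
"""

# Assumption: html.parser is used, so tag names are lowercased on parsing.
soup = BeautifulSoup(SAMPLE, "html.parser")

coding = soup.find("transactioncoding")

# footnoteId is an empty element, so it has no text to return.
# The value F1 is stored in its id attribute.
footnote_ids = [tag.get("id") for tag in coding.find_all("footnoteid")]

# Map each footnote id to its text, then resolve the ids collected above.
footnote_text = {fn.get("id"): fn.get_text() for fn in soup.find_all("footnote")}
resolved = [footnote_text[fid] for fid in footnote_ids if fid in footnote_text]

print(footnote_ids)
print(resolved)
```

Building a dictionary keyed by id avoids the nested loop when there are many footnotes. If an XML parser is used instead of html.parser, the tag names keep their original case (footnoteId, transactionCoding) and the lookups would need to be adjusted accordingly.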