Convert XML data to a DataFrame

Time: 2019-01-09 10:41:09

Tags: python xml

How can I convert XML data in the following format into a DataFrame?

<start>
    <main index = '1', sub = 'english' >
        <name value = '1', text = 'hi this is xxx' />
        <name value = '2', text = 'isnt this funny' />
    </main>
    <main index = '2', sub = 'french'>
        <name value = '1', text = 'Comment vas-tu' />
        <name value = '2', text = 'sil vous plaît résoudre ce'>
    </main>
</start>

Expected DataFrame:

mainindex           namevalue           text
A                       1               hi this is xxx
A                       2               isnt this funny
B                       1               Comment vas-tu
B                       2               sil vous plaît résoudre ce

2 answers:

Answer 0: (score: 1)

Another approach:

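saveFileName = 'yourOwnFileName.txt'

def main():
    mainindex = None  # index of the <main> element currently being read

    # Assumption: the XML file is UTF-8 encoded (the sample text contains accented characters)
    with open('yourOwnXml.xml', 'r', encoding='utf-8') as f_read:
        with open(saveFileName, 'w', encoding='utf-8') as f_write:
            for line in f_read:
                if '<main index' in line:
                    # the first single-quoted value on the line is the index
                    mainindex = line.split('\'')[1]
                if '<name value' in line:
                    # after splitting on ', the quoted values sit at positions 1 and 3
                    name_value = line.split('\'')[1]
                    text = line.split('\'')[3]
                    f_write.write('{mainindex} {namevalue} {text}\n'.format(mainindex=mainindex, namevalue=name_value, text=text))

if __name__ == '__main__':
    main()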

The output in yourOwnFileName.txt should be:

1 1 hi this is xxx
1 2 isnt this funny
2 1 Comment vas-tu
2 2 sil vous plaît résoudre ce
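
Since the question asks for a DataFrame rather than a text file, the same line-by-line idea can keep the rows in memory instead. The following is only a minimal sketch: `ATTR_RE`, `parse_rows` and `to_dataframe` are hypothetical names that do not appear in the answer, and it assumes every attribute value is single-quoted and contains no apostrophe, as in the sample.

```python
import re

# Matches name = 'value' pairs; assumes single quotes and no apostrophe inside a value.
ATTR_RE = re.compile(r"(\w+)\s*=\s*'([^']*)'")

def parse_rows(lines):
    """Collect (mainindex, namevalue, text) tuples from the XML lines."""
    rows = []
    main_index = None  # index of the <main> element currently being read
    for line in lines:
        stripped = line.lstrip()
        attrs = dict(ATTR_RE.findall(stripped))
        if stripped.startswith('<main'):
            main_index = attrs.get('index')
        elif stripped.startswith('<name'):
            rows.append((main_index, attrs.get('value'), attrs.get('text')))
    return rows

def to_dataframe(rows):
    """Build a DataFrame from the rows; pandas is imported only when this is called."""
    import pandas as pd
    return pd.DataFrame(rows, columns=['mainindex', 'namevalue', 'text'])
```

With the file from the answer, this could be used as `to_dataframe(parse_rows(open('yourOwnXml.xml', encoding='utf-8')))`. Like the answer, it keeps the numeric index (1, 2) rather than the letters A and B shown in the question.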

Answer 1: (score: 0)

How about BeautifulSoup?

from bs4 import BeautifulSoup
import pandas as pd

data = """<start>
    <main index = '1', sub = 'english' >
        <name value = '1', text = 'hi this is xxx' />
        <name value = '2', text = 'isnt this funny' />
    </main>
    <main index = '2', sub = 'french'>
        <name value = '1', text = 'Comment vas-tu' />
        <name value = '2', text = 'sil vous plaît résoudre ce'>
    </main>
</start>"""

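# Name the parser explicitly; without it BeautifulSoup picks one itself and emits a warning
data = BeautifulSoup(data, 'html.parser')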

headers = ['mainIndex','nameValue','text']

dataframe = pd.DataFrame(columns=headers)
pos = 0
i = 0
for m in data.find_all('main'):
    for name in m.find_all('name'):
        d = []
        d.append(chr(ord('A')+i))
        d.append(name.get('value'))
        d.append(name.get('text'))

        dataframe.loc[pos] = d
        pos+=1
    i+=1    

print(dataframe)

  mainIndex nameValue                        text
0         A         1              hi this is xxx
1         A         2             isnt this funny
2         B         1              Comment vas-tu
3         B         2  sil vous plaît résoudre ce
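
If the input can be made well-formed, the standard library's `xml.etree.ElementTree` may be enough on its own. The sketch below is not part of the answer above: it assumes a cleaned-up copy of the sample (commas between attributes removed, the last `<name>` tag self-closed), and `XML` and `xml_to_rows` are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: a well-formed variant of the sample data
# (no commas between attributes, every <name> tag self-closed).
XML = """<start>
    <main index='1' sub='english'>
        <name value='1' text='hi this is xxx' />
        <name value='2' text='isnt this funny' />
    </main>
    <main index='2' sub='french'>
        <name value='1' text='Comment vas-tu' />
        <name value='2' text='sil vous plaît résoudre ce' />
    </main>
</start>"""

def xml_to_rows(xml_text):
    """Return one dict per <name> element, labelled by its parent <main>."""
    root = ET.fromstring(xml_text)
    rows = []
    for i, main in enumerate(root.iter('main')):
        label = chr(ord('A') + i)  # 'A' for the first <main>, 'B' for the second
        for name in main.findall('name'):
            rows.append({
                'mainindex': label,
                'namevalue': name.get('value'),
                'text': name.get('text'),
            })
    return rows

rows = xml_to_rows(XML)
```

Passing `rows` to `pd.DataFrame(rows)` then builds the whole frame in one call, which avoids growing it row by row with `.loc`.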