Extracting data from XML

Time: 2018-07-25 18:30:16

Tags: python python-3.x

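I have looked at several examples, but I still cannot adapt any of them to fit my needs. I am trying to extract the maker and model tags from a file, but none of the previously answered questions have worked for me.

Edit - this probably is not a different question. What differs is my level of Python understanding. I have tried editing the scripts given in different answers already on Stack, but I have not managed to get any of them to work.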

<camera>
   <maker>Fujifilm</maker>
    <model>GFX 50S</model>
    <mount>Fujifilm G</mount>
    <cropfactor>0.79</cropfactor>
</camera>

2 answers:

Answer 0 (score: 0)

Take a look at the Python docs.

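import xml.etree.ElementTree as ET

# xml_string holds the XML text; a shortened copy of the question's sample is used here
xml_string = '<camera><maker>Fujifilm</maker><model>GFX 50S</model></camera>'

root = ET.fromstring(xml_string)
# findtext() returns the text of the first matching child, or None if it is missing
maker = root.findtext('maker')
model = root.findtext('model')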
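If the file holds several camera entries, the same ElementTree approach can be extended roughly as follows. This is only a sketch: the wrapping <cameras> root element, the XML_SAMPLE string, and the extract_cameras() helper are assumptions made for illustration and do not come from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: several <camera> elements under an assumed <cameras> root.
XML_SAMPLE = """
<cameras>
    <camera>
        <maker>Fujifilm</maker>
        <model>GFX 50S</model>
    </camera>
    <camera>
        <maker>thing1</maker>
        <model>thing2</model>
    </camera>
</cameras>
"""


def extract_cameras(xml_text):
    """Return a list of (maker, model) tuples, one per <camera> element."""
    root = ET.fromstring(xml_text)
    # iter('camera') visits every <camera> element at any depth in the tree
    return [(cam.findtext('maker'), cam.findtext('model'))
            for cam in root.iter('camera')]


for maker, model in extract_cameras(XML_SAMPLE):
    print(f'Make: {maker}\nModel: {model}')
```

To read from a file instead of a string, ET.parse('path/to/your/file').getroot() gives the same kind of root element. Looking up maker and model inside each camera keeps the pairs aligned even when one entry lacks a tag (findtext() then returns None for it).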

Answer 1 (score: 0)

Try bs4...?

from bs4 import BeautifulSoup

page = '''
        <camera>
            <maker>Fujifilm</maker>
            <model>GFX 50S</model>
            <mount>Fujifilm G</mount>
            <cropfactor>0.79</cropfactor>
        </camera>
        '''

soup = BeautifulSoup(page, 'lxml')
make = soup.find('maker')
model = soup.find('model')
print(f'Make: {make.text}\nModel: {model.text}')

For multiple entries, just use find_all() and iterate over the results:

from bs4 import BeautifulSoup

page = '''
        <camera>
            <maker>Fujifilm</maker>
            <model>GFX 50S</model>
            <mount>Fujifilm G</mount>
            <cropfactor>0.79</cropfactor>
        </camera>
        <camera>
            <maker>thing1</maker>
            <model>thing2</model>
            <mount>Fujifilm G</mount>
            <cropfactor>0.79</cropfactor>
        </camera>
        <camera>
            <maker>thing3</maker>
            <model>thing4</model>
            <mount>Fujifilm G</mount>
            <cropfactor>0.79</cropfactor>
        </camera>
        <camera>
            <maker>thing5</maker>
            <model>thing6</model>
            <mount>Fujifilm G</mount>
            <cropfactor>0.79</cropfactor>
        </camera>
        '''

soup = BeautifulSoup(page, 'lxml')
make = soup.find_all('maker')
model = soup.find_all('model')
for x, y in zip(make, model):
    print(f'Make: {x.text}\nModel: {y.text}')

Getting the data from a file:

from bs4 import BeautifulSoup

with open('path/to/your/file') as file:
    page = file.read()
    soup = BeautifulSoup(page, 'lxml')
    make = soup.find_all('maker')
    model = soup.find_all('model')
    for x, y in zip(make, model):
        print(f'Make: {x.text}\nModel: {y.text}')

Without importing any modules:

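with open('/PATH/TO/YOUR/FILE') as file:
    for line in file:
        # Strip the indentation but keep the whole line, so values containing spaces survive
        line = line.strip()
        # Only handles a <maker> element that sits entirely on one line
        if line.startswith("<maker>") and line.endswith("</maker>"):
            print(line[len("<maker>"):-len("</maker>")])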

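This only handles the 'maker' tag; it may be worthwhile to split the tags out into separate function definitions and iterate over them.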
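One way to generalise the no-import approach to any tag is sketched below. The extract_tag_values() helper and the sample string are hypothetical names introduced here, and the sketch assumes plain <tag>value</tag> pairs without attributes or nesting; a real XML parser is more robust for anything beyond that.

```python
def extract_tag_values(text, tag):
    """Return the text between every <tag>...</tag> pair found in text."""
    open_tag, close_tag = f'<{tag}>', f'</{tag}>'
    values = []
    pos = 0
    while True:
        start = text.find(open_tag, pos)
        if start == -1:
            break  # no further opening tag
        start += len(open_tag)
        end = text.find(close_tag, start)
        if end == -1:
            break  # opening tag without a matching close; stop scanning
        values.append(text[start:end].strip())
        pos = end + len(close_tag)
    return values


# Sample text taken from the question; in practice this would be file.read()
sample = '''
<camera>
    <maker>Fujifilm</maker>
    <model>GFX 50S</model>
</camera>
'''

for tag in ('maker', 'model'):
    print(tag, extract_tag_values(sample, tag))
```

Because it searches the whole text instead of splitting on whitespace, values such as "GFX 50S" come back intact.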