How to parse and modify XML data using Python lxml

Time: 2019-06-15 10:43:21

Tags: python xml python-3.6 lxml

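I need to parse an XML file with lxml and change the values of two tags (Author and Description). Below are the input file I am using, the output file I need, and the code I have so far.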

Input XML file:

<Summary>  
<Author>ABC</Author>  
<Description>ABC DATA</Description>  
<Function>24</Function>  
</Summary>

Required output file:

<Summary>  
<Author>DEF</Author>  
<Description>DEF DATA</Description>  
<Function>24</Function>  
</Summary> 

Code I am using:

from lxml import etree

# etree.parse returns an ElementTree, which supports both xpath() and write()
tree = etree.parse(r"C:\Users\input\input.xml")
for elem in tree.xpath('.//Author'):
    elem.text = "DEF"
tree.write("output.xml", pretty_print=True, xml_declaration=True, encoding="UTF-8")

2 Answers:

Answer 0: (score: 0)

This should work:

import xml.etree.ElementTree as ET

xml = '''<root>
    <Summary>  
        <Author>ABC</Author>  
        <Description>ABC DATA</Description>  
        <Function>24</Function>  
    </Summary>
    <Summary>  
        <Author>ABC</Author>  
        <Description>ABC DATA</Description>  
        <Function>24</Function>  
    </Summary>
</root>'''

root = ET.fromstring(xml)  # fromstring returns the root Element, not an ElementTree
for author in root.findall('.//Summary/Author'):
    author.text = 'new author value goes here'
for desc in root.findall('.//Summary/Description'):
    desc.text = 'new desc value goes here'

ET.dump(root)
# uncomment the line below if you need to save to a file
# ET.ElementTree(root).write('new_file.xml', encoding='utf-8', xml_declaration=True)

Output:

<root>
    <Summary>  
        <Author>new author value goes here</Author>  
        <Description>new desc value goes here</Description>  
        <Function>24</Function>  
    </Summary>
    <Summary>  
        <Author>new author value goes here</Author>  
        <Description>new desc value goes here</Description>  
        <Function>24</Function>  
    </Summary>
</root>
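
Since the question asks for lxml specifically, the same approach can be sketched with lxml, whose API mirrors ElementTree for these calls. This is a minimal sketch rather than the answerer's code: it assumes the question's XML is held in an in-memory byte string, and `set_text` is a hypothetical helper name introduced only for illustration.

```python
from lxml import etree

# The question's input, held in memory instead of read from disk (assumption)
XML = b"""<Summary>
<Author>ABC</Author>
<Description>ABC DATA</Description>
<Function>24</Function>
</Summary>"""


def set_text(root, tag, value):
    """Set the text of every element named `tag` in the tree (hypothetical helper)."""
    for elem in root.iter(tag):
        elem.text = value


root = etree.fromstring(XML)
set_text(root, "Author", "DEF")
set_text(root, "Description", "DEF DATA")

# Serialize to bytes; to save to disk instead, one option is
# etree.ElementTree(root).write("output.xml", xml_declaration=True, encoding="UTF-8")
result = etree.tostring(root, encoding="UTF-8", xml_declaration=True)
print(result.decode("UTF-8"))
```

Elements that are not named, such as Function, are left untouched.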

Answer 1: (score: 0)

If you only want to replace every occurrence of "ABC" with "DEF" and otherwise leave the text unchanged, this should do it:

from lxml import etree

# Sketch: the input path and output file name are the ones used in the question
tree = etree.parse(r"C:\Users\input\input.xml")
# Replace "ABC" with "DEF" in Author and Description; the rest of the text is kept
for elem in tree.xpath('.//Author | .//Description'):
    if elem.text:
        elem.text = elem.text.replace("ABC", "DEF")
tree.write("output.xml", pretty_print=True, xml_declaration=True, encoding="UTF-8")

The output is the output you asked for.
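
The replacement this answer describes (swap "ABC" for "DEF", keep everything else) can also be sketched with only the standard library's `xml.etree.ElementTree`, which may help where lxml is not installed. This is an illustrative sketch under stated assumptions: the XML is the question's input as an in-memory string, and `replace_in_text` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The question's input, held in memory instead of read from disk (assumption)
XML = """<Summary>
<Author>ABC</Author>
<Description>ABC DATA</Description>
<Function>24</Function>
</Summary>"""


def replace_in_text(root, old, new):
    """Replace `old` with `new` in the text of every element (hypothetical helper)."""
    for elem in root.iter():
        # elem.text is None for empty elements, so check it before searching
        if elem.text and old in elem.text:
            elem.text = elem.text.replace(old, new)


root = ET.fromstring(XML)
replace_in_text(root, "ABC", "DEF")

# encoding="unicode" makes tostring return a str rather than bytes
result = ET.tostring(root, encoding="unicode")
print(result)
```

Because the helper only touches text that contains the search string, values such as the 24 in Function pass through unchanged.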