Question

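I'm new to Python and am trying to process some XML into a CSV file so that I can later diff it against the output of a database for validation. The code below takes the 'tct-id' attributes from the XML and writes them to a single column under the heading 'DocumentID', which is what I need for the validation.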

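However, the database output contains only the number, whereas the output of this code also includes the version number of the XML ID; for example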

tct-id="D-TW-0010054;3;"

I need to remove the ;3; so that I can validate properly.

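Here is my code. Is there a way to rewrite it so that it preprocesses the XML snippets to remove this, for example by taking only the first 12 characters of each attribute and writing those to the CSV?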

from lxml import etree
import csv

xml_fname = 'example.xml'
csv_fname = 'output.csv'

fields = ['tct-id']

xml = etree.parse(xml_fname)

with open(xml_fname) as infile, open(csv_fname, 'w', newline='') as outfile:
    r = csv.DictReader(infile)
    w = csv.DictWriter(outfile, fields, delimiter=';', extrasaction="ignore")

    wtr = csv.writer(outfile)    
    wtr.writerow(["DocumentID"])

    for node in xml.xpath("//*[self::StringVariables or self::ElementVariables or self::PubInfo or self::Preface or self::DocHistory or self::Glossary or self::StatusInfo or self::Chapter]"):
        atts = node.attrib
        atts["elm_name"] = node.tag
        w.writerow(node.attrib)

All help is much appreciated.

Answer 1

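Assuming there is only one ;3;-style string to remove from the tct-id, you can use a regular expression:

import re

tct_id = "D-TW-0010054;3;"
# Greedy match: everything from the first ';' to the last ';'
to_rem = re.findall(r'(;.*;)', tct_id)[0]
# Remove the matched version suffix from the ID
tct_id = tct_id.replace(to_rem, '')

Note that I'm using tct_id instead of tct-id, because Python does not allow a hyphen in a variable name.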
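To show how the same idea might fit into the CSV-writing loop from the question, here is a minimal sketch. It makes several assumptions: the version suffix always has the form ;digits; at the end of the ID, the sample XML is invented for illustration (the real file is not shown), and the standard-library xml.etree.ElementTree is swapped in for lxml so the example runs without extra packages. The names strip_version, ids_to_csv, SAMPLE_XML and WANTED_TAGS are hypothetical.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Assumed suffix format: ';' + digits + ';' at the very end of the ID.
VERSION_SUFFIX = re.compile(r";\d+;$")


def strip_version(tct_id):
    """Return tct_id without a trailing ';<digits>;' suffix, if present."""
    return VERSION_SUFFIX.sub("", tct_id)


# Hypothetical sample input; the real document layout is not known.
SAMPLE_XML = """<Doc>
  <Chapter tct-id="D-TW-0010054;3;"/>
  <Glossary tct-id="D-TW-0010055;12;"/>
  <Other tct-id="D-TW-0099999;1;"/>
</Doc>"""

# Element names taken from the XPath expression in the question.
WANTED_TAGS = {"StringVariables", "ElementVariables", "PubInfo", "Preface",
               "DocHistory", "Glossary", "StatusInfo", "Chapter"}


def ids_to_csv(xml_text):
    """Write the cleaned tct-id of each wanted element under 'DocumentID'."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["DocumentID"])
    for node in root.iter():
        if node.tag in WANTED_TAGS and "tct-id" in node.attrib:
            writer.writerow([strip_version(node.attrib["tct-id"])])
    return out.getvalue()


print(ids_to_csv(SAMPLE_XML))
```

With lxml, the strip_version helper could be called the same way inside the existing xpath loop. If the IDs never contain a semicolon of their own, tct_id.split(';', 1)[0] is a simpler alternative, and both approaches avoid hard-coding a length of 12 characters.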

Use only part of an attribute when converting XML to CSV
