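This question is a bit hard to explain with my limited English, but I'll do my best.
I have a directory of XML files, and each file contains XML like this: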
<root>
<fields>
<field>
<description/>
<region id="Number.T2S366_R_487" page="1"/>
</field>
<field>
<description/>
<region id="Number.T2S366_R_488.`0" page="1"/>
<region id="String.T2S366_R_488.`1" page="1"/>
</field>
</fields>
</root>
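I want to do a string replacement on the lines that contain the dot, tick, number notation, e.g. .`0, and turn it into index notation such as [0], [1], [2], and so on.
So the transformed XML payload should look like this: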
<root>
<fields>
<field>
<description/>
<region id="Number.T2S366_R_487" page="1"/>
</field>
<field>
<description/>
<region id="Number.T2S366_R_488[0]" page="1"/>
<region id="String.T2S366_R_488[1]" page="1"/>
</field>
</fields>
</root>
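How can I do this with Python? With a regular expression it looks fairly simple, but I find it hard to do for a whole directory containing multiple files. I'd like to see an implementation in Python 3.x, since that is what I'm learning.
Answer 0: (score: 3)
In Python, you can use os.listdir to iterate over all the files in a directory, and fileinput to do the replacement in place: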
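import os
import re
import fileinput

path = '/home/arabian_albert/'
for name in os.listdir(path):
    # os.listdir returns bare file names, so join them with the directory
    full_path = os.path.join(path, name)
    if not os.path.isfile(full_path):
        continue
    with fileinput.FileInput(full_path, inplace=True, backup='.bak') as file:
        for line in file:
            # with inplace=True, print() writes back into the file being read
            print(re.sub(r'\.`(\d+)', r'[\1]', line), end='')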
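However, you should consider using sed. From the command line:
find . -type f -exec sed -i.bak -E 's/\.`([0-9]+)/[\1]/g' {} \;
The above rewrites every file under the current directory and keeps a backup of each original with a .bak suffix. The sed script is single-quoted so that the shell does not treat the backtick as the start of a command substitution.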
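If you would rather stay in Python and avoid fileinput, the same loop can be written with pathlib. This is only a minimal sketch: the function name convert_directory, the .xml suffix filter, and the UTF-8 encoding are my assumptions, not something stated in the question.

```python
import re
import shutil
from pathlib import Path

TICK_INDEX = re.compile(r"\.`(\d+)")  # a dot, a backtick, then one or more digits


def convert_directory(directory, suffix=".xml"):
    """Rewrite .`N as [N] in every matching file and return the changed paths."""
    changed = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix != suffix:
            continue
        text = path.read_text(encoding="utf-8")  # assumes UTF-8 files
        new_text, count = TICK_INDEX.subn(r"[\1]", text)
        if count:
            # keep a copy of the original next to it, e.g. a.xml.bak
            shutil.copy2(path, path.with_name(path.name + ".bak"))
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling convert_directory('/home/arabian_albert/') would then touch only the files that actually contain the tick notation, and leave the others without a backup.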
Answer 1: (score: 2)
You can do this with a simple regular expression:
import re
sample_str = """
<root>
<fields>
<field id="S366/487" type="xs:int" bind="T2S366/487">
<description/>
<region id="WholeNumberWithSeparator.T2S366_R_487" page="1"/>
</field>
<field id="S366/488" type="xs:int" bind="T2S366/488">
<description/>
<region id="Number.T2S366_R_488.`0" page="1"/>
<region id="String.T2S366_R_488.`1" page="1"/>
</field>
</fields>
</root>
"""
# raw string so the backslashes reach the regex engine unchanged
pattern = r"\.`(\d+)"
result = re.sub(pattern, lambda m: "[{}]".format(m.group(1)), sample_str)
print(result)
Output:
<root>
<fields>
<field id="S366/487" type="xs:int" bind="T2S366/487">
<description/>
<region id="WholeNumberWithSeparator.T2S366_R_487" page="1"/>
</field>
<field id="S366/488" type="xs:int" bind="T2S366/488">
<description/>
<region id="Number.T2S366_R_488[0]" page="1"/>
<region id="String.T2S366_R_488[1]" page="1"/>
</field>
</fields>
</root>
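For what it's worth, the lambda is not strictly needed: a backreference in the replacement string does the same job. A small sketch, where the helper name tick_to_index is made up for illustration:

```python
import re


def tick_to_index(text):
    """Replace each .`N (dot, backtick, digits) with [N]."""
    return re.sub(r"\.`(\d+)", r"[\1]", text)


# multi-digit indices are handled because of \d+
print(tick_to_index('<region id="Number.T2S366_R_488.`12" page="1"/>'))
# lines without the tick notation come back unchanged
print(tick_to_index('<region id="Number.T2S366_R_487" page="1"/>'))
```

The first call prints the id as Number.T2S366_R_488[12], and the second line is printed exactly as it was passed in.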
Answer 2: (score: 1)
How about this:
wholefile = ''
with open(r'xml_input.xml', 'r') as f:
    lines = f.readlines()
    for line in lines:
        split_line = line.split('.')  # split at periods
        end_point = split_line.pop(-1)  # get and remove existing endpoint
        if end_point.startswith('`'):  # if it matches tick notation
            idx_after_num = end_point.find('"')  # index of the first double quote
            the_int = end_point[1:idx_after_num]  # slice from after the tick to the end of the int
            end_point = end_point[idx_after_num:]  # drop everything before the double quote
            new_endpoint = '[{}]'.format(the_int) + end_point  # create new endpoint
            split_line += [new_endpoint]  # append new endpoint to the list of split strs
            new_line = ''
            for n, segment in enumerate(split_line):
                if n >= len(split_line) - 2:  # the last two segments are joined without a '.'
                    new_line += segment
                else:
                    new_line += segment + '.'  # put back the '.' removed by split
            wholefile += new_line  # keep the line, with changes
        else:
            wholefile += line  # keep the line, with no changes
with open('xml_out.xml', 'w') as f:
    f.write(wholefile)
My output:
<root>
<fields>
<field id="S366/487" type="xs:int" bind="T2S366/487">
<description/>
<region id="WholeNumberWithSeparator.T2S366_R_487" page="1"/>
</field>
<field id="S366/488" type="xs:int" bind="T2S366/488">
<description/>
<region id="Number.T2S366_R_488[0]" page="1"/>
<region id="String.T2S366_R_488[1]" page="1"/>
</field>
</fields>
</root>
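Note that this relies on the tick notation being the last dot-separated piece of the line. If the notation only ever appears in the id attribute of region elements (an assumption based on the sample, not something the question states), a parser-based variant can limit the change to those attributes. A minimal sketch with xml.etree.ElementTree from the standard library; the function name rewrite_region_ids is made up for illustration:

```python
import re
import xml.etree.ElementTree as ET

TICK_INDEX = re.compile(r"\.`(\d+)")  # a dot, a backtick, then digits


def rewrite_region_ids(xml_text):
    """Return the XML with .`N rewritten to [N] in region id attributes only."""
    root = ET.fromstring(xml_text)
    for region in root.iter("region"):
        old_id = region.get("id")
        if old_id is not None:
            region.set("id", TICK_INDEX.sub(r"[\1]", old_id))
    # encoding="unicode" makes tostring return a str instead of bytes
    return ET.tostring(root, encoding="unicode")
```

The serializer may normalize formatting (for example the spacing inside empty tags), so the result is not guaranteed to be byte-identical to the input apart from the ids. If that matters, the plain regex approaches are the safer choice.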