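How to replace all instances of a string with another, indexed string in Python

Time: 2018-01-26 00:13:16

Tags: python regex xml lxml

This question is a bit hard to explain given my limited English, but I will do my best.

I have a directory of XML files, each containing XML such as: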

<root>
    <fields>
        <field>
            <description/>
            <region id="Number.T2S366_R_487" page="1"/>
        </field>
        <field>
            <description/>
            <region id="Number.T2S366_R_488.`0" page="1"/>
            <region id="String.T2S366_R_488.`1" page="1"/>
        </field>
    </fields>
</root>

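I want to do a string replacement on lines that contain the dot, tick, number notation, for example .`0, replacing it with index notation such as [0], [1], [2], and so on.

So the converted XML payload should look like this: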

<root>
    <fields>
        <field>
            <description/>
            <region id="Number.T2S366_R_487" page="1"/>
        </field>
        <field>
            <description/>
            <region id="Number.T2S366_R_488[0]" page="1"/>
            <region id="String.T2S366_R_488[1]" page="1"/>
        </field>
    </fields>
</root>

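How can I do this in Python? It looks fairly straightforward with a regular expression, but I find it hard to do across a directory containing many files. I would like to see an implementation in Python 3.x, since that is what I am learning.

3 answers:

Answer 0 (score: 3)

In Python, you can use os.listdir to iterate over all files in a directory and fileinput to do the replacement in place: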

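import fileinput
import os
import re

path = '/home/arabian_albert/'
for name in os.listdir(path):
    full_path = os.path.join(path, name)  # os.listdir returns bare names, not paths
    if not os.path.isfile(full_path):
        continue  # skip subdirectories
    # inplace=True redirects print() into the file; the original is kept as <name>.bak
    with fileinput.FileInput(full_path, inplace=True, backup='.bak') as file:
        for line in file:
            print(re.sub(r'\.`(\d+)', r'[\1]', line), end='')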

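However, you should consider doing this from the command line with sed:

find . -type f -exec sed -i.bak -E 's/\.`([0-9]+)/[\1]/g' {} \;

The above rewrites every file under the current directory, including files in subdirectories, and keeps each original as a backup with a .bak suffix. The sed expression is single-quoted so that the shell does not treat the backtick as the start of a command substitution.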
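For a pure Python 3 version of the same directory-wide replacement, here is a minimal sketch using pathlib. The helper names convert_text and convert_directory, the "*.xml" glob, and the UTF-8 encoding are assumptions made for illustration, not part of the question; adjust them to the actual files.

```python
import re
import shutil
from pathlib import Path

# Matches a dot, a backtick, and one or more digits, e.g. ".`0"
TICK_INDEX = re.compile(r"\.`(\d+)")


def convert_text(text):
    """Return text with every .`N rewritten as [N]."""
    return TICK_INDEX.sub(r"[\1]", text)


def convert_directory(directory, pattern="*.xml"):
    """Rewrite matching files in place; keep a .bak copy of each changed file.

    Returns the names of the files that were changed.
    The glob pattern and UTF-8 encoding are assumptions.
    """
    changed = []
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue  # skip directories that happen to match the pattern
        original = path.read_text(encoding="utf-8")
        converted = convert_text(original)
        if converted != original:
            # Back up the untouched file before overwriting it
            shutil.copy2(path, path.with_name(path.name + ".bak"))
            path.write_text(converted, encoding="utf-8")
            changed.append(path.name)
    return changed
```

Calling convert_directory('/home/arabian_albert/') would process only the top level of that directory; files that contain no tick notation are left untouched and get no backup.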

Answer 1 (score: 2)

You can do this with a simple regular expression:

import re
sample_str = """
<root>
    <fields>
        <field id="S366/487" type="xs:int" bind="T2S366/487">
            <description/>
            <region id="WholeNumberWithSeparator.T2S366_R_487" page="1"/>
        </field>
        <field id="S366/488" type="xs:int" bind="T2S366/488">
            <description/>
            <region id="Number.T2S366_R_488.`0" page="1"/>
            <region id="String.T2S366_R_488.`1" page="1"/>
        </field>
    </fields>
</root>
"""
pattern = r"\.`(\d+)"  # raw string: a dot, a backtick, then the captured digits
result = re.sub(pattern, lambda x: "[{}]".format(x.group(1)), sample_str)
print(result)

Output:

<root>
    <fields>
        <field id="S366/487" type="xs:int" bind="T2S366/487">
            <description/>
            <region id="WholeNumberWithSeparator.T2S366_R_487" page="1"/>
        </field>
        <field id="S366/488" type="xs:int" bind="T2S366/488">
            <description/>
            <region id="Number.T2S366_R_488[0]" page="1"/>
            <region id="String.T2S366_R_488[1]" page="1"/>
        </field>
    </fields>
</root>
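Because the regex above runs over the whole document, it would also rewrite tick notation that appears outside an id attribute. If that matters, one option is to parse the XML and touch only the attribute in question. The following is a sketch using the standard library's xml.etree.ElementTree (lxml exposes a similar interface); the function name convert_region_ids is hypothetical, and the assumption is that only the id attribute of region elements needs converting.

```python
import re
import xml.etree.ElementTree as ET

# Matches a dot, a backtick, and one or more digits, e.g. ".`0"
TICK_INDEX = re.compile(r"\.`(\d+)")


def convert_region_ids(xml_text):
    """Rewrite .`N as [N] in the id attribute of every <region> element.

    Text content and other attributes are left unchanged.
    """
    root = ET.fromstring(xml_text)
    for region in root.iter("region"):
        region_id = region.get("id")
        if region_id is not None:
            region.set("id", TICK_INDEX.sub(r"[\1]", region_id))
    # encoding="unicode" makes tostring return a str instead of bytes
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree re-serializes the document, so details such as spacing inside tags may differ from the input even where nothing was replaced.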

Answer 2 (score: 1)

How about this:

wholefile = ''

with open('xml_input.xml', 'r') as f:  # read-only is enough here
    lines = f.readlines()
    for line in lines:
        split_line = line.split('.')  # split at periods
        end_point = split_line.pop(-1)  # get and remove existing endpoint
        if end_point[0] == '`':  # if it matches tick notation
            idx_after_num = end_point.find('"')  # get the first index that matches a double quote
            the_int = end_point[1:idx_after_num]  # slice from after the tick to the end of the int
            end_point = list(end_point)  # convert to list
            del(end_point[:idx_after_num])  # delete up to the double quote
            end_point = ''.join(end_point)  # reconstruct string
            new_endpoint = '[{}]'.format(the_int) + end_point  # create new endpoint
            split_line += [new_endpoint]  # append new endpoint to end of list of split strs
            new_line = ''  # new empty string
            for n, segment in enumerate(split_line):
                if n >= len(split_line) - 2:  # if we're at or beyond the endpoint
                    new_line += segment  # concatenate the new endpoint
                else:
                    new_line += segment + '.'  # concatenate, replacing the needed '.'s
            wholefile += new_line  # replace, with changes
        else:
            wholefile += line  # replace, with no changes

with open('xml_out.xml', 'w') as f:
    f.write(wholefile)

My output:

<root>
    <fields>
        <field id="S366/487" type="xs:int" bind="T2S366/487">
            <description/>
            <region id="WholeNumberWithSeparator.T2S366_R_487" page="1"/>
        </field>
        <field id="S366/488" type="xs:int" bind="T2S366/488">
            <description/>
            <region id="Number.T2S366_R_488[0]" page="1"/>
            <region id="String.T2S366_R_488[1]" page="1"/>
        </field>
    </fields>
</root>
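The splitting approach above only looks at the last dot-separated piece of each line, so it relies on the tick notation being the final dotted segment. As a sketch of an alternative with the same read-then-write structure, the manual splitting can be swapped for re.sub, which does not depend on where the notation falls on the line and handles multi-digit indices. The function name convert_file and the UTF-8 encoding are assumptions.

```python
import re

# Matches a dot, a backtick, and one or more digits, e.g. ".`0"
TICK_INDEX = re.compile(r"\.`(\d+)")


def convert_file(src, dst):
    """Copy src to dst line by line, rewriting every .`N as [N]."""
    with open(src, "r", encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(TICK_INDEX.sub(r"[\1]", line))
```

For the files in the question this could be called as convert_file('xml_input.xml', 'xml_out.xml').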