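python - Read a document and modify the content I want to change

Time: 2018-07-22 01:33:11

Tags: python

I am trying to modify the text of a document using Python.

This is the document text: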

abcdefghijklmn
<entry colname="1" rowname="1">a</entry>
<entry morecols="5" morecolname="2" namest="2" nameend="7" rowname="1">a</entry>
<entry colname="1" morerows="9" morerowname="2">b</entry>
<entry morecols="5" morecolname="2" namest="2" nameend="7" rowname="2">b</entry>
<entry colname="1" morerows="9" morerowname="2">b</entry>
<morecols="4" morecolname="3" namest="3" nameend="7" morerows="2" morerowname="3">c</entry>
<entry colname="2" rowname="3">c</entry>
<entry colname="2" rowname="4">d</entry>
<entry morecols="1" morecolname="2" namest="2" nameend="3" morerows="2" morerowname="5">e</entry>
<entry colname="2" rowname="5">e</entry>
abcdefghijklmn

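I want to append TEST to the end of the last line in each group of lines that contain rowname="n" (morerowname="n" counts as well).

So this is the result I want: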

abcdefghijklmn
<entry colname="1" rowname="1">a</entry>
<entry morecols="5" morecolname="2" namest="2" nameend="7" rowname="1">a</entry>TEST
<entry colname="1" morerows="9" morerowname="2">b</entry>
<entry morecols="5" morecolname="2" namest="2" nameend="7" rowname="2">b</entry>
<entry colname="1" morerows="9" morerowname="2">b</entry>TEST
<morecols="4" morecolname="3" namest="3" nameend="7" morerows="2" morerowname="3">c</entry>
<entry colname="2" rowname="3">c</entry>TEST
<entry colname="2" rowname="4">d</entry>TEST
<entry morecols="1" morecolname="2" namest="2" nameend="3" morerows="2" morerowname="5">e</entry>
<entry colname="2" rowname="5">e</entry>TEST
abcdefghijklmn

This is the code I have tried so far, but I don't know how to write the if condition:

with open("C:\\TEST\\test_addrow.xml","r",encoding="utf-8") as f:
    data = f.read()

result = list()
All_text = data.split("\n")

a = 1
find_text = 'rowname="{}".*'.format(a)

for t in All_text:
    if re.search(find_text, data) :
       re.findall(find_text, data)[-1]
       result.append(t+"TEST")
    a = a + 1
    else:
        result.append(t)

with open("C:\\TEST\\test_addrow.xml","w",encoding="utf-8") as f:
    f.write("\n".join(result))

Can you give me any advice? Thanks.

2 answers:

Answer 0 (score: 1)

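You can try the following code.

Using the split() method defined on strings is also a good option in this case. Regular expressions work well too, as in the code you already wrote.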
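To illustrate the two options just mentioned, here is a minimal sketch that pulls the number out of a single line, once with split() and once with a regular expression. The helper names number_via_split and number_via_regex are made up for this example. Note that the split() version takes the last rowname=" on the line while re.search takes the first; on the sample data each line has only one, so both agree.

```python
import re

def number_via_split(line):
    """Return the digits after the last rowname=" in line, or None."""
    if 'rowname="' not in line:
        return None
    return line.split('rowname="')[-1].split('"')[0]

def number_via_regex(line):
    """Return the digits of the first rowname="..." in line, or None."""
    match = re.search(r'rowname="(\d+)"', line)
    return match.group(1) if match else None

# morerowname="2" also contains rowname="2", so both helpers pick it up
sample = '<entry colname="1" morerows="9" morerowname="2">b</entry>'
print(number_via_split(sample))
print(number_via_regex(sample))
print(number_via_regex("abcdefghijklmn"))
```

Both helpers return the number as a string, which is enough for comparing neighbouring lines.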

     

Try it online at http://rextester.com/HOLRV63641

import re

# Reading XML file 
with open("C:\\TEST\\test_addrow.xml", "r", encoding='utf-8') as f:
    lines = f.readlines()

last_num = ""  # Stores the value of the last rowname / morerowname attribute seen
last_index = 0 # Stores the index of the last line that had a rowname / morerowname attribute
opened = False # Tracks whether a run of equal numbers is still waiting for its TEST

for i, line in enumerate(lines):
    arr = re.findall(r"rowname=\"\d+", line)
    arr2 = []
    if arr:
        arr2 = arr[0].split('"')

    if arr2:
        if opened and last_num != arr2[1]:
            # The number changed, so the previous matching line ended its run
            lines[last_index] = lines[last_index].strip() + 'TEST' + '\n'

        opened = True  # A run is in progress and still needs its TEST
        last_index = i
        last_num = arr2[1]
    else:
        if opened:
            # A line without rowname ends the current run
            lines[last_index] = lines[last_index].strip() + 'TEST' + '\n'
            opened = False # Added TEST so close

# The file may end while a run is still open (for example, on a matching line)
if opened:
    lines[last_index] = lines[last_index].strip() + 'TEST' + '\n'

lines = "".join(lines)

# Writing modified lines to file
with open("C:\\TEST\\test_addrow.xml", "w", encoding='utf-8') as f:
    f.write(lines)
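The same run detection can also be written with itertools.groupby, which groups consecutive lines that share a key. This is only a sketch of an alternative, not the code from this answer: row_number and mark_run_ends are hypothetical names, and the function works on a list of lines without trailing newlines so it can be tried without touching a file.

```python
import re
from itertools import groupby

# Also matches inside morerowname="...", which is what the question wants
ROW_RE = re.compile(r'rowname="(\d+)"')

def row_number(line):
    """Return the row number of a line as a string, or None if it has none."""
    match = ROW_RE.search(line)
    return match.group(1) if match else None

def mark_run_ends(lines, marker="TEST"):
    """Append marker to the last line of every run of equal row numbers."""
    result = []
    for number, run in groupby(lines, key=row_number):
        run = list(run)
        if number is not None:  # leave lines without rowname untouched
            run[-1] += marker
        result.extend(run)
    return result

sample = """abcdefghijklmn
<entry colname="1" rowname="1">a</entry>
<entry morecols="5" rowname="1">a</entry>
<entry colname="1" morerowname="2">b</entry>
abcdefghijklmn""".splitlines()

for line in mark_run_ends(sample):
    print(line)
```

Because groupby only merges neighbouring items, a number that reappears later in the file starts a new run and gets its own TEST.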

Answer 1 (score: 1)

In this case you can play around a bit with what you split on... the logic stays the same:

Input:

$cat test_addrow.xml
<entry colname="1" rowname="1">a</entry>
<entry morecols="5" morecolname="2" namest="2" nameend="7" rowname="1">a</entry>
<entry colname="1" morerows="9" morerowname="2">b</entry>
<entry morecols="5" morecolname="2" namest="2" nameend="7" rowname="2">b</entry>
<entry colname="1" morerows="9" morerowname="2">b</entry>
<morecols="4" morecolname="3" namest="3" nameend="7" morerows="2" morerowname="3">c</entry>
<entry colname="2" rowname="3">c</entry>
<entry colname="2" rowname="4">d</entry>
<entry morecols="1" morecolname="2" namest="2" nameend="3" morerows="2" morerowname="5">e</entry>
<entry colname="2" rowname="5">e</entry>

Code:

with open('test_addrow.xml') as file:
    lines = file.readlines()
    with open('test_addrow.xml', 'w') as file1:
        for i, line in enumerate(lines[:-1]):
            current_n = int(line.split('rowname="')[-1].split('"')[0])
            next_n = int(lines[i+1].split('rowname="')[-1].split('"')[0])
            if next_n != current_n:
                file1.write(line.strip() + "TEST\n")
            else:
                file1.write(line)
        # Write the last line which always has TEST appended
        file1.write(lines[-1].strip() + "TEST\n")

Output:

$cat test_addrow.xml
<entry colname="1" rowname="1">a</entry>
<entry morecols="5" morecolname="2" namest="2" nameend="7" rowname="1">a</entry>TEST
<entry colname="1" morerows="9" morerowname="2">b</entry>
<entry morecols="5" morecolname="2" namest="2" nameend="7" rowname="2">b</entry>
<entry colname="1" morerows="9" morerowname="2">b</entry>TEST
<morecols="4" morecolname="3" namest="3" nameend="7" morerows="2" morerowname="3">c</entry>
<entry colname="2" rowname="3">c</entry>TEST
<entry colname="2" rowname="4">d</entry>TEST
<entry morecols="1" morecolname="2" namest="2" nameend="3" morerows="2" morerowname="5">e</entry>
<entry colname="2" rowname="5">e</entry>TEST
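Note that the input used here leaves out the abcdefghijklmn lines from the question. On a line without rowname=" the int(...) call in the code above would raise ValueError. Below is a minimal sketch of the same look-ahead idea that skips such lines; row_number and append_marker are hypothetical names, and the list of lines stands in for what file.readlines() would return.

```python
def row_number(line):
    """Return the number after the last rowname=" as int, or None if absent."""
    if 'rowname="' not in line:
        return None
    value = line.split('rowname="')[-1].split('"')[0]
    return int(value) if value.isdigit() else None

def append_marker(lines, marker="TEST"):
    """Look one line ahead and mark a line when the next row number differs."""
    result = []
    for i, line in enumerate(lines):
        current_n = row_number(line)
        # Past the end of the list there is no next number
        next_n = row_number(lines[i + 1]) if i + 1 < len(lines) else None
        if current_n is not None and next_n != current_n:
            result.append(line.rstrip("\n") + marker + "\n")
        else:
            result.append(line)
    return result

lines = [
    "abcdefghijklmn\n",
    '<entry colname="2" rowname="4">d</entry>\n',
    '<entry morecols="1" morerowname="5">e</entry>\n',
    '<entry colname="2" rowname="5">e</entry>\n',
    "abcdefghijklmn\n",
]
print("".join(append_marker(lines)), end="")
```

Treating "no number" as None means a matching line followed by plain text, or by the end of the file, is marked without a special case for the last line.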