Search, insert and replace the previous line of a text in Python

Time: 2017-07-12 23:50:39

Tags: python

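I have a text file that contains the following information (example):

a = text1 text2 text3 text4
b = text1 text8 text9 text5
c = text6 text5 text1 text9
d = text5 text4 text2 text9

and so on...

What I want is to find a combination, for example text8 text9, replace it with text10, and create a new sentence right after the original one. The final result would be:

a = text1 text2 text3 text4
b = text1 text8 text9 text5
b1 = text1 text10 text5
c = text6 text5 text1 text9
d = text5 text4 text2 text9

So far I have tried something like this (I am new to Python):

import re

text = open('file.txt').read()
match_found=False
matches = re.finditer('text2', text)
m = None
for m in matches:
    match_found = True
    pass
if (match_found):
    m.start()
    m.end()
text[1:m.end()] + "text10" + text[(m.end()+1):]

But nothing happens. Also, the combination can appear in other sentences:

a = text1 text2 text3 text4
b = text1 text8 text9 text5
b1 = text1 text10 text5
c = text6 text5 text1 text9
d = text5 text4 text8 text9
d1 = text5 text4 text10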

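3 answers:

Answer 0 (score: 1)

We can read all lines of the file into a list, inlines, then iterate over the lines, replace 'text8 text9' with 'text10', and store both the old line and the new line in a new list, outlines, which you can then use however you like.

Assumption, since the question is ambiguous: we use the third argument of str.replace so that only the first occurrence of the string is replaced.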

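search = 'text8 text9'
with open('in.txt', 'r') as infile:
    # Strip the trailing newline so print() does not emit blank lines
    inlines = [line.rstrip('\n') for line in infile]
outlines = []
for line in inlines:
    outlines.append(line)
    # To print the lines as well we can add this
    print(line)
    # Only lines that contain the search string get a new sentence
    if search in line:
        label = line.split(' ')[0]
        # Replace the first occurrence, then rename the label (b -> b1)
        newline = line.replace(search, 'text10', 1).replace(label, f'{label}1', 1)
        outlines.append(newline)
        print(newline)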
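
As a small illustration of the third argument to str.replace used in this answer, here is a sketch on a made-up sample line (the line itself is an assumption for the demo, not taken from the question's file):

```python
# Hypothetical sample line containing the search string twice.
line = 'e = text8 text9 text2 text8 text9'

# Without a count, every occurrence is replaced.
replace_all = line.replace('text8 text9', 'text10')

# With a count of 1, only the first occurrence is replaced.
replace_first = line.replace('text8 text9', 'text10', 1)

print(replace_all)    # e = text10 text2 text10
print(replace_first)  # e = text10 text2 text8 text9
```

Whether the first occurrence or every occurrence should be replaced depends on what the asker wants, which the question leaves open.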

Answer 1 (score: 0)

You need something like the following:

import re

old = 'text8 text9'
new = 'text10'

text = open('file.txt').read()

new_lines = []
for line in text.split('\n'):
  # Replace all matches in one line in one go
  new_line = line.replace(old, new)
  new_lines.append(line)

  # There is a match; increment number
  if new_line != line:

    # Get number before equals sign
    parts = new_line.split(' =', 1)
    old_number = re.search(r'\d+', parts[0])
    new_number = 1

    # If there is a number, increment
    if old_number:
      old_number = int(old_number.group(0))
      new_number = old_number + 1
      parts[0] = parts[0].replace(str(old_number), str(new_number))

    # If there is no number, concatenate 1
    else:
      parts[0] += '1'

    new_lines.append(parts[0] + ' =' + parts[1])

print('\n'.join(new_lines))

However, this does not print multiple lines for multiple matches in one line. Given the input:

a = text1 text2 text3 text4
b = text1 text8 text9 text5
c = text6 text5 text1 text9
d20 = text5 text4 text2 text8 text9
e60 = text5 text4 text2 text8 text9 text8 text9

it produces the output:

a = text1 text2 text3 text4
b = text1 text8 text9 text5
b1 = text1 text10 text5
c = text6 text5 text1 text9
d20 = text5 text4 text2 text8 text9
d21 = text5 text4 text2 text10
e60 = text5 text4 text2 text8 text9 text8 text9
e61 = text5 text4 text2 text10 text10

You can run or edit this example on repl.it (//repl.it/embed/JZ84/0.js).

Answer 2 (score: 0)

If memory is not an issue, a search & replace procedure is relatively simple:

search = "text8 text9"
replace = "text10"

with open("file.txt", "r+") as f:
    lines = []  # storage list for (modified) lines
    for line in f:  # read the file line by line
        lines.append(line)  # add the current line to the lines list
        index = line.find(search)  # attempt to find the index of the search string
        if index != -1:  # search string found in the current line
            equals_index = line.find("=")  # find where the equals sign is
            name = line[:equals_index].strip() + "1"  # create a new 'sentence' name
            # replace the found string with the 'replace' string
            value = line[equals_index+1:index] + replace + line[index + len(search):]
            lines.append("{} ={}".format(name, value))  # add the new sentence
    # let's write down the updates, you can omit the following if you don't want to
    # update the file and use the `lines` list for whatever further manipulation
    f.seek(0)  # rewind back to the beginning of the file
    f.writelines(lines)  # write down the lines
    f.truncate()  # truncate the rest in case the new content is smaller than the old

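If memory is an issue, then instead of storing the lines, open another file stream to a temporary file and write to it directly rather than appending to the lines list, and at the end just overwrite your file.txt with that temporary file. Of course, you do not have to change the file at all; you can also store the lines in a different file.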
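
The temporary-file variant described above could look roughly like the sketch below. The function name add_replaced_sentences and its parameters are made up for illustration; the matching logic is the same as in the code above, and the temporary file is created next to the original on the assumption that this keeps the final os.replace on one filesystem.

```python
import os
import tempfile


def add_replaced_sentences(path, search, replacement):
    """Stream `path` line by line into a temporary file, adding a new
    'sentence' after every matching line, then replace `path` with it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        # newline="" keeps the original line endings untouched in both files.
        with os.fdopen(fd, "w", newline="") as out, open(path, "r", newline="") as src:
            for line in src:
                out.write(line)  # always keep the original line
                index = line.find(search)
                if index != -1:  # search string found in the current line
                    equals_index = line.find("=")
                    name = line[:equals_index].strip() + "1"
                    value = line[equals_index + 1:index] + replacement + line[index + len(search):]
                    out.write("{} ={}".format(name, value))
        os.replace(tmp_path, path)  # overwrite the original file
    except BaseException:
        os.remove(tmp_path)  # do not leave the temporary file behind
        raise
```

Usage would be add_replaced_sentences("file.txt", "text8 text9", "text10"). Only one line is held in memory at a time, at the cost of writing the whole file once more to disk.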

However, this approach does not correctly handle repeated 'sentences', such as:

a = text1 text2 text3 text4
b = text1 text8 text9 text5
b = text1 text8 text9 text5
c = text6 text5 text1 text9

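Nor does it handle multiple matches in a single line, e.g. b = text1 text8 text9 text5 text8 text9. You will have to clarify what should happen in those cases (if they can occur at all). This also assumes that there are no blank lines between your 'sentences'. If there are, make sure to add the extra newlines when constructing the new sentence on a match, e.g. lines.append("\n{} ={}\n".format(name, value)). In addition, if your file uses Windows line endings, you may have to adjust the line endings to \r\n.

If you want us to address those issues, you will have to describe those edge cases.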
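
Pending that clarification, here is a sketch of one possible reading of two of these edge cases. It assumes that every occurrence in a line is replaced within a single new sentence, and that the new sentence reuses the line ending of the line it was built from. The helper name expand_lines is hypothetical, and repeated 'sentences' are still not treated specially.

```python
def expand_lines(lines, search, replacement):
    """Return a new list in which every matching line is followed by a
    copy whose name gets a '1' suffix and whose matches are replaced."""
    result = []
    for line in lines:
        # Split the line into its content and its ending ("\r\n", "\n" or "").
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        name, sep, value = body.partition("=")
        if not sep or search not in value:
            result.append(line)  # no '=' or no match: keep the line as is
            continue
        # Assumption: a matching last line without a newline gets "\n",
        # so that the new sentence starts on a line of its own.
        ending = ending or "\n"
        result.append(body + ending)
        # Assumption: all occurrences are replaced in one new sentence.
        new_value = value.replace(search, replacement)
        result.append("{}1 ={}{}".format(name.strip(), new_value, ending))
    return result
```

Because the ending is copied from each source line, a file with Windows line endings keeps \r\n on the new sentences, provided it was opened with newline="" so that Python does not translate the endings while reading.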