Question

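In a file, I want to find specific content (one line or several lines) based on a string pattern, modify it, and then replace it. The pattern can occur multiple times. The file may contain code (Python or C).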

The string pattern can take several forms, so I would like to use a regular expression, for example:

custom_log("lorem ipsum can be anything ....")

or

custom_log("lorem ipsum"
           "can be anything")

or

custom_log("""lorem ipsum
           can be anything""")

The quotes can be single or double.

I started by going line by line and searching for the pattern:

with open(filepath, mode="r") as f:
    for line in f:
        if "pattern" in line:
            ...  # handle the matching line here

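The problem is that what I am searching for can span one line or several, and the text between the quotes can be any string.

I can't use a simple replace, because I need to get the content, pass it to a function, change or adjust it, and then write it back to the file.

I want to keep the file's original formatting.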

Answer 1

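The question needs a more concrete example, so I will give one in which the match and the replacement can span multiple lines.

Given the following sample input, sample.txt: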

the quick brown fox jumped over the lazy dog
the quick
brown fox
jumped over
the lazy dog

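The following code replaces certain word pairs, even when they are split across lines, using re.sub with a replacement function that processes each match: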

import re

with open('sample.txt') as f:
    data = f.read()

def replace(m):
    return ''.join([c if c.isspace() else '*'
                    for c in m.group(0)])

data = re.sub(r'quick\s+brown|over\s+the',replace,data)
print(data)

Output:

the ***** ***** fox jumped **** *** lazy dog
the *****
***** fox
jumped ****
*** lazy dog
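Applied to the custom_log calls from the question, the same callback idea might look like the sketch below. The pattern, the adjust helper, and the sample text are assumptions made for illustration and are not part of the original answer. The pattern expects the argument to start and end with a quote, and it will stop too early if a message itself contains a quote followed by a closing parenthesis.

```python
import re

# Assumed pattern: custom_log( <quoted text, possibly over several lines> ).
# re.DOTALL lets '.' cross line breaks; '.*?' stops at the first quote that
# is followed by optional whitespace and ')'.
CALL = re.compile(r'custom_log\((\s*["\'].*?["\']\s*)\)', re.DOTALL)

def adjust(raw_argument):
    # Hypothetical transformation; swap in whatever change is needed.
    # Upper-casing leaves quotes, newlines and indentation untouched.
    return raw_argument.upper()

def rewrite(match):
    # Rebuild the call around the adjusted argument.
    return 'custom_log(' + adjust(match.group(1)) + ')'

source = (
    'x = 1\n'
    'custom_log("lorem ipsum"\n'
    '           "can be anything")\n'
    'custom_log(some text)\n'
)
result = CALL.sub(rewrite, source)
print(result)
```

Because only the captured argument is rewritten, every line outside a matching call is left exactly as it was, which is what keeps the file's formatting intact.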

Answer 2

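We can build a regular expression that looks for the words of the text we are searching for, separated by any number of newlines, spaces, or quotes. We also capture the whole part between the parentheses (parentheses included) as one group, so that it is kept in the replaced version.

So the code could be: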

import re

test = '''custom_log("some text")
custom_log("lorem ipsum can be anything")
some more text
custom_log(some text)
custom_log("lorem ipsum"
           "can be anything") 
some more text
custom_log(some text)
custom_log("""lorem ipsum
           can be anything""")
some more text'''

search = 'lorem ipsum can be anything'

# we look for 'custom_log('' followed by our search text followed by ')'
words = search.split()
search_re = r'custom_log(\("+'  + r'''[\n "]+'''.join(words) + r'"+\))'
# Our regex will be: 
# custom_log(\("+lorem[\n "]+ipsum[\n "]+can[\n "]+be[\n "]+anything"+\))

print(re.sub(search_re, r'log.info\1.base', test) + "\n")

Output:

custom_log("some text")
log.info("lorem ipsum can be anything").base
some more text
custom_log(some text)
log.info("lorem ipsum"
           "can be anything").base 
some more text
custom_log(some text)
log.info("""lorem ipsum
           can be anything""").base
some more text
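When the message text is not known in advance and the file is Python source, one possible way to get at the actual content is to capture the parenthesised argument, as above, and let ast.literal_eval decode it. This is a sketch under those assumptions: the CALL pattern and the messages helper are hypothetical names, and the approach relies on Python's literal syntax, so it would not decode C string escapes correctly.

```python
import ast
import re

# Assumed pattern: the group keeps the parentheses, like \1 in the answer,
# so the captured text is a valid Python expression such as ("a" "b").
CALL = re.compile(r'custom_log(\(\s*["\'].*?["\']\s*\))', re.DOTALL)

def messages(source):
    """Return the decoded message of every custom_log("...") call."""
    return [ast.literal_eval(m.group(1)) for m in CALL.finditer(source)]

sample = '''custom_log("lorem ipsum"
           "can be anything")
custom_log("""lorem ipsum
           can be anything""")
'''

for message in messages(sample):
    print(repr(message))
```

Adjacent literals are joined the way Python itself joins them, so the first call yields a single string with no separator between "ipsum" and "can", while the triple-quoted call keeps its embedded newline and indentation.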

Answer 3

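I'm not sure whether this solves your problem, but it sounds like it might. As I understand your question, you want to replace every string that matches a certain regular expression with a replacement string. If so, this should do it: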

# Solution for https://stackoverflow.com/questions/58979795/find-a-string-in-a-file-based-on-a-pattern-and-replace-it-with-something-else
# Import python regex module
import re


def replace_by_pattern(pattern: str, contents: str, replacement: str) -> str:
    matches = re.findall(pattern, contents) # Get all strings that match the pattern

    # Loop through all matches
    for match in matches:
        # Replace the first substring match with the replacement
        contents = contents.replace(match, replacement, 1)

    # Return the filtered strings
    return contents


# Define a test string
text = """
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, 
sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam
"""

# Print out the result
print(replace_by_pattern(', ', text, ' - '))

I hope this solves your problem!
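One caveat worth checking with this approach: str.replace substitutes the first textual occurrence of the matched text, which is not necessarily the position where the regex matched. The small comparison below is an illustration with made-up sample text and a lookahead pattern; re.sub replaces at the match positions directly.

```python
import re

text = "cat toy, cat food"
pattern = r"cat(?= food)"  # only the 'cat' that is followed by ' food'

# findall + str.replace: replaces the first textual occurrence of 'cat'.
by_text = text
for found in re.findall(pattern, text):
    by_text = by_text.replace(found, "dog", 1)

# re.sub: replaces exactly where the pattern matched.
by_position = re.sub(pattern, "dog", text)

print(by_text)      # dog toy, cat food
print(by_position)  # cat toy, dog food
```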

Answer 4

import re
re.sub(pattern,replacement,data,flags=re.DOTALL|re.MULTILINE)

pattern = the regex pattern

replacement = the new replacement string

data = the original data
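As a quick illustration of what the two flags change (the sample data below is made up): re.DOTALL lets `.` match newlines, so a call split over two lines can be matched, and re.MULTILINE lets `^` and `$` match at the start and end of each line.

```python
import re

data = 'start\ncustom_log("a"\n  "b")\nend\n'

# Without DOTALL, '.' stops at the newline, so the two-line call is missed.
no_flag = re.findall(r'custom_log\(.*?\)', data)
with_dotall = re.findall(r'custom_log\(.*?\)', data, flags=re.DOTALL)

# Without MULTILINE, '^' only matches at the very start of the string.
line_plain = re.findall(r'^end$', data)
line_multi = re.findall(r'^end$', data, flags=re.MULTILINE)

print(no_flag)      # []
print(with_dotall)  # ['custom_log("a"\n  "b")']
print(line_plain)   # []
print(line_multi)   # ['end']
```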

Answer 5

I once had to do the same thing, and this is how I did it:

import os
file = open('file_path', "r")
data = file.read()
file.close()

data = data.replace('the pattern you are looking for', 'the pattern you want instead')

file = open('file_path',"w")
file.write(data)
file.close()

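I suggest writing to a different file first, before overwriting the original, to check that it works. If the file is really large this will be slow; in that case you can split the file into several files, or apply the same idea in a loop over the lines:

# 'other_file_path' is a placeholder for the output file
with open('file_path') as src, open('other_file_path', 'w') as dst:
    for line in src:
        dst.write(line.replace('...', '...'))

If you use the second option, do it within a single loop so that you don't lose the file pointer. I hope this helps :)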

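Edit:

As the comments below point out, the problem may be that the pattern is spread over different lines. Adding the marked line removes the line breaks that split the pattern (str.replace returns a new string, so the result has to be assigned back):

data = data.replace('\n', '')  # <------
data = data.replace('the pattern you are looking for', 'the pattern you want instead')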
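The advice to write to a different file before overwriting the original could be sketched as follows. This is only one possible arrangement: rewrite_file is a hypothetical helper, UTF-8 is an assumed encoding, and the whole file is read into memory. Opening with newline="" keeps the existing line endings unchanged, and os.replace swaps the new file in only after it has been written completely.

```python
import os
import re
import tempfile

def rewrite_file(path, pattern, repl):
    """Apply re.sub to a file's contents and replace the file in one step."""
    # newline="" disables newline translation, so \r\n stays \r\n.
    with open(path, encoding="utf-8", newline="") as f:
        data = f.read()

    new_data = re.sub(pattern, repl, data)

    # Write next to the original so os.replace stays on the same filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", newline="", dir=directory, delete=False
    ) as tmp:
        tmp.write(new_data)

    # The original is only replaced once the new content is fully written.
    os.replace(tmp.name, path)
```

A call such as rewrite_file('file_path', r'quick\s+brown', 'slow red') would then update the file in place; repl may also be a function, as with re.sub.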

Answer 6

To detect and replace multiple occurrences in a text, the function you probably need is re.sub():

import re

new_text = None
with open(filepath, mode="r") as f:
  text = f.read()
  text = text.replace('\n', ' ')  # get rid of line jumps
  new_text = re.sub("pattern", "replacement string", text)
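Note that replacing every newline flattens the whole file, which conflicts with the question's wish to keep the original formatting. A possible alternative, shown here with made-up sample text, is to leave the text alone and allow whitespace (including line breaks) inside the pattern with `\s+`:

```python
import re

text = 'a = 1\nquick\nbrown fox\nb = 2\n'
pattern = r'quick\s+brown'

# Flattening first: the match is found, but every line break is lost.
flattened = re.sub(pattern, 'slow red', text.replace('\n', ' '))

# Matching whitespace in the pattern: untouched lines keep their breaks.
kept = re.sub(pattern, 'slow red', text)

print(repr(flattened))  # 'a = 1 slow red fox b = 2 '
print(repr(kept))       # 'a = 1\nslow red fox\nb = 2\n'
```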

Find a string in a file based on a pattern and replace it with something else
