Question

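I'll take any answer at all.

Basically, I have a list of 70 words that I want to find in more than 500 files, and I need to replace them with new words and numbers.

i.e. find "hello" and replace it with "hello 233.4", but for 70 words/numbers and 500+ files.

I found an informative post here, but I have been reading about sys.argv, re, s, search, replace and so on, and I cannot understand this code. I have been "calling" it (I think) from the Windows 7 "cmd" window using scriptname.py with "-i" and "-o"...

If someone could put an example path for the input search list, "c:/input/file/path/searchlist.txt", and an example path for the file to search, "c:/search/this/file/searchme.txt", in the right places, please! (I will try to make it repeat for every file in my own folder, and to highlight or bold the replacements, myself.)

I have tried many combinations... I could go through every change I have made and type for days and pages, each day and page getting clumsier and clumsier!

Thanks... or if you know a different way, please suggest it.

Here is the link to the original post: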

Use Python to search one .txt file for a list of words or phrases (and show the context)

Here is the code from the original post:

import re
import sys

def main():
  if len(sys.argv) != 3:
    print("Usage: %s fileofstufftofind filetofinditin" % sys.argv[0])
    sys.exit(1)

  with open(sys.argv[1]) as f:
    patterns = [r'\b%s\b' % re.escape(s.strip()) for s in f]
  there = re.compile('|'.join(patterns))

  with open(sys.argv[2]) as f:
    for i, s in enumerate(f):
      if there.search(s):
        print("Line %s: %r" % (i, s))

main()

Answer 1

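The code you posted above is probably more complicated than it needs to be for what you want to accomplish. Something simpler, like the following, may be easier to understand: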

# example variables
word_mapping = [['horse', 'donkey'], ['left', 'right']]
filename = 'C:/search/this/file/searchme.txt'

# load the text from the file with 'r' for "reading"
file = open(filename, 'r')
text = file.read()
file.close()

# replace words in the text
for find_word, replacement in word_mapping:
    text = text.replace(find_word, replacement)

# save the modified text to the file, 'w' for "writing"
file = open(filename, 'w')
file.write(text)
file.close()
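One caveat with str.replace is that it also replaces matches inside longer words (for example, "left" inside "leftover"). If that matters, a whole-word replacement with the re module is one option. The following is only a minimal sketch, not part of the original post's code: the helper name replace_whole_words and the sample sentence are made up for illustration, and it assumes every word to find starts and ends with a letter, digit or underscore so that \b behaves as expected.

```python
import re

# Hypothetical mapping, in the same shape as word_mapping above
word_mapping = [['horse', 'donkey'], ['left', 'right']]

def replace_whole_words(text, mapping):
    """Replace each find-word with its replacement, whole words only, in one pass."""
    lookup = dict(mapping)
    # Longest words first, so a longer entry wins over a shorter one it starts with
    words = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(r'\b(?:%s)\b' % '|'.join(re.escape(w) for w in words))
    # The function passed to sub() looks up the replacement for whatever matched
    return pattern.sub(lambda m: lookup[m.group(0)], text)

sample = 'The horse turned left past the leftover hay.'
print(replace_whole_words(sample, word_mapping))
```

Because all the words are replaced in a single pass, a replacement is never scanned again, so a pair like "hello" and "hello 233.4" cannot feed back into itself.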

To load the list of words to replace, you could do something like this:

words_path = 'C:/input/file/path/searchlist.txt'
with open(words_path) as f:
    word_mapping = [line.split() for line in f]

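By default, str.split() splits a string on whitespace (spaces, tabs), but you can split on other characters or even on "words". If you have a comma-separated file, you would use line.split(',') to split on commas.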
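If a replacement itself contains a space, as in "hello 233.4" from the question, a plain line.split() would produce three pieces instead of two. One possible way around that is to split only on the first run of whitespace. This is a sketch under an assumed file layout (find-word first, then the replacement on the same line); the helper name parse_mapping and the example lines are hypothetical.

```python
def parse_mapping(lines):
    """Turn 'find replacement text' lines into [find, replacement] pairs."""
    mapping = []
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace only
        if len(parts) == 2:          # skip blank or incomplete lines
            mapping.append([parts[0], parts[1].strip()])
    return mapping

# Hypothetical contents of searchlist.txt, as read line by line
example_lines = ['hello hello 233.4\n', '\n', 'horse donkey\n']
print(parse_mapping(example_lines))
```

With a real file you could pass the open file object directly, for example parse_mapping(f) inside the with block.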

As for an explanation of the code you posted above: a few different things are going on, so let's break it down into pieces.

if len(sys.argv) != 3:
    print("Usage: %s fileofstufftofind filetofinditin" % sys.argv[0])
    sys.exit(1)

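This particular script receives the paths to the word list and the target file as command-line arguments, so you would run it as python script_name.py wordslist_file target_file. In other words, you do not hard-code the file paths in the script; you let the user supply them at run time.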
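To make the command-line form concrete, here is a small sketch that performs the same argument-count check on an ordinary list, so it can be tried without a command prompt. The function name parse_args is made up for this illustration; in a real script you would pass it sys.argv.

```python
def parse_args(argv):
    """Return (words_file, target_file), or None if the argument count is wrong."""
    if len(argv) != 3:  # script name + two file paths
        return None
    return argv[1], argv[2]

# Simulates: python script_name.py wordslist_file target_file
print(parse_args(['script_name.py', 'wordslist_file', 'target_file']))
# Simulates running the script with no arguments at all
print(parse_args(['script_name.py']))
```

Note that the script takes the two paths directly after the script name; it does not define any "-i" or "-o" options.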

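The first part of the code checks how many command-line arguments were passed to the script by checking the length of sys.argv, which is a list containing the command-line arguments as strings. When the number of command-line arguments is not 3, a usage message is printed and the script exits. The first (or zeroth) argument is the script's file name, which is why sys.argv[0] is printed as part of that message.

with open(sys.argv[1]) as f:
    patterns = [r'\b%s\b' % re.escape(s.strip()) for s in f]
there = re.compile('|'.join(patterns))

This opens the file of words (the file name is sys.argv[1]) and compiles them into a regular expression object. Regular expressions give you more control over which words match, but they have their own "mini-language", which can be quite confusing if you have no experience with it. Note that this script only finds words and does not replace them, so the words file contains just one "word" per line.

with open(sys.argv[2]) as f:
    for i, s in enumerate(f):
        if there.search(s):
            print("Line %s: %r" % (i, s))

This opens the target file (the file name in the second command-line argument, sys.argv[2]) and loops over the lines in that file. If a line contains a word from the word list, the whole line is printed.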
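To see what the pattern-building and searching steps do without creating any files, the same expressions can be run on in-memory lists. This is only an illustration: the words and lines below are made-up stand-ins for the contents of the two files.

```python
import re

words = ['hello\n', 'left\n']  # stands in for the lines of the words file
lines = ['hello world\n', 'leftover\n', 'turn left\n']  # stands in for the target file

# Same expressions as in the script: escape each word and wrap it in \b (word boundary)
patterns = [r'\b%s\b' % re.escape(w.strip()) for w in words]
there = re.compile('|'.join(patterns))
print(there.pattern)

# Collect (line index, line) for every line that contains one of the words
matches = [(i, s) for i, s in enumerate(lines) if there.search(s)]
print(matches)
```

The line 'leftover' is not reported, because \b only matches where a word starts or ends. Also note that enumerate counts from 0, so the script reports the first line of the file as line 0.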

Answer 2

Maybe try this... Find all files in a directory with extension .txt in Python

Put all 500 files in the same directory and process them from there.
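As a rough sketch of that idea, the glob module can list every .txt file in one directory, and the replacements can then be applied to each file in turn. The function name replace_in_directory is hypothetical, it uses plain str.replace, and it overwrites files in place, so it would be safer to try it on a copy of the folder first.

```python
import glob
import os

def replace_in_directory(directory, word_mapping):
    """Apply every [find, replacement] pair to each .txt file in directory."""
    changed = []
    for path in sorted(glob.glob(os.path.join(directory, '*.txt'))):
        with open(path, 'r') as f:
            text = f.read()
        new_text = text
        for find_word, replacement in word_mapping:
            new_text = new_text.replace(find_word, replacement)
        if new_text != text:  # only rewrite files that actually changed
            with open(path, 'w') as f:
                f.write(new_text)
            changed.append(path)
    return changed  # paths of the files that were modified
```

A call might look like replace_in_directory('C:/search/this/file', [['hello', 'hello 233.4']]), where the directory is just the example path from the question.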

python 3.3: search for 70 words in 500 files and replace them
