Question

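I need a program that finds a string (S) in a file (P) and returns the number of times it occurs in the file. To do that, I decided to write a function: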

def file_reading(P, S):
  file1= open(P, 'r')
  pattern = S
  match1 = "re.findall(pattern, P)"
    if match1 != None:
      print (pattern)

I know it doesn't look great, but for some reason it outputs nothing at all, let alone the correct answer.

Answer 1

There are several problems with your code.

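First, calling open() returns a file object. It does not read the contents of the file. To do that, you need to call read() or iterate over the file object.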
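To make the difference between the file object and its contents concrete, here is a small sketch. The temporary file and the variable names (path, contents, lines) are made up for the illustration and are not part of the original question.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("spam eggs\nspam\n")
    path = tmp.name

with open(path, "r") as f:
    # f is a file object, not the text stored in the file.
    is_text = isinstance(f, str)
    # read() returns the whole file as a single string.
    contents = f.read()

with open(path, "r") as f:
    # Iterating over the file object yields one line at a time.
    lines = [line for line in f]

os.remove(path)  # clean up the throwaway file
```

Either approach gives you text you can search; the file object on its own does not.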

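Second, if your goal is to count how many times a string occurs, you don't need regular expressions. You can use the string method count(). Even if you did want a regular expression, it makes no sense to put the call in quotes.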

match1 = "re.findall(pattern, file1.read())"

This assigns the string "re.findall(pattern, file1.read())" to the variable match1; it does not call anything.

Here is a version that should work for you:

def file_reading(file_name, search_string):
    # this will put the contents of the file into a string
    file1 = open(file_name, 'r')
    file_contents = file1.read()
    file1.close()  # close the file

    # return the number of times the string was found
    return file_contents.count(search_string)
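One reason to prefer count() for plain strings: a regular expression treats some characters specially. The sketch below compares the two on an in-memory string; the sample text and variable names are only for illustration.

```python
import re

text = "a.c abc a.c"

# str.count treats the search string literally and counts
# non-overlapping occurrences.
literal_count = text.count("a.c")

# re.findall treats the same string as a pattern, so "." matches
# any character and "abc" is counted as well.
regex_count = len(re.findall("a.c", text))

# re.escape turns the string into a literal pattern, which brings
# the regex result back in line with str.count.
escaped_count = len(re.findall(re.escape("a.c"), text))
```

Here literal_count and escaped_count are both 2, while regex_count is 3.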

Answer 2

There are a few mistakes; let's go through them one by one:

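Anything in quotes is a string. Putting "re.findall(pattern, file1.read())" in quotes just creates a string. If you actually want to call the re.findall function, drop the quotes :)
You check whether match1 is None, which is great, but you should return the matches rather than the original pattern.
The if statement should not be indented.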
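A related detail worth knowing: re.findall always returns a list, and an empty list is not None, so a comparison against None passes even when nothing matched. A minimal sketch (the sample strings are arbitrary):

```python
import re

no_hits = re.findall("xyz", "abc abc")  # nothing matches
hits = re.findall("abc", "abc abc")     # two matches

# An empty list is still "not None", so this check cannot detect
# the no-match case.
passes_none_check = no_hits is not None

# Testing the list's truthiness does detect it.
found_anything = bool(no_hits)
```

That is why checking the list itself (if matches:) is the more useful test.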

Also:

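Always close a file after opening it! Since most people forget to do this, it is better to use the with open(filename, action) syntax, which closes the file for you.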

So, putting it all together, it would look like this (I changed some variable names for clarity):

import re

def file_reading(input_file, pattern):
    with open(input_file, 'r') as text_file:
        data = text_file.read()
        matches = re.findall(pattern, data)

        if matches:
            print(matches)  # prints a list of all strings found
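Since the question asks for the number of occurrences, a variant that returns a count rather than printing the matches might look like the sketch below. The name count_matches and the demo file are assumptions made for this example, not something from the original post.

```python
import os
import re
import tempfile

def count_matches(input_file, pattern):
    """Return how many non-overlapping matches of pattern occur in the file."""
    with open(input_file, "r") as text_file:
        data = text_file.read()
    # findall returns one list entry per match, so its length is the count.
    return len(re.findall(pattern, data))

# Demo on a throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("cat hat\ncat\n")
    demo_path = tmp.name

plain_count = count_matches(demo_path, "cat")      # plain substring as pattern
class_count = count_matches(demo_path, r"[ch]at")  # "cat" or "hat"
os.remove(demo_path)
```

Returning the number leaves it to the caller to decide whether to print it or use it elsewhere.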

Answer 3

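Instead of reading the whole file at once, you can read it line by line, count how many times the pattern occurs on each line, and add that to a running total c.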

def file_reading(file_name, pattern):
  c = 0
  with open(file_name, 'r') as f:
    for line in f:
      c += line.count(pattern)  # add this line's occurrences to the total
  if c:
    print(c)
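For completeness, here is a sketch of the same line-by-line idea that returns the total instead of printing it. The function name count_in_lines and the demo data are hypothetical. One caveat of this approach: a search string that spans a line break is never seen within a single line, so it is not counted.

```python
import os
import tempfile

def count_in_lines(file_name, search_string):
    """Sum the per-line occurrence counts without loading the whole file."""
    with open(file_name, "r") as f:
        return sum(line.count(search_string) for line in f)

text = "ab\nab ab\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(text)
    demo_path = tmp.name

per_line_total = count_in_lines(demo_path, "ab")

# A string that crosses a line break is found in the full text
# but missed when each line is searched separately.
spanning_whole = text.count("b\na")
spanning_per_line = count_in_lines(demo_path, "b\na")

os.remove(demo_path)
```

If the search string can never contain a newline, the two approaches should agree, and the line-by-line version keeps memory use low for large files.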

Finding a string in a text file in Python
