Question

I am trying to count the occurrences of a word in a file using Python, but I have to ignore the comments in the file.

I have a function like this:

def getWordCount(file_name, word):
  count = file_name.read().count(word)
  file_name.seek(0)
  return count

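How can I ignore the lines that start with #?

I know this can be done by reading the file line by line, as described in this question. Is there a faster, more Pythonic way?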

Answer 1

One thing you can do is create a copy of the file without the comment lines and then run your code on that copy. For example:

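# open() replaces the Python 2-only file() builtin; "with" closes both files
with open('./file_with_comment.txt') as infile, open('./newfile.txt', 'w') as newopen:
    for line in infile:
        li = line.strip()
        # keep only lines whose first non-blank character is not '#'
        if not li.startswith("#"):
            newopen.write(line)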

This drops every line that starts with #. Then run your function on newfile.txt:

def getWordCount(file_name, word):
  count = file_name.read().count(word)
  file_name.seek(0)
  return count

Answer 2

A more Pythonic way would be something like this:

def getWordCount(file_name, word):
  with open(file_name) as wordFile:
    return sum(line.count(word)
      for line in wordFile
      if not line.startswith('#'))

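A faster approach (which has nothing to do with being Pythonic) could be to read the whole file into one string and then use a regular expression to find the words that are not on lines starting with a hash.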
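
A minimal sketch of that regex idea, assuming comment lines have # as their very first character. The function name count_word_skip_comments is made up for illustration, and whether it is actually faster has not been measured here:

```python
import re

def count_word_skip_comments(path, word):
    """Count occurrences of `word` outside lines that start with '#'."""
    with open(path) as f:
        text = f.read()
    # re.MULTILINE lets ^ and $ match at every line boundary,
    # so each line beginning with '#' is blanked out.
    stripped = re.sub(r'^#.*$', '', text, flags=re.MULTILINE)
    return stripped.count(word)
```

Like the generator version above, this leaves a trailing comment on a non-comment line in place, so words inside it are still counted.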

Answer 3

You can filter out the comments with a regular expression:

import re

text = """ This line contains a word. # empty
This line contains two: word word  # word
newline
# another word
"""

filtered = ''.join(re.split('#.*', text))
print(filtered)
#  This line contains a word. 
# This line contains two: word word  
# newline

print(text.count('word'))  # 5
print(filtered.count('word'))  # 3

Just replace text with file_name.read().
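
Dropped into the function from the question, that substitution might look like the sketch below. It assumes file_name is an already-open file object, as in the question, and it uses re.sub in place of the split-and-join, which should give the same filtered text. Note that '#.*' also strips trailing comments on ordinary lines, not only whole comment lines:

```python
import re

def getWordCount(file_name, word):
    # file_name is an open file object, as in the question
    text = file_name.read()
    file_name.seek(0)  # rewind so the file can be read again
    # remove everything from '#' to the end of each line
    filtered = re.sub('#.*', '', text)
    return filtered.count(word)
```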

Get word count from file, ignoring comment lines in Python
