Question

I want to count a specific thing in a file, namely how many times "--undefined--" appears. Here is part of the file's content:

"jo:ns  76.434
pRE     75.417
zi:     75.178
dEnt    --undefined--
ba      --undefined--

I tried something like this, but it doesn't work:

with open("v3.txt", 'r') as infile:
    data = infile.readlines().decode("UTF-8")

    count = 0
    for i in data:
        if i.endswith("--undefined--"):
            count += 1
    print count

Do I have to implement a dictionary of tuples to solve this, or is there an easier solution?

Edit

The word in question appears only once per line.

Answer 1

You can read all the data into one string, split that string into a list, and count the occurrences of the substring in that list.

with open('afile.txt', 'r') as myfile:
    data = myfile.read().replace('\n', ' ')

data.split(' ').count("--undefined--")

Or directly from the string:

data.count("--undefined--")
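
The two calls above are not quite equivalent, which a small sketch can make concrete. It uses an in-memory string rather than a file; the sample text, including the last line, is made up for illustration. Calling `split()` with no argument splits on any run of whitespace, so the newline replacement is not needed here.

```python
# Sample text standing in for the file contents (hypothetical data).
sample = (
    "zi:     75.178\n"
    "dEnt    --undefined--\n"
    "ba      --undefined--\n"
    "foo     x--undefined--\n"  # hypothetical line with the marker inside a longer token
)

# Token count: split on whitespace, then count exact matches only.
token_count = sample.split().count("--undefined--")

# Substring count: counts non-overlapping occurrences anywhere in the text,
# including inside longer tokens such as "x--undefined--".
substring_count = sample.count("--undefined--")

print(token_count, substring_count)
```

If the marker only ever appears as a whole field, as in the question, both approaches give the same number.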

Answer 2

readlines() returns a list of lines, but they are not stripped (i.e. they still contain the newline character). Either strip them first:

data = [line.strip() for line in data]

Or check for --undefined--\n:

if line.endswith("--undefined--\n"):

Alternatively, consider the string's .count() method:

file_contents.count("--undefined--")
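
As a rough illustration of why the original test never matched, the sketch below compares the unstripped and stripped checks. `io.StringIO` stands in for the opened file, and the sample lines are assumed to look like those in the question.

```python
import io

# io.StringIO stands in for the opened file (hypothetical sample data).
infile = io.StringIO(
    "pRE     75.417\n"
    "dEnt    --undefined--\n"
    "ba      --undefined--\n"
)
lines = infile.readlines()

# Each line keeps its trailing "\n", so the plain test never matches here.
unstripped = sum(1 for line in lines if line.endswith("--undefined--"))

# rstrip() removes the trailing newline (and other trailing whitespace) first.
stripped = sum(1 for line in lines if line.rstrip().endswith("--undefined--"))

print(unstripped, stripped)
```

Note that a final line without a trailing newline would match even without stripping, so stripping is the more robust of the two fixes.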

Answer 3

Or don't limit yourself to .endswith(); use the in operator.

data = ''
count = 0

with open('v3.txt', 'r') as infile:
    data = infile.readlines()
print(data)

for line in data:
    if '--undefined--' in line:
        count += 1

count

Answer 4

To quote Raymond Hettinger, "there must be a better way":

from collections import Counter

counter = Counter()
words = ('--undefined--', 'otherword', 'onemore')

with open("v3.txt", 'r') as f:
    lines = f.readlines()
    for line in lines:
        for word in words:
            if word in line:
                counter.update((word,))  # note the single element tuple

print(counter)
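
A variant sketch, assuming the words of interest appear as whole whitespace-separated tokens (the code above matches substrings instead, so this is a different technique). `count_words` is a hypothetical helper name, and `io.StringIO` stands in for the real file.

```python
from collections import Counter
import io

def count_words(lines, words):
    """Count how often each of `words` appears as a whole token in `lines`."""
    wanted = set(words)
    counter = Counter()
    for line in lines:
        # Counter.update accepts any iterable of hashable items.
        counter.update(token for token in line.split() if token in wanted)
    return counter

# Hypothetical sample data modelled on the question's file.
sample = io.StringIO(
    "zi:     75.178\n"
    "dEnt    --undefined--\n"
    "ba      --undefined--\n"
)
result = count_words(sample, ("--undefined--", "otherword", "onemore"))
print(result["--undefined--"], result["otherword"])
```

A `Counter` returns 0 for keys it has never seen, so words that do not occur need no special handling.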

Answer 5

When you read a file line by line, each line ends with a newline character:

>>> with open("blookcore/models.py") as f:
...    lines = f.readlines()
... 
>>> lines[0]
'# -*- coding: utf-8 -*-\n'
>>>

So your endswith() test cannot work - you have to strip the line first:

if i.strip().endswith("--undefined--"):
    count += 1

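Now, reading a whole file into memory is more often than not a bad idea - even if the file fits in memory, it still eats resources for no good reason. Python's file objects are iterable, so you can simply loop over your file. Finally, instead of decoding manually, you can specify which encoding to use when opening the file, either with the codecs module (python 2) or directly (python3):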

# py3
with open("your/file.text", encoding="utf-8") as f:

# py2:
import codecs
with codecs.open("your/file.text", encoding="utf-8") as f:

Then just use the builtin sum and a generator expression:

result = sum(line.strip().endswith("whatever") for line in f)

This relies on booleans being integers with values 0 (False) and 1 (True).
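
Putting the pieces of this answer together, a minimal Python 3 sketch might look like the following. The function name `count_lines_ending_with` and the temporary file are assumptions made so the example can be run as-is; they are not part of the original answer.

```python
import os
import tempfile

def count_lines_ending_with(path, suffix, encoding="utf-8"):
    """Return the number of lines in `path` whose stripped text ends with `suffix`."""
    with open(path, encoding=encoding) as f:
        # Iterating over f reads one line at a time; each True adds 1 to the sum.
        return sum(line.strip().endswith(suffix) for line in f)

# Write a small temporary file (hypothetical data) so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("pRE     75.417\ndEnt    --undefined--\nba      --undefined--\n")

total = count_lines_ending_with(tmp.name, "--undefined--")
os.remove(tmp.name)
print(total)
```

Because the file is consumed lazily inside the generator expression, memory use stays roughly constant regardless of file size.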

Count specific characters in a file (Python)

5 answers: