Question

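I'm writing a program that searches the file mbox.text for the header "X-DSPAM-Confidence: 0.8475". So far I can find the string and count how many times it appears in the file. The problem is that each time it appears, I need to add up the number at the end of the line (0.8475 here). I'm stuck: I can't work out how to sum the floating-point numbers that appear at the end of those lines.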

My file looks like this:

X-Content-Type-Message-Body: text/plain; charset=UTF-8
Content-Type: text/plain; charset=UTF-8
X-DSPAM-Result: Innocent
X-DSPAM-Processed: Sat Jan  5 09:14:16 2008
X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0000

My code:

text_file = raw_input ("please enter the path of the file that you want to          open:")
open_file = open ( text_file )
print "Text file has been open " 
count = 0
total = 0.00000
for line in open_file:
    if 'X-DSPAM-Confidence:' in line:
        total =+ float(line[20:])
        count = count + 1
print total/count
print "The number of line with X-DSPAM-Confidence: is:", count

How can I do this?

Answer 1

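The in-place operator for addition is += and not =+. Python parses total =+ x as total = +x, so the total is overwritten on every matching line instead of accumulated. Note also that slicing a string returns a substring, not a number, and it relies on fixed character positions. That being said, you should use split.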

count = 0
total = 0.0
for line in open_file:
    if 'X-DSPAM-Confidence:' in line:
        # += accumulates; the last whitespace-separated field is the number
        total += float(line.split()[-1])
        count = count + 1
print total/count
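
To see the difference between =+ and += in isolation, here is a minimal sketch in Python 3. The list of values is made up for illustration and is not taken from the asker's file.

```python
# Hypothetical sample values, for illustration only.
values = [0.8475, 0.6178, 0.6961]

overwritten = 0.0
accumulated = 0.0
for v in values:
    overwritten =+ v   # parsed as: overwritten = (+v), so only the last value survives
    accumulated += v   # in-place addition, keeps a running sum

print(overwritten)
print(accumulated)
```

The first variable ends up holding only the last value seen, which is why the original loop printed a misleading average.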

Even better, use sum and len.

with open('test.txt') as f:
    data = [float(line.split()[-1]) for line in f if line.strip().startswith('X-DSPAM-Confidence:')]
    print(sum(data)/len(data))

A solution for Python 3.4 or newer, using mean from the statistics module.

from statistics import mean


with open('test.txt') as f:
    data = [float(line.split()[-1]) for line in f if line.strip().startswith('X-DSPAM-Confidence:')]
    print(mean(data))
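
Both variants above fail if the file contains no matching header: sum(data)/len(data) raises ZeroDivisionError and mean(data) raises StatisticsError. One possible way to guard against that is sketched below. The function name average_confidence and the second sample value are assumptions made for this example, not part of the original answer.

```python
def average_confidence(lines, prefix="X-DSPAM-Confidence:"):
    """Return the mean of the values on lines starting with prefix, or None."""
    values = []
    for line in lines:
        line = line.strip()
        if line.startswith(prefix):
            # Take everything after the prefix; float() ignores surrounding whitespace.
            values.append(float(line[len(prefix):]))
    if not values:
        return None  # no matching header found, so avoid dividing by zero
    return sum(values) / len(values)


# Sample lines standing in for the file; the second confidence value is made up.
sample = [
    "X-DSPAM-Result: Innocent\n",
    "X-DSPAM-Confidence: 0.8475\n",
    "X-DSPAM-Probability: 0.0000\n",
    "X-DSPAM-Confidence: 0.6525\n",
]
print(average_confidence(sample))
print(average_confidence([]))
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly, for example inside a with open(...) block.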

Answer 2

The print statement, like a Magic 8-Ball, tells all:

>>> print repr(line[20:])
' 0.0000\n'

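Your slice selects more characters than just the float (here it also picks up the leading space and the trailing newline). Narrow it down a bit: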

total += float(line[21:-1])
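
A caveat on fixed indices, sketched in Python 3 under the assumption that the line is exactly the Confidence header from the question: float() already ignores leading and trailing whitespace, so the wider slice still converts, while a hard-coded end index of -1 removes a digit when the line has no trailing newline (for example the last line of a file).

```python
line = "X-DSPAM-Confidence: 0.8475\n"

print(repr(line[20:]))       # the slice used in the question
print(float(line[20:]))      # float() strips the trailing newline itself

no_newline = line.rstrip("\n")    # e.g. the last line of a file
print(repr(no_newline[21:-1]))    # the fixed end index now cuts off a digit

# Splitting on the first colon does not depend on the header's width.
value = float(line.split(":", 1)[1])
print(value)
```

Splitting on the colon, or on whitespace as in the other answer, keeps working if the header name or the number of digits changes.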

Slicing and cutting a text file in Python
