Slicing and splitting a text file in Python

Time: 2016-03-10 05:47:10

Tags: python

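I'm writing code that searches an mbox.text file for the term "X-DSPAM-Confidence: 0.8475". So far I can find the string and count how many times it appears in the file. The problem is that every time it appears, I also need to add up the number at the end of the line (0.8475 here). I'm stuck at that point: I can't work out how to sum the floating-point numbers that follow that string.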

My file contents look like this:

X-Content-Type-Message-Body: text/plain; charset=UTF-8
Content-Type: text/plain; charset=UTF-8
X-DSPAM-Result: Innocent
X-DSPAM-Processed: Sat Jan  5 09:14:16 2008
X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0000
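Each header line in the sample has the form "Name: value". A minimal sketch, assuming one header per line as in the sample above: str.partition splits the name from the value without counting character positions. The helper name header_value is hypothetical, not something from the question or the answers.

```python
def header_value(line, name):
    """Return the stripped value after 'name:' on a header line, or None for any other line."""
    # partition splits on the first ':' only, so values that contain ':' stay intact
    key, sep, value = line.partition(':')
    if sep and key.strip() == name:
        return value.strip()
    return None


line = "X-DSPAM-Confidence: 0.8475\n"
print(header_value(line, "X-DSPAM-Confidence"))   # 0.8475
print(header_value(line, "X-DSPAM-Probability"))  # None
```

The returned value is still a string, so it needs float() before it can be added to a total.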

My code:

text_file = raw_input ("please enter the path of the file that you want to open:")
open_file = open ( text_file )
print "Text file has been opened"
count = 0
total = 0.00000
for line in open_file:
    if 'X-DSPAM-Confidence:' in line:
        total =+ float(line[20:])
        count = count + 1
print total/count
print "The number of line with X-DSPAM-Confidence: is:", count

How can I do this?

2 answers:

Answer 0 (score: 0)

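Slicing a string returns another string (whatever characters happen to sit from position 20 onward), not a numeric value, and the in-place addition operator is += rather than =+. Python reads total =+ x as total = +x, so each match overwrites the total instead of adding to it. That said, you should use split: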

count = 0
total = 0.00000
for line in open_file:
    if 'X-DSPAM-Confidence:' in line:
        total += float(line.split()[-1]) # change here: take the last whitespace-separated field
        count = count + 1
print total/count
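To see why =+ loses the running total, here is a small sketch with made-up confidence values (the list values is hypothetical sample data). The first loop uses the question's =+, which is just an assignment of the unary-plus value; the second uses +=.

```python
values = [0.8475, 0.6178, 0.6961]  # hypothetical confidence values

total_wrong = 0.0
for v in values:
    total_wrong =+ v   # parsed as total_wrong = (+v): overwrites on every pass

total_right = 0.0
for v in values:
    total_right += v   # in-place addition: accumulates every value

print(total_wrong)  # 0.6961, only the last value survives
print(total_right)
```

Python accepts both spellings without complaint, which is why the bug is easy to miss.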

Even better, use sum and len:

with open('test.txt') as f:
    data = [float(line.split()[-1]) for line in f if line.strip().startswith('X-DSPAM-Confidence:')]
    print(sum(data)/len(data))
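The sum/len approach can also be wrapped in a function that accepts any iterable of lines (an open file, a list, or io.StringIO for testing) and avoids a ZeroDivisionError when nothing matches. This is a minimal sketch; the function name average_confidence and returning None when there are no matches are assumptions, not part of the answer.

```python
import io


def average_confidence(lines, prefix='X-DSPAM-Confidence:'):
    """Average the number that ends each line starting with prefix; None if no line matches."""
    data = [float(line.split()[-1]) for line in lines
            if line.strip().startswith(prefix)]
    if not data:
        return None  # no matching lines: avoid dividing by zero
    return sum(data) / len(data)


sample = io.StringIO(
    u"X-DSPAM-Result: Innocent\n"
    u"X-DSPAM-Confidence: 0.8475\n"
    u"X-DSPAM-Probability: 0.0000\n"
    u"X-DSPAM-Confidence: 0.6178\n"
)
print(average_confidence(sample))
```

With a real file you would pass the file object instead, for example inside with open('test.txt') as f.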

A solution for Python 3.4 or newer, using mean from the statistics module:

from statistics import mean


with open('test.txt') as f:
    data = [float(line.split()[-1]) for line in f if line.strip().startswith('X-DSPAM-Confidence:')]
    print(mean(data))
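statistics.mean raises statistics.StatisticsError when it receives no data, so the version above fails if no line matches. A sketch, assuming Python 3.4+, that catches that case. The function name mean_confidence and the None fallback are illustrative choices.

```python
from statistics import mean, StatisticsError


def mean_confidence(lines, prefix='X-DSPAM-Confidence:'):
    """Return the mean confidence, or None when no line carries the header."""
    # mean() accepts any iterable, so a generator expression avoids building a list by hand
    values = (float(line.split()[-1]) for line in lines
              if line.strip().startswith(prefix))
    try:
        return mean(values)
    except StatisticsError:  # raised by mean() for empty input
        return None


print(mean_confidence(["X-DSPAM-Confidence: 0.8475\n", "X-DSPAM-Confidence: 0.6178\n"]))
print(mean_confidence(["X-DSPAM-Result: Innocent\n"]))  # None
```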

Answer 1 (score: 0)

The print statement, like a Magic 8 Ball, tells all:

>>> print repr(line[20:])
' 0.0000\n'

Your slice picks up more characters than just the number. Narrow it down a bit:

total += float(line[21:-1])
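If you prefer to keep the slicing approach, a sketch that computes the start index from the length of the prefix instead of hardcoding 20 or 21, then strips the remaining space and newline. float() itself also accepts leading and trailing whitespace, so the strip mainly makes the intent explicit. The constant name PREFIX and the sample lines are hypothetical.

```python
PREFIX = 'X-DSPAM-Confidence:'  # hypothetical constant; the slice start follows from its length

lines = [
    "X-DSPAM-Result: Innocent\n",
    "X-DSPAM-Confidence: 0.8475\n",
    "X-DSPAM-Probability: 0.0000\n",
]

count = 0
total = 0.0
for line in lines:
    if line.startswith(PREFIX):
        # slice off exactly the prefix, then strip the space and the newline
        total += float(line[len(PREFIX):].strip())
        count += 1

print(total / count)  # 0.8475
```

Deriving the offset from the prefix means the slice stays correct if you later search for a different header name.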