I'm writing a program to extract data from a text file (`New Revision: 39772`) (mbox.txt link - google drive link for file).
I did the task using the normal method, but I would like to do it with the `re.findall` method.
```python
import re
print "Please enter file path only"
text_file = raw_input ("Enter the file name:")
print "Trying to open the file that you have entered"
try:
    open_file = open ( text_file )
    print "Text file " + text_file + " is opened"
except:
    print "File not found"
    raise SystemExit
# using normal method
count = 0
total = 0.0
# using regular expression
for line in open_file:
    if 'New Revision:' in line:
        print line
        total += float(line.split()[-1])
        count = count + 1
Avg = total/count
print "The number of line with 'New Revision:' is:", count
print "The total of the floating point numbers at the end of the 'New Revision:'is:", total
print "Average:",round(Avg,1)
#using findall()method
numlist = []
for line in open_file:
    line = line.rstrip()
    Extract_data = re.findall('^New Revision:([0-9]+)',line)
    number = int(Extract_data[0])
    numlist.append(Extract_data)
print numlist
```
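I want to extract the number at the end of `New Revision: 39772` and save it to a list using the `re.findall` method.
So far I have read all the documentation available on this site, but I can't figure out how to do it, and my code outputs an error.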
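A minimal check, written in Python 3 on a single hypothetical sample line rather than the actual mbox.txt, shows where the error most likely comes from: the question's pattern has no space after the colon, so `re.findall` returns an empty list and `Extract_data[0]` raises `IndexError`. The same empty result appears on every line that is not a `New Revision:` line, even once the pattern is fixed.

```python
import re

line = "New Revision: 39772"
other = "Author: someone"  # hypothetical non-matching line

# Question's pattern: no space after the colon, so the line does not match.
old = re.findall('^New Revision:([0-9]+)', line)
print(old)  # []

# Indexing the empty result is what raises the error.
try:
    int(old[0])
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range

# With whitespace allowed after the colon, the group captures the digits...
new = re.findall(r'^New Revision:\s([0-9]+)', line)
print(new)  # ['39772']

# ...but lines that are not 'New Revision:' lines still return [].
print(re.findall(r'^New Revision:\s([0-9]+)', other))  # []
```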
Answer 0 (score: 1)
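Use the following regular expression:

```python
reg = r'^New Revision:\s([0-9]+)'
```

Your pattern was missing the whitespace after the colon (matched here by `\s`), and you should use a raw string when writing regular expressions.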
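Dropping the answer's pattern into the question's loop likely still needs two more changes: skip lines where `findall` returns `[]`, and append the converted number rather than the whole match list. Also, the question iterates over `open_file` a second time; a file object is an iterator, so after the first loop it is exhausted and the second loop sees no lines unless the file is reopened or rewound with `open_file.seek(0)`. The sketch below is Python 3; `extract_revisions`, `NEW_REVISION` and the sample lines are hypothetical names and data, not from the question. It accepts any iterable of lines, so it works on an open file or on a list.

```python
import re

# The answer's pattern, compiled once; the group captures the digits.
NEW_REVISION = re.compile(r'^New Revision:\s([0-9]+)')


def extract_revisions(lines):
    """Return the numbers on 'New Revision:' lines as a list of ints."""
    numlist = []
    for line in lines:
        found = NEW_REVISION.findall(line.rstrip())
        # Most lines do not match and give []; skip them instead of
        # indexing found[0], which would raise IndexError.
        if not found:
            continue
        numlist.append(int(found[0]))
    return numlist


if __name__ == "__main__":
    # Hypothetical sample lines standing in for mbox.txt.
    sample = [
        "From someone@example.com Sat Jan  5 09:14:16 2008\n",
        "New Revision: 39772\n",
        "Author: someone\n",
        "New Revision: 39771\n",
    ]
    numlist = extract_revisions(sample)
    print(numlist)  # [39772, 39771]
    print("Count:", len(numlist))
    print("Average:", round(sum(numlist) / len(numlist), 1))
```

With the real file, `with open(text_file) as f: numlist = extract_revisions(f)` reads it once; `len(numlist)`, `sum(numlist)` and their ratio then take the place of the separate counters used in the normal method.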