Extracting data using the re.findall method

Time: 2016-04-13 13:30:11

Tags: python python-2.7

I am writing a program to extract data from a text file (mbox.txt), specifically the number on lines such as "New Revision: 39772".

I have completed the task using the normal method, but I would like to do it with the re.findall method instead.

import re
print "Please enter file path only"
text_file = raw_input ("Enter the file name:")
print "Trying to open the file that you have entered"
try:
    open_file = open ( text_file )
    print "Text file " + text_file + " is opened"

except:
    print "File not found"
    raise SystemExit
# using normal method     
count = 0
total = 0.0
# using regular expression
for line in open_file: 
    if 'New Revision:' in line:   
        print line
        total += float(line.split()[-1])
        count = count + 1
        Avg = total/count
print "The number of line with 'New Revision:' is:", count
print "The total of the floating point numbers at the end of the 'New   Revision:'is:", total
print "Average:",round(Avg,1)

#using findall()method 

numlist = [];
for line in open_file:
   line = line.rstrip()
   Extract_data = re.findall('^New Revision:([0-9]+)',line)
   number = int(Extract_data[0])
   numlist.append(Extract_data)

print numlist

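I want to extract the number at the end of "New Revision: 39772" and store it in a list using the re.findall method. I have read all the documentation available on this site so far, but I cannot work out how to do this, and my code produces an error.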

1 Answer:

Answer 0: (score: 1)

Use the following regular expression:

reg = r'^New Revision:\s([0-9]+)'

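Your pattern was missing the whitespace between the colon and the number, which \s matches here. It is also good practice to write regular expressions as raw strings.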
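
To show how that pattern could fit into the findall() loop from the question, here is a minimal sketch. The helper names extract_revisions and extract_revisions_from_text and the sample lines are hypothetical and used only for illustration. Apart from the missing whitespace, the sketch also guards against the empty list that re.findall returns for non-matching lines (indexing it with [0] raises IndexError) and appends the converted number rather than the list of matches.

```python
import re

# Raw string; \s matches the space between the colon and the number.
REVISION_RE = r'^New Revision:\s([0-9]+)'


def extract_revisions(lines):
    """Return the revision numbers found in an iterable of lines, as ints.

    Hypothetical helper name, for illustration only.
    """
    numbers = []
    for line in lines:
        matches = re.findall(REVISION_RE, line.rstrip())
        # findall returns [] when the line does not match, so check first.
        if matches:
            numbers.append(int(matches[0]))
    return numbers


def extract_revisions_from_text(text):
    """Same idea applied to the whole file contents at once.

    re.MULTILINE makes ^ match at the start of every line.
    """
    return [int(n) for n in re.findall(REVISION_RE, text, re.MULTILINE)]


# Hypothetical sample lines standing in for the contents of mbox.txt.
sample = [
    "From: someone@example.com\n",
    "New Revision: 39772\n",
    "Subject: hello\n",
    "New Revision: 39771\n",
]
numlist = extract_revisions(sample)
print(numlist)
```

With a real file, the open file object can be passed directly, for example numlist = extract_revisions(open_file). Note that in the question's script the first for loop has already consumed open_file, so the second loop would see no lines unless the file is reopened or rewound with open_file.seek(0).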