Question

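I am a beginner learning Python. The problem is that I have to extract the numbers from a file and find their sum. The numbers can be anywhere: several can appear on the same line, some lines may contain no numbers, and some lines may be blank. I did manage to solve it, and this is my code: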

import re
new=[]
s=0
fhand=open("sampledata.txt")
for line in fhand:
    if re.search('^.+',line):         # to exclude lines which have nothing
        y=re.findall('([0-9]*)',line) # this part is supposed to extract only the
        for i in range(len(y)):       # numerical part, but it extracts all the words. why?
            try:
                y[i]=float(y[i])
            except:
                y[i]=0
        s=s+sum(y)
print(s)

The code works, but it is not Pythonic. Why does ([0-9]*) extract all the words instead of only the numbers? And what would the Pythonic way be?

Answer 1

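Your regular expression is ([0-9]*), which matches zero or more digits, so it also succeeds with an empty match at every position where there is no digit. You probably want ([0-9]+).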
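To make the difference concrete, here is a small sketch that runs both patterns on a made-up sample line (the line itself is only an assumption for illustration). It also hints at why the original code still produces the right total: float('') raises ValueError, so the bare except turns every empty match into 0.

```python
import re

line = "ab 12 c 7"  # made-up sample line, not from the asker's file

star = re.findall(r'([0-9]*)', line)  # zero or more digits: empty matches too
plus = re.findall(r'([0-9]+)', line)  # one or more digits: real numbers only

print(star)  # contains '12' and '7' mixed in with empty strings
print(plus)  # ['12', '7']
```

The exact number of empty strings in the first list depends on the Python version, which is one more reason to prefer the + form.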

Answer 2

Hi, you made a mistake in the regular expression by adding the "*". Something like this should work:

y=re.findall('([0-9])',line)
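One thing worth checking with a pattern that has no quantifier: [0-9] matches a single digit at a time, so a multi-digit number is split into its digits. The sketch below uses a made-up sample line and compares it with the + form; which behaviour is wanted depends on whether the task is a sum of digits or a sum of numbers.

```python
import re

line = "ab 12 c 7"  # made-up sample line, not from the asker's file

single = re.findall(r'[0-9]', line)   # one digit per match
whole = re.findall(r'[0-9]+', line)   # one run of digits per match

print(single)  # ['1', '2', '7']
print(whole)   # ['12', '7']

digit_sum = sum(int(d) for d in single)
number_sum = sum(int(n) for n in whole)
print(digit_sum)   # 10
print(number_sum)  # 19
```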

Answer 3

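Expanding on wind85's answer, you may want to fine-tune the regular expression depending on the kinds of numbers you expect to find in the file. For example, if your numbers may contain a decimal point, you might want something like [0-9]+(?:\.[0-9]+)? (one or more digits, optionally followed by a period and one or more digits).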
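As a rough illustration of the decimal-aware pattern, here it is applied to a made-up sample string. The sketch also shows why the group is written as non-capturing, (?:...): with a plain capturing group, re.findall returns the group's contents rather than the whole match.

```python
import re

text = "pi is 3.14, e is 2.718 and 42 items"  # made-up sample text

# Non-capturing group: findall returns each whole match.
found = re.findall(r'[0-9]+(?:\.[0-9]+)?', text)
print(found)  # ['3.14', '2.718', '42']

# Capturing group: findall returns only what the group matched.
grouped = re.findall(r'[0-9]+(\.[0-9]+)?', text)
print(grouped)  # ['.14', '.718', '']

total = sum(float(n) for n in found)
print(round(total, 3))  # 47.858
```

This pattern still ignores signs and exponents such as -5 or 1e3; whether that matters depends on what the file actually contains.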

As for making it more Pythonic, this is how I would probably write it:

import re

s = 0
for line in open("sampledata.txt"):
    # add up every run of digits found on this line
    s += sum(float(y) for y in re.findall(r'[0-9]+', line))
print(s)

If you want to get really fancy, you can turn it into a one-liner:

print(sum(float(y) for line in open('sampledata.txt')
                   for y in re.findall(r'[0-9]+', line)))

But personally, I find that harder to read.
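A middle ground between the loop and the one-liner is to wrap the logic in a small function. The sketch below is only one possible arrangement: the name sum_numbers and the sample lines are made up for illustration, and it takes any iterable of lines so it can be tried without a file.

```python
import re

NUMBER = re.compile(r'[0-9]+')  # same pattern as in the loop version


def sum_numbers(lines):
    """Return the sum of every run of digits found in an iterable of lines."""
    return sum(float(tok) for line in lines for tok in NUMBER.findall(line))


# Made-up sample lines, including a blank line and a line without digits.
sample = ["abc 12 def 3\n", "\n", "no digits here\n", "x7y\n"]
total = sum_numbers(sample)
print(total)  # 22.0

# With a real file, a context manager closes it when done:
# with open("sampledata.txt") as fhand:
#     print(sum_numbers(fhand))
```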

Using a regular expression to extract numbers from a file and find their sum
