Question

I am trying to extract numbers in scientific notation from lines in a text file, something like this.

Example:

str = 'Name of value 1.111E-11   Next Name 444.4'

Result:

[1.111E-11, 444.4]

I have tried the solutions from other posts, but they seem to work only for integers (maybe):

>>> [int(s) for s in str.split() if s.isdigit()]
[]

float() works, but it raises an error whenever it is given a string that is not a number.

>>> float(str.split()[3])
1.111e-11
>>> float(str.split()[2])
ValueError: could not convert string to float: value

Thanks in advance for your help!

Answer 1

This can be done with a regular expression:

import re

s = 'Name of value 1.111E-11   Next Name 444.4'
# Raw string so the backslashes reach the regex engine unchanged.
match_number = re.compile(r'-?\ *[0-9]+\.?[0-9]*(?:[Ee]\ *-?\ *[0-9]+)?')
final_list = [float(x) for x in match_number.findall(s)]
print(final_list)

Output:

[1.111e-11, 444.4]

Note that the pattern I wrote above requires at least one digit to the left of the decimal point.
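A minimal sketch of one way to relax that restriction, assuming inputs such as .5 or -.25e3 should also count as numbers. The names NUMBER and extract_numbers and the sample string are illustrative and not part of the original answer; this variant also drops the optional spaces and accepts an explicit plus sign.

```python
import re

# Hypothetical variant of the pattern: digits before the decimal point
# are optional, but the mantissa must contain at least one digit.
NUMBER = re.compile(r'[-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?')

def extract_numbers(text):
    """Return every number found in text as a float."""
    return [float(m) for m in NUMBER.findall(text)]

print(extract_numbers('a .5 b -.25e3 c 1.111E-11 d 444.4'))
# prints [0.5, -250.0, 1.111e-11, 444.4]
```

The alternation tries the "digits first" form before the "leading point" form, so ordinary values such as 444.4 are matched the same way as before.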

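Edit

Here is a tutorial and reference that I found helpful for learning how to write regular expression patterns.

Since you asked for an explanation of the regex pattern: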

'-?\ *[0-9]+\.?[0-9]*(?:[Ee]\ *-?\ *[0-9]+)?'

One piece at a time:

-?          optionally matches a negative sign (zero or one negative signs)
\ *         matches any number of spaces (to allow for formatting variations like - 2.3 or -2.3)
[0-9]+      matches one or more digits
\.?         optionally matches a period (zero or one periods)
[0-9]*      matches any number of digits, including zero
(?: ... )   groups an expression, but without forming a "capturing group" (look it up)
[Ee]        matches either "e" or "E"
\ *         matches any number of spaces (to allow for formats like 2.3E5 or 2.3E 5)
-?          optionally matches a negative sign
\ *         matches any number of spaces
[0-9]+      matches one or more digits
?           makes the entire non-capturing group optional (to allow for the presence or absence of the exponent - 3000 or 3E3)
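The breakdown above can also be kept next to the pattern itself. The following is a sketch, assuming Python 3, that writes the same pattern with re.VERBOSE so each piece carries its own comment. The names NUMBER and extract_numbers are illustrative. Because float() rejects spaces inside a number (for example "- 2.3" or "2.3E 5"), the sketch removes spaces from each match before converting, which is an addition to the original answer.

```python
import re

# The same pattern, spelled out piece by piece. In verbose mode an
# escaped space (\ ) stays significant; other whitespace is ignored.
NUMBER = re.compile(r'''
    -?            # optional minus sign
    \ *           # any number of spaces
    [0-9]+        # one or more digits
    \.?           # optional decimal point
    [0-9]*        # any number of digits, including zero
    (?:           # non-capturing group for the exponent
        [Ee]      #   either e or E
        \ *-?\ *  #   optional minus sign with optional spaces around it
        [0-9]+    #   one or more exponent digits
    )?            # the whole exponent is optional
''', re.VERBOSE)

def extract_numbers(text):
    # float() raises ValueError on inner spaces, so strip them first.
    return [float(m.replace(' ', '')) for m in NUMBER.findall(text)]

print(extract_numbers('Name of value 1.111E-11   Next Name 444.4'))
# prints [1.111e-11, 444.4]
print(extract_numbers('a - 2.3 b 2.3E 5'))
# prints [-2.3, 230000.0]
```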

Note: \d is a shortcut for [0-9], but I am used to writing [0-9].

Answer 2

You can always use a for loop and a try-except statement.

>>> string = 'Name of value 1.111E-11   Next Name 444.4'
>>> final_list = []
>>> for elem in string.split():
        try:
            final_list.append(float(elem))
        except ValueError:
            pass


>>> final_list
[1.111e-11, 444.4]
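The same idea can be wrapped in a reusable function. This is a sketch, assuming Python 3, with the hypothetical name extract_floats. Two caveats apply to this approach: float() also accepts words such as "nan" and "inf", which the sketch filters out on the assumption that they should not count as numbers, and a token with attached punctuation, such as "444.4,", is skipped because float() cannot parse it.

```python
import math

def extract_floats(text):
    """Collect every whitespace-separated token that float() accepts."""
    numbers = []
    for token in text.split():
        try:
            value = float(token)
        except ValueError:
            continue  # not a number, skip this token
        # Assumption: words like "nan" and "inf" are not wanted.
        if math.isfinite(value):
            numbers.append(value)
    return numbers

print(extract_floats('Name of value 1.111E-11   Next Name 444.4'))
# prints [1.111e-11, 444.4]
print(extract_floats('limit inf ratio nan size 3e2'))
# prints [300.0]
```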

Answer 3

I would use a regular expression:

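import re
s = 'Name of value 1.111E-11   Next Name 444.4'
# The dot is escaped so it matches only a literal decimal point,
# and the exponent sign is optional so values like 3E3 also match.
print([float(x) for x in re.findall(r"-?\d+\.?\d*(?:[Ee][+-]?\d+)?", s)])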

Output:

[1.111e-11, 444.4]

Extracting scientific numbers from a string

3 answers: