
Question

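Here is an example of a sentence I get as input. I want to extract the numbers from sentences where they are followed by mm or cm. This is the regular expression I tried.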

 sen = 'The study reveals a speculated nodule with pleural tagging at anterior basal segment of LLL, measured 1.9x1.4x2.0 cm in size' 

 re.findall(r'(\d+) cm',sen)

This gives the output

 ['0']

Then I tried extracting the numbers without any condition:

 print(re.findall(r'\d+', sen))

This gives the output

 ['1', '9', '1', '4', '2', '0']

My expected output is

 ['1.9x1.4x2.0'] or ['1.9', '1.4', '2.0']

This is not a duplicate, because I am also looking for a way to handle cm and mm together with floating-point numbers.

Answer 1

You can use 3 capturing groups to get the numbers, and a character class to make sure the measurement ends with cm or mm.

(?<!\S)(\d+\.\d+)x(\d+\.\d+)x(\d+\.\d+) [cm]m(?!\S)

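Explanation

(?<!\S) Negative lookbehind: asserts that what is directly to the left is not a non-whitespace character
(\d+\.\d+)x Capture group 1: matches 1+ digits and a decimal part, then matches x
(\d+\.\d+)x Capture group 2: same as above
(\d+\.\d+) Capture group 3: matches 1+ digits and a decimal part
[cm]m Matches cm or mm (the pattern has a literal space before it)
(?!\S) Negative lookahead: asserts that what is directly to the right is not a non-whitespace character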
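
If it helps to see the parts in place, here is the same pattern written with re.VERBOSE so each piece carries its own comment. This is only a sketch: the names DIMENSIONS and find_dimensions are mine, and the space before the unit is written as [ ] because re.VERBOSE ignores unescaped whitespace.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# "[ ]" stands for the single literal space before the unit.
DIMENSIONS = re.compile(
    r"""
    (?<!\S)        # only whitespace (or the string start) to the left
    (\d+\.\d+)x    # group 1: first decimal, then a literal x
    (\d+\.\d+)x    # group 2: second decimal, then a literal x
    (\d+\.\d+)     # group 3: third decimal
    [ ][cm]m       # one space, then cm or mm
    (?!\S)         # only whitespace (or the string end) to the right
    """,
    re.VERBOSE,
)


def find_dimensions(text):
    """Return a list of (d1, d2, d3) string tuples found in text."""
    return DIMENSIONS.findall(text)


print(find_dimensions("measured 1.9x1.4x2.0 cm in size"))
# [('1.9', '1.4', '2.0')]
```

Because of the trailing (?!\S), a unit that runs straight into other characters (for example "cmx") is rejected.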


For example

import re

regex = r"(?<!\S)(\d+\.\d+)x(\d+\.\d+)x(\d+\.\d+) [cm]m(?!\S)"
test_str = "The study reveals a speculated nodule with pleural tagging at anterior basal segment of LLL, measured 1.9x1.4x2.0 cm in size"

print(re.findall(regex, test_str))

Output

[('1.9', '1.4', '2.0')]

To get output that includes the x, you can use

(?<!\S)(\d+\.\d+x\d+\.\d+x\d+\.\d+) [cm]m(?!\S)


Output

['1.9x1.4x2.0']

Edit

To match only the values, allowing 1 or more spaces between the digits and the unit, you can use a positive lookahead:

\d+(?:\.\d+)?(?:(?:x\d+(?:\.\d+)?)*)?(?=[ \t]+[cm]m)
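
A minimal sketch of how that lookahead pattern could be used, assuming you want floats back. The names VALUE_BEFORE_UNIT and extract_measurements, and the second sample string, are made up for illustration.

```python
import re

# The lookahead pattern from above, unchanged. It has no capturing groups,
# so re.findall returns the whole matched text for each measurement.
VALUE_BEFORE_UNIT = re.compile(
    r"\d+(?:\.\d+)?(?:(?:x\d+(?:\.\d+)?)*)?(?=[ \t]+[cm]m)"
)


def extract_measurements(text):
    """Return each measurement as a list of floats, split on 'x'."""
    return [
        [float(part) for part in match.split("x")]
        for match in VALUE_BEFORE_UNIT.findall(text)
    ]


sen = ("The study reveals a speculated nodule with pleural tagging at "
       "anterior basal segment of LLL, measured 1.9x1.4x2.0 cm in size")
print(extract_measurements(sen))
# [[1.9, 1.4, 2.0]]
print(extract_measurements("a 12   mm lesion and a 3.5 cm cyst"))
# [[12.0], [3.5]]
```

Since the unit sits inside the lookahead, it is checked but not included in the match, which is why only the numbers come back.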


Answer 2

You can use a lookahead in re.findall:

import re
sen = 'The study reveals a speculated nodule with pleural tagging at anterior basal segment of LLL, measured 1.9x1.4x2.0 cm in size' 
result = re.findall(r'[\dx\.]+(?=\scm)', sen)

Output:

['1.9x1.4x2.0']
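
One thing to watch for: the character class [\dx\.]+ also accepts a lone x or dot, and the lookahead only accepts cm. The sketch below shows this on a made-up string, next to a stricter variant that must start with a digit and allows cm or mm. STRICT and the sample strings are my own assumptions, not part of the answer above.

```python
import re

LOOSE = r'[\dx\.]+(?=\scm)'          # pattern from the answer above
STRICT = r'\d[\dx.]*(?=\s[cm]m\b)'   # hypothetical variant: digit first, cm or mm

sen = ('The study reveals a speculated nodule with pleural tagging at '
       'anterior basal segment of LLL, measured 1.9x1.4x2.0 cm in size')

# The trailing "x" of "approx" is followed by " cm", so LOOSE reports it.
print(re.findall(LOOSE, 'approx cm wide'))    # ['x']
print(re.findall(STRICT, 'approx cm wide'))   # []

print(re.findall(STRICT, sen))                # ['1.9x1.4x2.0']
print(re.findall(STRICT, 'a 3.5 mm cyst'))    # ['3.5']
```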

Answer 3

Try this:

import re
sen = 'The study reveals a speculated nodule with pleural tagging at anterior basal segment of LLL, measured 1.9x1.4x2.0 cm in size' 
# Raw string so the backslashes reach the regex engine unchanged
re.findall(r'\d+\.\d+', sen)

Output:

['1.9', '1.4', '2.0']

Answer 4

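Here is another way:

import re
sen = 'The study reveals a speculated nodule with pleural tagging at anterior basal segment of LLL, measured 1.9x1.4x2.0 cm in size' 
# The dot is escaped so it matches only a literal decimal point;
# note this matches exactly one digit on each side of the point
output = re.findall(r'\d\.\d', sen)

Output: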

['1.9', '1.4', '2.0']

Answer 5

import re    
sen = '''The study reveals a speculated nodule with pleural tagging at anterior basal 
segment of LLL, measured 1.9x1.4x2.0 cm in size'''

print(re.findall(r'[\d\.]+', sen))

Output

['1.9', '1.4', '2.0']
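
Note that a bare character class like this has no unit condition, so on other text it will also return sentence-ending periods and numbers with no unit. A rough sketch of the difference, using a made-up report string and a hypothetical pattern NUMBER_WITH_UNIT that keeps a number only when its x-separated chain ends in cm or mm:

```python
import re

report = "Nodule measured 1.9x1.4 cm. Follow up in 3 months."  # made-up sample

# Any run of digits and dots matches, including the two full stops and the 3.
loose = re.findall(r'[\d\.]+', report)
print(loose)  # ['1.9', '1.4', '.', '3', '.']

# Hypothetical alternative: match one number at a time, and use a lookahead
# to require that the rest of the x-separated chain is followed by cm or mm.
NUMBER_WITH_UNIT = re.compile(r'\d+(?:\.\d+)?(?=(?:x\d+(?:\.\d+)?)*\s[cm]m\b)')


def numbers_before_unit(text):
    """Return the individual numbers of every cm/mm measurement in text."""
    return NUMBER_WITH_UNIT.findall(text)


print(numbers_before_unit(report))  # ['1.9', '1.4']
```

On the sentence from the question this gives ['1.9', '1.4', '2.0'], the second form of the expected output.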

How do I extract numbers from a sentence under specific conditions in Python?
