Question

I have the following list in Python:
my_list = ['Prix TTC euros : 10,10', 'Prix HT euros 8,42', 'TVA (20.00%) euros : 1,68']

I want to get all the numbers such as 10,10, 8,42 and 1,68, but not the percentage figure (20.00%).
My code:

import re

my_list = ['Prix TTC euros : 10,10', 'Prix HT euros 8,42', 'TVA (20.00%) euros : 1,68']

for item in my_list:
    try:
        # First number-like token in the string (sign and decimal part optional)
        found = re.search(r'([+-]?([0-9]*[,.])?[0-9]+)', item).group()
    except AttributeError:
        found = None  # apply your error handling
    print(found)

It prints:

10,10
8,42
20.00

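I am trying to skip the number 20.00 found in the last item and get 1,68 instead. Is there a way to skip numbers that end with %, or is there some other solution?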

Answer 1

You can avoid matching the percentage values by combining a word boundary with a negative lookahead that rejects any number followed by a % sign:

import re

my_list = ['Prix TTC euros : 10,10', 'Prix HT euros 8,42', 'TVA (20.00%) euros : 1,68']

for item in my_list:
    found = re.search(r'[-+]?\b(?!\d+(?:[,.]\d+)?%)\d+(?:[.,]\d+)?', item)
    if found:
        print(found.group())

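See the Python demo online. The code above prints 10,10, 8,42 and 1,68, one per line.

See also the regex demo:

[-+]? - an optional - or +
\b - a word boundary
(?!\d+(?:[,.]\d+)?%) - a negative lookahead that fails the match if, immediately to the right of the current position, there are 1 or more digits, then an optional sequence of . or , followed by 1 or more digits, and then a %
\d+ - 1 or more digits
(?:[.,]\d+)? - an optional sequence of . or , followed by 1 or more digits.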
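
The breakdown above can also be written into the pattern itself with re.VERBOSE. This is only a sketch: the names AMOUNT_RE and extract_amounts are made up for illustration, and it uses re.findall instead of re.search so that a string holding several amounts returns all of them.

```python
import re

# Same pattern as in the answer, one component per line.
AMOUNT_RE = re.compile(r"""
    [-+]?                   # optional sign
    \b                      # word boundary
    (?!\d+(?:[,.]\d+)?%)    # reject if a number followed by a percent sign starts here
    \d+                     # integer part
    (?:[.,]\d+)?            # optional decimal part
""", re.VERBOSE)


def extract_amounts(text):
    """Return every amount in text that is not a percentage (hypothetical helper)."""
    return AMOUNT_RE.findall(text)


my_list = ['Prix TTC euros : 10,10', 'Prix HT euros 8,42', 'TVA (20.00%) euros : 1,68']
for item in my_list:
    print(extract_amounts(item))
```

Because the pattern has no capturing groups, findall returns the full matched strings.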

Answer 2

Let's start with your regex:

found = re.search(r'([+-]?(?:[0-9]*[,.])?[0-9]+)', item).group()

This behaves as you described. We need to add a negative lookahead for % at the end of this regex:

found = re.search(r'([+-]?(?:[0-9]*[,.])?[0-9]+)(?!%)', item).group()

This prints:

10,10
8,42
20.0  # <---- note the last digit is missing here

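The regex engine backtracks and gives up the last digit so that the match is no longer directly followed by %. To tighten the regex further, the lookahead must also reject a match that is followed by more of the number pattern itself (i.e. ([+-]?(?:[0-9]*[,.])?[0-9]+)), so that a number ending in % cannot be matched in shortened form.

So we end up with: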

found = re.search(
    r'([+-]?(?:[0-9]*[,.])?[0-9]+)(?!(?:%|(?:[+-]?(?:[0-9]*[,.])?[0-9]+)))',
    item
).group()

which gives what we want:

10,10
8,42
1,68
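
To see the three stages side by side, here is a small sketch that runs each pattern against the problematic item. NUMBER, PATTERNS and first_match are illustrative names, not part of the original answer, and the patterns are assembled with f-strings only to avoid repeating the number expression.

```python
import re

# The number expression from the question (inner group made non-capturing).
NUMBER = r'[+-]?(?:[0-9]*[,.])?[0-9]+'

PATTERNS = {
    'original': NUMBER,
    'not followed by %': rf'(?:{NUMBER})(?!%)',
    'not followed by % or more digits': rf'(?:{NUMBER})(?!%|{NUMBER})',
}


def first_match(pattern, text):
    """Return the first match of pattern in text, or None (hypothetical helper)."""
    m = re.search(pattern, text)
    return m.group() if m else None


item = 'TVA (20.00%) euros : 1,68'
for label, pattern in PATTERNS.items():
    print(f'{label}: {first_match(pattern, item)}')
```

The middle pattern returns 20.0 because backtracking shortens the match until the next character is no longer %; the last pattern closes that loophole.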

Answer 3

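Instead of a negative lookahead, use a positive lookahead: end the expression with (?=[^0-9,.%]|$), meaning "followed by something that is not %, not another part of a number, or by nothing at all".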
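
A minimal sketch of the positive-lookahead idea. The answer only gives the lookahead, so the number expression in front of it and the names AMOUNT_RE and first_amount are assumptions made for this example.

```python
import re

# A number that must be followed by a character that cannot continue
# the number (and is not %), or by the end of the string.
AMOUNT_RE = re.compile(r'[+-]?\d+(?:[.,]\d+)?(?=[^0-9,.%]|$)')


def first_amount(text):
    """Return the first non-percentage amount in text, or None (hypothetical helper)."""
    m = AMOUNT_RE.search(text)
    return m.group() if m else None


my_list = ['Prix TTC euros : 10,10', 'Prix HT euros 8,42', 'TVA (20.00%) euros : 1,68']
for item in my_list:
    print(first_amount(item))
```

One consequence of this lookahead is that a number directly followed by . or , (for example at the end of a sentence) is rejected as well.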

Alternatively, just extract all sequences matching [0-9.,%]+ and then discard the unwanted ones in Python.
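
A sketch of this extract-then-filter approach. The function name amounts_without_percent is invented here, and stripping stray punctuation and requiring at least one digit are extra assumptions that the answer does not spell out.

```python
import re


def amounts_without_percent(text):
    """Collect number-like tokens and drop the ones containing % (hypothetical helper)."""
    tokens = re.findall(r'[0-9.,%]+', text)
    result = []
    for token in tokens:
        if '%' in token:
            continue  # percentage, not wanted
        token = token.strip('.,')  # drop punctuation captured around the number
        if any(ch.isdigit() for ch in token):
            result.append(token)
    return result


my_list = ['Prix TTC euros : 10,10', 'Prix HT euros 8,42', 'TVA (20.00%) euros : 1,68']
for item in my_list:
    print(amounts_without_percent(item))
```

This keeps the regex trivial and moves the filtering into plain Python, which can be easier to adjust later.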

Python: use a regex to find numbers that do not end with %
