Question

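I am trying to extract numeric values from text strings that use a dash as the delimiter, but where a dash can also mark a negative value: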

"1.3"          # [1.3]
"1.3-2-3.9"    # [1.3, 2, 3.9]
"1.3-2--3.9"   # [1.3, 2, -3.9]
"-1.3-2--3.9"  # [-1.3, 2, -3.9]

At the moment I am manually checking for the "--" sequence, but this seems ugly and easy to break.

def get_values(text):
    return map(lambda s: s.replace('n', '-'), text.replace('--', '-n').split('-'))

I have tried several different approaches, using both str.split() and re.findall(), but none of them has quite worked.

For example, the following pattern should match all valid strings, but I am not sure how to use it with findall:

r"^-?\d(\.\d*)?(--?\d(\.\d*)?)*$"

Is there a general way to do this that I am not seeing? Thanks!

Answer 1

You can try splitting on this pattern, which uses a lookbehind:

(?<=[0-9])-

(a hyphen preceded by a digit)

>>> import re
>>> text = "-1.3-2--3.9"
>>> re.split(r'(?<=[0-9])-', text)
['-1.3', '2', '-3.9']

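With this condition, you can be sure the split never happens at the start of the string or right after another hyphen, so a hyphen in either of those positions stays attached to the number as its sign.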
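Since the question asks for numeric values rather than strings, here is a minimal sketch of how the split could be combined with a float conversion. The name parse_values is just an illustration, not something from the question:

```python
import re

def parse_values(text):
    """Split on hyphens that directly follow a digit, then convert to float."""
    # The lookbehind keeps a leading "-" (or the second "-" of "--")
    # attached to the number that follows it.
    return [float(part) for part in re.split(r"(?<=[0-9])-", text)]

print(parse_values("1.3-2--3.9"))   # [1.3, 2.0, -3.9]
print(parse_values("-1.3-2--3.9"))  # [-1.3, 2.0, -3.9]
```

Note that this assumes every number ends in a digit. If inputs such as "1.-2" (a trailing dot) can occur, the lookbehind would need to accept a dot as well, for example (?<=[0-9.]).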

Answer 2

@CasimiretHippolyte has given a very elegant regex solution, but I would like to point out that you can do this quite succinctly with just a list comprehension, iter, and next:

>>> def get_values(text):
...    it = iter(text.split("-"))
...    return [x or "-"+next(it) for x in it]
...
>>> get_values("1.3")
['1.3']
>>> get_values("1.3-2-3.9")
['1.3', '2', '3.9']
>>> get_values("1.3-2--3.9")
['1.3', '2', '-3.9']
>>> get_values("-1.3-2--3.9")
['-1.3', '2', '-3.9']
>>>

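Also, if you time it with timeit.timeit, this solution comes out considerably faster than the regex one (about 4.1 s versus 10.0 s for the default one million calls in the run below):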

>>> from timeit import timeit
>>>
>>> # With Regex
>>> def get_values(text):
...     import re
...     return re.split('(?<=[0-9])-', text)
...
>>> timeit('get_values("-1.3-2--3.9")', 'from __main__ import get_values')
9.999720634885165
>>>
>>> # Without Regex
>>> def get_values(text):
...     it = iter(text.split("-"))
...     return [x or "-"+next(it) for x in it]
...
>>> timeit('get_values("-1.3-2--3.9")', 'from __main__ import get_values')
4.145546989910741
>>>
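One caveat about the comparison above: the regex version runs import re and looks up the pattern on every call, which adds some overhead of its own. Below is a rough sketch of a more even comparison with the pattern compiled once up front. The names split_regex and split_iter are made up for this example, and the timings will differ from machine to machine, so no numbers are shown:

```python
import re
from timeit import timeit

# Compile once so the per-call cost is only the split itself.
_SPLIT = re.compile(r"(?<=[0-9])-")

def split_regex(text):
    return _SPLIT.split(text)

def split_iter(text):
    it = iter(text.split("-"))
    # An empty piece means we just passed "--": glue "-" onto the next piece.
    return [x or "-" + next(it) for x in it]

sample = "-1.3-2--3.9"
# Both approaches should agree before their speed is compared.
assert split_regex(sample) == split_iter(sample)

for func in (split_regex, split_iter):
    seconds = timeit(lambda: func(sample), number=100_000)
    print(f"{func.__name__}: {seconds:.3f} s")
```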

(Python) Split a string only on single instances of a delimiter
