Converting numbers in a string to `_NUM-*_` symbols

Time: 2017-08-11 02:30:51

Tags: python regex string replace numbers

Given a string containing numbers:

"I counted, ' 1 2 3 4 5 6 7 8 9 10 '"

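The goal is to convert each number into a `_NUM-*_` symbol, where `*` is the order in which the number appears. For example, given the input above, the desired output is:

"I counted, ' _NUM-1_ _NUM-2_ _NUM-3_ _NUM-4_ _NUM-5_ _NUM-6_ _NUM-7_ _NUM-8_ _NUM-9_ _NUM-10_'"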

This should hold even when numbers repeat. For example, given the input:

"I said, ' 1 2 3 4 5 5 5 8 9 10 '"

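the desired output numbers each match by its order of appearance and ignores the value of the number itself, e.g.:

"I said, ' _NUM-1_ _NUM-2_ _NUM-3_ _NUM-4_ _NUM-5_ _NUM-6_ _NUM-7_ _NUM-8_ _NUM-9_ _NUM-10_'"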

I tried:

import re

s = "I counted, ' 1 2 3 4 5 6 7 8 9 10 '"
num_regexp = r'(?<!\S)(?=.)(0|([1-9](\d*|\d{0,2}(,\d{3})*)))?(\.\d*[1-9])?(?!\S)'
re.sub(num_regexp, '_NUM_', s)

But it just replaces every number with the same `_NUM_` symbol and does not keep track of the order, i.e.

[OUT]:

"I counted, ' _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ _NUM_ '"

I could post-process the result of `re.sub` and replace each `_NUM_` in turn, i.e.

import re

s = "I counted, ' 1 2 3 4 5 6 7 8 9 10 '"
num_regexp = r'(?<!\S)(?=.)(0|([1-9](\d*|\d{0,2}(,\d{3})*)))?(\.\d*[1-9])?(?!\S)'

num_counter = 1
tokens = []
for token in re.sub(num_regexp, '_NUM_', s).split():
    if token == '_NUM_':
        token = '_NUM-{}_'.format(num_counter)
        num_counter += 1

    tokens.append(token)

result = ' '.join(tokens)

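[OUT]:

"I counted, ' _NUM-1_ _NUM-2_ _NUM-3_ _NUM-4_ _NUM-5_ _NUM-6_ _NUM-7_ _NUM-8_ _NUM-9_ _NUM-10_ '"

Is there a better way to get the desired output without first doing a generic `re.sub` and then editing the string after the fact?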
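One direction that might avoid the second pass: `re.sub` accepts a function as the replacement, so the counter can be advanced once per match. The following is only a minimal sketch under stated assumptions. The helper name `number_tokens` and the simplified pattern `NUM_REGEXP` (plain whitespace-delimited integers) are hypothetical and are not the regex from the attempt above; that regex can match an empty string when two whitespace characters are adjacent, so it would probably need tightening before being swapped in.

```python
import itertools
import re

# Assumption: a simplified pattern that matches whitespace-delimited integers
# only. It does not handle the thousands separators or decimals that the
# original regex was written for.
NUM_REGEXP = re.compile(r'(?<!\S)\d+(?!\S)')


def number_tokens(text):
    """Replace each matched number with _NUM-<n>_, n being its order of appearance."""
    counter = itertools.count(1)  # fresh counter for every call
    # re.sub calls the lambda once per match, left to right.
    return NUM_REGEXP.sub(lambda match: '_NUM-{}_'.format(next(counter)), text)


print(number_tokens("I said, ' 1 2 3 4 5 5 5 8 9 10 '"))
```

Because the replacement is done in place, the original whitespace is preserved, unlike the `split()` and `join()` version, which collapses runs of whitespace.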

0 answers:

No answers