Replace strings with sequential names?

Time: 2019-09-16 04:29:28

Tags: python regex substitution

What I want to do is easier to show than to explain. Suppose I have a string like this:

The ^APPLE is a ^FRUIT

Using a regular expression with re.sub(), I want to get this:

The ^V1 is a ^V2

Note how the numbers increment. Now for the harder case:

The ^X is ^Y but ^X is not ^Z

should be translated to:

The ^V1 is ^V2 but ^V1 is not ^V3

That is, if a term repeats, it keeps the same replacement, as in the ^X => ^V1 case.

I have heard that the replacement can be a function, but I could not get it to work.

https://www.hackerrank.com/challenges/re-sub-regex-substitution/problem

4 Answers:

Answer 0: (score: 3)

IIUC, you don't need re. Plain string operations will do the job:

def sequential(str_):
    # Maps each ^TERM to its replacement, in order of first appearance
    d = {}
    tokens = str_.split()
    for i in tokens:
        if i.startswith('^') and i not in d:
            d[i] = '^V%s' % (len(d) + 1)
    # Tokens without an entry are passed through unchanged
    return ' '.join(d.get(i, i) for i in tokens)

Output:

sequential('The ^APPLE is a ^FRUIT')
# 'The ^V1 is a ^V2'

sequential('The ^X is ^Y but ^X is not ^Z')
# 'The ^V1 is ^V2 but ^V1 is not ^V3'
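
One caveat with splitting on whitespace: punctuation stays attached to its token, so '^X,' and '^X' would be treated as different terms. The following is only a sketch of one way to handle that while still avoiding re; the helper name sequential_keep_punct and the set of punctuation characters it strips are assumptions for illustration, not part of the original answer.

```python
def sequential_keep_punct(text):
    """Split-based renaming that tolerates trailing punctuation (hypothetical helper)."""
    names = {}
    out = []
    for token in text.split():
        core = token.rstrip('.,;:!?')   # the term without trailing punctuation
        tail = token[len(core):]        # the punctuation that was stripped off
        if core.startswith('^') and len(core) > 1:
            # setdefault returns the stored label, or stores and returns the next one
            core = names.setdefault(core, '^V%d' % (len(names) + 1))
        out.append(core + tail)
    return ' '.join(out)

print(sequential_keep_punct('The ^X is ^Y, but ^X. is not ^Z'))
# The ^V1 is ^V2, but ^V1. is not ^V3
```

Like the original, this joins tokens with single spaces, so runs of whitespace in the input are collapsed.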

Answer 1: (score: 1)

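After some searching, I found a solution that does the substitution with the re module and dict.setdefault. If your terms can contain digits, you can use the pattern '\^\w[\w\d]*' (note that \w already matches digits, so the '\^\w+' pattern used below handles them as well).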

import re

string = 'The ^X is ^Y but ^X is not ^Z'
terms = {}
print(re.sub(r'\^\w+', lambda match: terms.setdefault(match.group(0), '^V{}'.format(len(terms)+1)), string))

Output:

  The ^V1 is ^V2 but ^V1 is not ^V3

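re.sub checks the type of the replacement argument: if it is a str, the match is replaced with that string directly; if it is a function, the function is called with the match object as its argument, and the match is replaced with the value it returns.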
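
To make the two kinds of replacement argument concrete, here is a small sketch contrasting them. The make_renamer helper is a hypothetical name introduced for illustration; it does the same bookkeeping as the setdefault one-liner, just spelled out.

```python
import re

text = 'The ^X is ^Y but ^X is not ^Z'

# 1) String replacement: every match is replaced with the same text.
print(re.sub(r'\^\w+', '^V', text))
# The ^V is ^V but ^V is not ^V

# 2) Function replacement: called once per match with the match object;
#    the string it returns replaces that match.
def make_renamer():
    seen = {}  # term -> label, filled in order of first appearance
    def rename(match):
        term = match.group(0)
        if term not in seen:
            seen[term] = '^V{}'.format(len(seen) + 1)
        return seen[term]
    return rename

print(re.sub(r'\^\w+', make_renamer(), text))
# The ^V1 is ^V2 but ^V1 is not ^V3
```

Creating a fresh renamer per call keeps the numbering from leaking between strings, which a module-level dict would not do.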

Answer 2: (score: 1)

You can create a simple object to handle the incrementing:

import re
class inc:
   def __init__(self):
      self.a, self.c = {}, 0
   def __getitem__(self, _v):
      if _v not in self.a:
         self.c += 1
         self.a[_v] = self.c
      return self.a[_v]

n = inc()
r = re.sub(r'(?<=\^)\w+', lambda x:f'V{n[x.group()]}', 'The ^APPLE is a ^FRUIT')

Output:

'The ^V1 is a ^V2'

n = inc()
r = re.sub(r'(?<=\^)\w+', lambda x:f'V{n[x.group()]}', 'The ^X is ^Y but ^X is not ^Z')

Output:

'The ^V1 is ^V2 but ^V1 is not ^V3'
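
As a possible alternative to writing a class, the same "assign the next number on first lookup" behaviour can be sketched with collections.defaultdict and itertools.count from the standard library. The function name rename_terms is an assumption made for this example.

```python
import re
from collections import defaultdict
from itertools import count

def rename_terms(text):
    # A missing key gets the next integer from the counter: 1, 2, 3, ...
    ids = defaultdict(count(1).__next__)
    # The lookbehind keeps the '^' in place and replaces only the name after it
    return re.sub(r'(?<=\^)\w+', lambda m: 'V{}'.format(ids[m.group()]), text)

print(rename_terms('The ^APPLE is a ^FRUIT'))
# The ^V1 is a ^V2
print(rename_terms('The ^X is ^Y but ^X is not ^Z'))
# The ^V1 is ^V2 but ^V1 is not ^V3
```

Because the counter is created inside the function, each call starts numbering from 1 again.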

Answer 3: (score: 0)

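We can iterate over the input string word by word and, for each new ^TERM, do one global re.sub replacement, using a counter to keep track of how many distinct terms we have seen: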

import re

inp = "The ^X is ^Y but ^X is not ^Z"
seen = set()
counter = 0
for term in inp.split():
    m = re.match(r'\^([^^]+)', term)
    if m and term not in seen:
        # First time we see this term: assign it the next label
        counter += 1
        seen.add(term)
        label = "V" + str(counter)
        # Replace every occurrence of the term that ends at whitespace or end of string
        inp = re.sub(r'\^' + re.escape(m.group(1)) + r'(?!\S)', '^' + label, inp)

print(inp)

This prints:

The ^V1 is ^V2 but ^V1 is not ^V3