What I want to do is easier to show than to explain. Suppose I have a string like this:
The ^APPLE is a ^FRUIT
Using a regular expression with re.sub(), I want to get this:
The ^V1 is a ^V2
Notice how the numbers increment. Now comes the harder case:
The ^X is ^Y but ^X is not ^Z
should be translated to:
The ^V1 is ^V2 but ^V1 is not ^V3
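That is, a repeated term keeps its replacement, as in the ^X => ^V1 case.
I have heard that the replacement can be a function, but I could not get it to work.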
https://www.hackerrank.com/challenges/re-sub-regex-substitution/problem
Answer 0 (score: 3)
IIUC, you don't need re. String operations will do the job:
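from collections import defaultdict

def sequential(str_):
    # Maps each distinct ^TERM to its ^Vn label. The default factory is
    # never triggered here, so a plain dict would work as well.
    d = defaultdict(int)
    tokens = str_.split()
    for i in tokens:
        if i.startswith('^') and i not in d:
            d[i] = '^V%s' % str(len(d) + 1)
    # Tokens without a label are passed through unchanged.
    return ' '.join(d.get(i, i) for i in tokens)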
Output:
sequential('The ^APPLE is a ^FRUIT')
# 'The ^V1 is a ^V2'
sequential('The ^X is ^Y but ^X is not ^Z')
# 'The ^V1 is ^V2 but ^V1 is not ^V3'
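One thing to keep in mind with the split-based approach: tokens are whatever sits between runs of whitespace. The small sketch below (the sample sentence is made up for illustration) shows that punctuation stays attached to a term and that rejoining with single spaces does not preserve the original spacing.

```python
text = 'Take ^X,  then ^X'

# str.split() with no arguments splits on runs of whitespace,
# so punctuation stays attached to the token it touches.
tokens = text.split()
print(tokens)    # ['Take', '^X,', 'then', '^X']

# Rejoining with single spaces collapses the original double space.
rejoined = ' '.join(tokens)
print(rejoined)  # Take ^X, then ^X
```

Here '^X,' and '^X' are different tokens, so a split-based numbering would give them different labels. If the input can contain punctuation next to terms, a regex-based match on the term itself is probably the safer choice.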
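Answer 1 (score: 1)
After some searching, I found a solution that uses the re module and dict.setdefault for the substitution. If your terms can contain digits, the pattern '\^\w[\w\d]*' also works, although \w already matches digits, so it is equivalent to '\^\w+':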
import re
string = 'The ^X is ^Y but ^X is not ^Z'
terms = {}
# setdefault returns the stored label for a known term, or stores and returns the new one
print(re.sub(r'\^\w+', lambda match: terms.setdefault(match.group(0), '^V{}'.format(len(terms) + 1)), string))
Output:
The ^V1 is ^V2 but ^V1 is not ^V3
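re.sub checks the type of the replacement argument: if it is a str, the match is replaced with it directly; if it is a function, the function is called with the match object as its argument and the match is replaced with the value it returns.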
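To make the two replacement modes concrete, here is a minimal sketch. The sample text and the variable names are only illustrative; the re.sub calls themselves are standard.

```python
import re

text = 'The ^apple is a ^fruit'

# String replacement: every match is replaced with the same fixed text.
fixed = re.sub(r'\^\w+', '^V', text)
print(fixed)     # The ^V is a ^V

# Function replacement: the callable receives the match object once per
# match, and its return value becomes the replacement text.
shouted = re.sub(r'\^\w+', lambda m: m.group(0).upper(), text)
print(shouted)   # The ^APPLE is a ^FRUIT
```

Because the function runs once per match, it can consult and update state such as the terms dict above, which is what makes the incrementing labels possible.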
Answer 2 (score: 1)
You can create a simple object to handle the incrementing:
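import re

class inc:
    def __init__(self):
        # a maps each term to its number; c is the last number handed out
        self.a, self.c = {}, 0

    def __getitem__(self, _v):
        # Assign the next number the first time a term is seen
        if _v not in self.a:
            self.c += 1
            self.a[_v] = self.c
        return self.a[_v]

n = inc()
# The lookbehind keeps the ^ in place and replaces only the term after it
r = re.sub(r'(?<=\^)\w+', lambda x: f'V{n[x.group()]}', 'The ^APPLE is a ^FRUIT')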
Output:
'The ^V1 is a ^V2'
n = inc()
r = re.sub(r'(?<=\^)\w+', lambda x: f'V{n[x.group()]}', 'The ^X is ^Y but ^X is not ^Z')
Output:
'The ^V1 is ^V2 but ^V1 is not ^V3'
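The same bookkeeping can probably be expressed without a custom class. The sketch below is one possible variant, not the answer's own code: it assumes a defaultdict whose factory draws from itertools.count, and the function name renumber is hypothetical.

```python
import re
from collections import defaultdict
from itertools import count

def renumber(text):
    """Replace each distinct ^TERM with ^V1, ^V2, ... in order of first appearance."""
    counter = count(1)
    # On a missing key, the factory takes the next number and stores it,
    # so repeated terms get the number they were first assigned.
    ids = defaultdict(lambda: next(counter))
    return re.sub(r'(?<=\^)\w+', lambda m: f'V{ids[m.group()]}', text)

print(renumber('The ^X is ^Y but ^X is not ^Z'))
# The ^V1 is ^V2 but ^V1 is not ^V3
```

Creating the counter inside the function means each call starts numbering from 1 again, which avoids having to reset shared state between inputs.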
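Answer 3 (score: 0)
We can try iterating over the input string word by word and doing one global re.sub replacement for each ^TERM, using a counter to keep track of how many distinct terms we have seen: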
import re

inp = "The ^X is ^Y but ^X is not ^Z"
seen = dict()
counter = 0
for term in inp.split():
    m = re.match(r'\^([^^]+)', term)
    # Only handle each distinct ^TERM once; the global sub covers its repeats
    if m and term not in seen:
        counter = counter + 1
        seen[term] = 1
        label = "V" + str(counter)
        # Escape the term and refuse a trailing word character so that
        # ^X does not also rewrite the start of a longer term such as ^XY
        inp = re.sub(r'\^' + re.escape(m.group(1)) + r'(?!\w)', '^' + label, inp)
print(inp)
This prints:
The ^V1 is ^V2 but ^V1 is not ^V3