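Below is an array that contains duplicate strings. I want to find and replace those strings, but use a different replacement value for each match.
Let me demonstrate.
This sample array: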
SampleArray = ['champ', 'king', 'king', 'mak', 'mak', 'mak']
should become:
SampleArray = ['champ', 'king1', 'king2', 'mak1', 'mak2', 'mak3']
How can I achieve this? I have been at it for 3 days with no luck. Thanks in advance.
My Failed Code:
import os, collections, re
SampleArray = ['champ', 'king', 'king', 'mak', 'mak', 'mak']
dupes = [x for x, y in collections.Counter(SampleArray).items() if y > 1]
length = len(dupes)
count = 0
while count < length:
    j = 0
    instances = SampleArray.count(dupes[count])
    while j < instances:
        re.sub(dupes[count], dupes[count] + j, SampleArray, j)
        j += 1
    count += 1
print SampleArray
print ''; os.system('pause')
Answer 0 (score: 5)
I would use collections.Counter:
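from collections import Counter
# The list from the question
sample = ['champ', 'king', 'king', 'mak', 'mak', 'mak']
# xrange is Python 2; use range on Python 3
numbers = {
    word: iter([""] if count == 1 else xrange(1, count + 1))
    for word, count in Counter(sample).items()
}
result = [
    word + str(next(numbers[word]))
    for word in sample
]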
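This does not require the list to be sorted or grouped in any way.
This solution uses iterators to generate the sequence numbers:
First, we count how many times each word occurs in the list (Counter(sample)).
Then we create a dictionary, numbers, that holds a "numbering" iterator (iter(...)) for each word. If the word occurs only once (count == 1), its iterator returns ("yields") a single empty string; otherwise it yields the consecutive numbers from 1 to count: [""] if count == 1 else xrange(1, count + 1).
Finally, we walk the list again and, for each word, take the next value from that word's own numbering iterator (next(numbers[word])). Because the iterators return numbers, we have to convert them to strings with str(...).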
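For readers on Python 3, the three steps above can be sketched as a small function. This is only an illustration: the name number_duplicates and the variable suffixes are made up here, and range stands in for xrange.

```python
from collections import Counter


def number_duplicates(words):
    """Append 1, 2, 3, ... to repeated words; leave unique words unchanged."""
    # One iterator per distinct word: a single empty suffix for unique words,
    # otherwise the numbers 1..count.
    suffixes = {
        word: iter([""] if count == 1 else range(1, count + 1))
        for word, count in Counter(words).items()
    }
    # Each occurrence pulls the next suffix from its word's own iterator.
    return [word + str(next(suffixes[word])) for word in words]


print(number_duplicates(['champ', 'king', 'king', 'mak', 'mak', 'mak']))
# ['champ', 'king1', 'king2', 'mak1', 'mak2', 'mak3']
```

Because each word draws from its own iterator, the duplicates do not have to be adjacent: ['mak', 'champ', 'mak'] becomes ['mak1', 'champ', 'mak2'].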
Answer 1 (score: 2)
groupby is a convenient way to group duplicates, as long as equal items sit next to each other in the list:
>>> from itertools import groupby
>>> FinalArray = []
>>> for k, g in groupby(SampleArray):
...     # g is an iterator, so get a list of it for further handling
...     items = list(g)
...     # If only one item, add it unchanged
...     if len(items) == 1:
...         FinalArray.append(k)
...     # Else add index at the end
...     else:
...         FinalArray.extend([j + str(i) for i, j in enumerate(items, 1)])
...
>>> FinalArray
['champ', 'king1', 'king2', 'mak1', 'mak2', 'mak3']
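Note that itertools.groupby only merges runs of adjacent equal items, so the loop above relies on the duplicates already being next to each other. If they are scattered, one possible workaround is to sort first. A minimal sketch, assuming the original order does not need to be kept (the function name number_sorted_duplicates is hypothetical):

```python
from itertools import groupby


def number_sorted_duplicates(words):
    result = []
    # sorted() makes equal words adjacent, which groupby relies on.
    for word, group in groupby(sorted(words)):
        items = list(group)
        if len(items) == 1:
            # Unique word: keep it unchanged.
            result.append(word)
        else:
            # Repeated word: number its occurrences 1..n.
            result.extend(word + str(i) for i in range(1, len(items) + 1))
    return result


print(number_sorted_duplicates(['mak', 'champ', 'king', 'mak', 'king', 'mak']))
# ['champ', 'king1', 'king2', 'mak1', 'mak2', 'mak3']
```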
Answer 2 (score: 0)
Edit:
A Counter plus sorting is simpler:
from collections import Counter
L = ['champ', 'king', 'king', 'mak', 'mak', 'mak']
counts = Counter(L)
res = []
for word in sorted(counts.keys()):
    if counts[word] == 1:
        res.append(word)
    else:
        res.extend(['{}{}'.format(word, index) for index in
                    range(1, counts[word] + 1)])
So this unsorted input:
['champ', 'mak', 'king', 'king', 'mak', 'mak']
also gives:
['champ', 'king1', 'king2', 'mak1', 'mak2', 'mak3']
Answer 3 (score: 0)
One way is to convert the array into a dictionary like this:
SampleDict = {}
for key in SampleArray:
    if key in SampleDict:
        SampleDict[key][0] = True # means: duplicates
        SampleDict[key][1] += 1
    else:
        SampleDict[key] = [False, 1] # means: no duplicates
Now you can easily convert that dict back into an array. However, if the order in SampleArray matters, you can do this:
for i in range(len(SampleArray)):
    key = SampleArray[i]
    counter = SampleDict[key]  # [has_duplicates, remaining_count]
    if counter[0]:
        SampleArray[i] = key + str(counter[1])
        counter[1] -= 1
However, this numbers the duplicates in reverse order, i.e.
SampleArray = ['champ', 'king2', 'king1', 'mak3', 'mak2', 'mak1']
But I am sure you can adapt it to your needs.
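One way to get ascending numbers instead is to count upward while walking the list rather than counting down from the total. The sketch below is an assumption about how that adjustment could look, not the answerer's code; it swaps the hand-built dict for two collections.Counter objects, and the names number_in_place, totals, and seen are made up for illustration.

```python
from collections import Counter


def number_in_place(words):
    totals = Counter(words)   # how often each word occurs overall
    seen = Counter()          # how often each word has occurred so far
    for i, word in enumerate(words):
        if totals[word] > 1:  # only repeated words get a number
            seen[word] += 1
            words[i] = word + str(seen[word])
    return words


SampleArray = ['champ', 'king', 'king', 'mak', 'mak', 'mak']
print(number_in_place(SampleArray))
# ['champ', 'king1', 'king2', 'mak1', 'mak2', 'mak3']
```

The list is modified in place, so the original positions of the words are preserved.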
Answer 4 (score: 0)
Assuming you want the array to be sorted:
import collections
counter = collections.Counter(SampleArray)
res = []
for key in sorted(counter.keys()):
    if counter[key] == 1:
        res.append(key)
    else:
        res.extend([key+str(i) for i in range(1, counter[key]+1)])
>>> res
['champ', 'king1', 'king2', 'mak1', 'mak2', 'mak3']
Answer 5 (score: 0)
f = ['champ', 'king', 'king', 'mak', 'mak', 'mak']
fields_out = [x + str(f.count(x) - f[i + 1:].count(x)) for i, x in enumerate(f)]
print(fields_out)
>>['champ1', 'king1', 'king2', 'mak1', 'mak2', 'mak3']
Or
fields_out = [(x if i == f.index(x) else x + str(f.count(x) - f[i + 1:].count(x))) for i, x in enumerate(f)]
print(fields_out)
>>['champ', 'king', 'king2', 'mak', 'mak2', 'mak3']
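Neither one-liner reproduces the output asked for in the question: the first also numbers the unique 'champ', and the second leaves the first occurrence of each duplicate unnumbered. A possible variant, offered only as a sketch, leaves a word alone when it occurs exactly once and otherwise appends its running occurrence count:

```python
f = ['champ', 'king', 'king', 'mak', 'mak', 'mak']

fields_out = [
    # f[:i + 1].count(x) is the number of times x has appeared up to and
    # including position i, i.e. its 1-based occurrence number.
    x if f.count(x) == 1 else x + str(f[:i + 1].count(x))
    for i, x in enumerate(f)
]
print(fields_out)
# ['champ', 'king1', 'king2', 'mak1', 'mak2', 'mak3']
```

Like the versions above, this calls count once or twice per element, so it is quadratic in the length of the list; that is fine for short lists but slow for large ones.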