Python: Rename duplicates in a list with progressive numbers without sorting the list

Asked: 2015-06-04 17:33:11

Tags: python list duplicates rename

Given a list like this:

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]

I would like to rename the duplicates by appending a number, to get the following result:

mylist = ["name1", "state", "name2", "city", "name3", "zip1", "zip2"]

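I do not want to change the order of the original list. The solutions suggested for this related Stack Overflow question sort the list, which I do not want to do.

7 answers:

Answer 0: (score: 12)

My solution with map and lambda: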

print(list(map(lambda x: x[1] + str(mylist[:x[0]].count(x[1]) + 1) if mylist.count(x[1]) > 1 else x[1], enumerate(mylist))))

In a more traditional form:

newlist = []
for i, v in enumerate(mylist):
    totalcount = mylist.count(v)
    count = mylist[:i].count(v)
    newlist.append(v + str(count + 1) if totalcount > 1 else v)

And a last one, as a list comprehension:

[v + str(mylist[:i].count(v) + 1) if mylist.count(v) > 1 else v for i, v in enumerate(mylist)]

Answer 1: (score: 11)

This is how I would do it.

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]

from collections import Counter # Counter counts the number of occurrences of each item
counts = Counter(mylist) # so we have: {'name':3, 'state':1, 'city':1, 'zip':2}
for s,num in counts.items():
    if num > 1: # ignore strings that only appear once
        for suffix in range(1, num + 1): # suffix starts at 1 and increases by 1 each time
            mylist[mylist.index(s)] = s + str(suffix) # replace each appearance of s

EDIT: Here is a one-liner, but the order is not preserved.

[s + str(suffix) if num>1 else s for s,num in Counter(mylist).items() for suffix in range(1, num+1)]
# Produces (exact order depends on the Python version's dict ordering): ['zip1', 'zip2', 'city', 'state', 'name1', 'name2', 'name3']

Answer 2: (score: 5)

Since count is O(n), any method that calls count on every element will be O(n^2). You can do this instead:

# not modifying original list
from collections import Counter

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
counts = {k:v for k,v in Counter(mylist).items() if v > 1}
newlist = mylist[:]

for i in reversed(range(len(mylist))):
    item = mylist[i]
    if item in counts and counts[item]:
        newlist[i] += str(counts[item])
        counts[item]-=1
print(newlist)

# ['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']

# modifying original list
from collections import Counter

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
counts = {k:v for k,v in Counter(mylist).items() if v > 1}      

for i in reversed(range(len(mylist))):
    item = mylist[i]
    if item in counts and counts[item]:
        mylist[i] += str(counts[item])
        counts[item]-=1
print(mylist)

# ['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']

This should be O(n).

Other provided answers:

Calling mylist.index(s) for every element results in O(n^2):

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]

from collections import Counter
counts = Counter(mylist)
for s,num in counts.items():
    if num > 1:
        for suffix in range(1, num + 1):
            mylist[mylist.index(s)] = s + str(suffix)

Calling count(x[1]) for every element results in O(n^2). It is also called more than once per element, along with list slicing:

print(list(map(lambda x: x[1] + str(mylist[:x[0]].count(x[1]) + 1) if mylist.count(x[1]) > 1 else x[1], enumerate(mylist))))

Benchmarks:

http://nbviewer.ipython.org/gist/dting/c28fb161de7b6287491b
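For a quick local comparison, here is a minimal timeit sketch. The function names rename_quadratic and rename_linear and the test data are made up for illustration; they wrap the list-comprehension approach and the Counter-based approach shown above. Absolute timings will vary by machine, so no figures are quoted here.

```python
import timeit
from collections import Counter


def rename_quadratic(items):
    # Calls count() and takes a slice for every element: O(n^2) overall
    return [v + str(items[:i].count(v) + 1) if items.count(v) > 1 else v
            for i, v in enumerate(items)]


def rename_linear(items):
    # One Counter pass plus one reverse pass over the list: O(n) overall
    counts = {k: v for k, v in Counter(items).items() if v > 1}
    result = items[:]
    for i in reversed(range(len(items))):
        item = items[i]
        if counts.get(item):
            result[i] += str(counts[item])
            counts[item] -= 1
    return result


# Hypothetical test data: 2000 items drawn from 50 distinct keys
data = ["key%d" % (i % 50) for i in range(2000)]

# Both approaches should agree before their timings are compared
assert rename_quadratic(data) == rename_linear(data)

for func in (rename_quadratic, rename_linear):
    seconds = timeit.timeit(lambda: func(data), number=5)
    print("%s: %.4f s" % (func.__name__, seconds))
```

The gap between the two should widen as the list grows, since the first function rescans the list for every element.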

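Answer 3: (score: 4)

Here is a very simple O(n) solution. Walk the list once, storing the index of each element's first occurrence. If we have seen the element before, use the stored data to append the occurrence number.

This approach needs only one extra dictionary for looking back. It avoids looking ahead, so no temporary list slices are created.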

mylist = ["name", "state", "name", "city", "city", "name", "zip", "zip", "name"]

dups = {}

for i, val in enumerate(mylist):
    if val not in dups:
        # Store index of first occurrence and occurrence value
        dups[val] = [i, 1]
    else:
        # Special case for first occurrence
        if dups[val][1] == 1:
            mylist[dups[val][0]] += str(dups[val][1])

        # Increment occurrence value, index value doesn't matter anymore
        dups[val][1] += 1

        # Use stored occurrence value
        mylist[i] += str(dups[val][1])

print(mylist)

# ['name1', 'state', 'name2', 'city1', 'city2', 'name3', 'zip1', 'zip2', 'name4']

Answer 4: (score: 2)

A list comprehension version of Rick Teachey's answer, a "two-liner":

from collections import Counter

m = ["name", "state", "name", "city", "name", "zip", "zip"]

d = {a:list(range(1, b+1)) if b>1 else '' for a,b in Counter(m).items()}
[i+str(d[i].pop(0)) if len(d[i]) else i for i in m]
#['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']

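Answer 5: (score: 1)

You can use a hash table to solve this problem. Define a dictionary d. The key is the string and the value is (first_time_index_in_the_list, times_of_appearance). Each time you see a word, check the dictionary: if its count has just reached 2, use first_time_index_in_the_list to append "1" to the first occurrence, and append times_of_appearance to the current element. If the count is greater than 2, just append times_of_appearance to the current element.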
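This answer gives no code, so here is a minimal sketch of the description above. The function name rename_duplicates and the variable names are assumptions made for illustration, and returning a copy rather than modifying the list in place is a choice made here, not something the answer specifies.

```python
def rename_duplicates(items):
    """Return a copy of items with duplicates numbered in order of appearance."""
    result = list(items)
    # seen maps each string to (index of first occurrence, times seen so far)
    seen = {}
    for i, word in enumerate(items):
        if word not in seen:
            seen[word] = (i, 1)
            continue
        first_index, times = seen[word]
        times += 1
        if times == 2:
            # Second sighting: go back and tag the first occurrence with "1"
            result[first_index] += "1"
        result[i] += str(times)
        seen[word] = (first_index, times)
    return result


mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
print(rename_duplicates(mylist))
# ['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']
```

Each element is visited once and every dictionary operation is O(1) on average, so the whole pass is O(n).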

Answer 6: (score: 1)

Something less fancy.

from collections import defaultdict
mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
finalList = []
dictCount = defaultdict(int)
anotherDict = defaultdict(int)
for t in mylist:
   anotherDict[t] += 1
for m in mylist:
   dictCount[m] += 1
   if anotherDict[m] > 1:
       finalList.append(str(m)+str(dictCount[m]))
   else:
       finalList.append(m)
print(finalList)