Handling duplicates in a list

Asked: 2019-02-23 05:39:31

Tags: python string list function count

I have a list with some elements I need to process. Basically, I want to achieve this:

input:
my_list = ['Gold Trophy (January)', 'Gold Trophy (February)', 'Bronze Trophy (March)']

output:
['Gold Trophy x2', 'Bronze Trophy (March)']

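When several elements share a common string (as with Gold Trophy here), I want to remove those elements and replace them with a single new element that reads Gold Trophy x(number of duplicates).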

This is what I have so far:

my_list = ['Gold Trophy (January)', 'Gold Trophy (February)', 'Bronze Trophy (March)']

# function to count how many duplicates
def countX(my_list, myString): 
    count = 0
    for ele in my_list: 
        if (myString in ele): 
            count = count + 1
    return count 

myString = 'Gold Trophy'
real_count = countX(my_list, myString)

print(*my_list, sep=', ')
print('duplicates = ' + str(real_count))

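At this point the code runs and returns the number of duplicates of the specified string in the list. Any ideas on how to get from here to the desired output? Thanks!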

2 answers:

Answer 0 (score: 1)

This should work without using regular expressions. I've added comments for clarity.

from collections import Counter
my_list = ['Gold Trophy (January)', 'Gold Trophy (February)', 'Bronze Trophy (March)']
output_ls = []
trophy_ls = []
month_ls = []
trophy_cnt_dc = {}
for item in my_list:
    trophy_ls.append(item.split(' (')[0])
    month_ls.append(item.split(' (')[1])
# print(trophy_ls) >> ['Gold Trophy', 'Gold Trophy', 'Bronze Trophy']
# print(month_ls) >> ['January)', 'February)', 'March)']
trophy_cnt_dc = dict(Counter(trophy_ls))
#print(trophy_cnt_dc) >> {'Gold Trophy': 2, 'Bronze Trophy': 1}
for k,v in trophy_cnt_dc.items():
    if v > 1:
        output_ls.append(k+' x'+str(v))
    else:
        ind = trophy_ls.index(k)
        output_ls.append(k+' ('+month_ls[ind])
print(output_ls)

Output:

['Gold Trophy x2', 'Bronze Trophy (March)']
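
The same Counter-based idea can be wrapped in a reusable function. The following is only a sketch: the name collapse_duplicates is made up for illustration, and it assumes that everything before the first ' (' is the name to group by. Unlike the loop above, it keeps the original item untouched when the name is unique, so items without a parenthesized suffix do not raise an IndexError.

```python
from collections import Counter

def collapse_duplicates(items):
    """Merge items that share a base name into 'name xN'; keep unique items as-is."""
    # partition() returns the whole string as the base when ' (' is absent
    bases = [item.partition(' (')[0] for item in items]
    counts = Counter(bases)
    result = []
    seen = set()
    for item, base in zip(items, bases):
        if counts[base] == 1:
            # unique name: keep the original element, month included
            result.append(item)
        elif base not in seen:
            # first occurrence of a repeated name: emit one summary element
            seen.add(base)
            result.append(f'{base} x{counts[base]}')
    return result

my_list = ['Gold Trophy (January)', 'Gold Trophy (February)', 'Bronze Trophy (March)']
print(collapse_duplicates(my_list))
```

For the list in the question this prints ['Gold Trophy x2', 'Bronze Trophy (March)'], with each summary element placed where the name first appeared.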

Answer 1 (score: 0)

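Here is a solution (see the comments for clarification). Note that I used a small trick to separate the name from the date: I split on ( and then put it back where needed. It could be made cleaner, but it is not clear whether that is necessary.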

my_list = ['Gold Trophy (January)', 'Gold Trophy (February)', 'Bronze Trophy (March)']

# Create a list of tuples: (name, date)
# note: name keeps its trailing space and date keeps its closing ')'
pairs = [tuple(x.split('(')) for x in my_list]

# count the number of each name
counts = dict()
for (name, date) in pairs:
    counts[name] = counts.get(name, 0) + 1

# create a dictionary from initial list
# it doesn't matter how collisions are resolved
# the dictionary is required to process each name only once
init = dict(pairs)
res = []

# for each name:
#   if count is > 1, print the count
#   if count is 1, then print its date
for (name, date) in init.items():
    if counts[name] > 1:
        res.append(name + 'x' + str(counts[name]))
    else:
        res.append(name + '(' + date)
print(res)
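
As for making the split cleaner: one possible sketch is to separate the name and the date with str.rpartition, so that neither part carries a stray space or parenthesis. The helper names split_name_date and summarize are hypothetical, and the code assumes Python 3.7+ so that dictionaries preserve insertion order.

```python
def split_name_date(item):
    """Split 'Name (Date)' into ('Name', 'Date'); return (item, None) if there is no suffix."""
    name, sep, rest = item.rpartition(' (')
    if not sep or not rest.endswith(')'):
        return item, None
    return name, rest[:-1]  # drop the closing ')'

def summarize(items):
    counts = {}      # name -> number of occurrences
    first_date = {}  # name -> date of the first occurrence (None if absent)
    for item in items:
        name, date = split_name_date(item)
        counts[name] = counts.get(name, 0) + 1
        first_date.setdefault(name, date)
    res = []
    for name, n in counts.items():
        if n > 1:
            res.append(f'{name} x{n}')
        elif first_date[name] is None:
            res.append(name)
        else:
            res.append(f'{name} ({first_date[name]})')
    return res

my_list = ['Gold Trophy (January)', 'Gold Trophy (February)', 'Bronze Trophy (March)']
print(summarize(my_list))
```

With clean names as dictionary keys, the formatting step no longer depends on the trailing space left behind by split('(').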