Question

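I'm trying to find the duplicate strings in a list of ~100,000 strings, count how many times each one occurs, find the indices where it occurs, and print them. So far, I've come up with this: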

list_b = ['04/Sep/2016:00:00:03 -0400', '04/Sep/2016:00:00:04 -0400', '04/Sep/2016:00:00:05 -0400',
          '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:08 -0400',
          '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:11 -0400',
          '04/Sep/2016:00:00:15 -0400', '04/Sep/2016:00:00:19 -0400', '04/Sep/2016:00:00:20 -0400',
          '04/Sep/2016:00:00:23 -0400', '04/Sep/2016:00:00:25 -0400', '04/Sep/2016:00:00:26 -0400']

for i in list_b:
    if i in list_b:  # always True: i was taken from list_b itself
        # amount_of_duplicates and index_of_duplicate are placeholders that still need to be computed
        print(i + " Amount of duplicates: " + amount_of_duplicates + " Index of duplicates: " + index_of_duplicate)

The output should look like this:

"04/Sep/2016:00:00:06 -0400  Amount of duplicates:  2 Index of duplicates: 3,4"
"04/Sep/2016:00:00:08 -0400  Amount of duplicates:  3 Index of duplicates: 5,6,7"

Answer 1

from collections import defaultdict

list_b = ['04/Sep/2016:00:00:03 -0400', '04/Sep/2016:00:00:04 -0400', '04/Sep/2016:00:00:05 -0400',
          '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:08 -0400',
          '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:11 -0400',
          '04/Sep/2016:00:00:15 -0400', '04/Sep/2016:00:00:19 -0400', '04/Sep/2016:00:00:20 -0400',
          '04/Sep/2016:00:00:23 -0400', '04/Sep/2016:00:00:25 -0400', '04/Sep/2016:00:00:26 -0400']

# map each string to the list of indices where it occurs (one pass over the list)
indices_dict = defaultdict(list)

for index, value in enumerate(list_b):
    indices_dict[value].append(index)

# report only the strings that occur more than once
for value, index_list in indices_dict.items():
    num_duplicates = len(index_list)
    if num_duplicates > 1:
        print("%s Amount of duplicates: %s, Indices of duplicates: %s" %
              (value, num_duplicates, index_list))
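Answer 1 prints each index list as a Python list (for example `[3, 4]`), while the question asks for comma-separated indices such as `3,4`. A minimal sketch of the same one-pass `defaultdict` technique that returns lines in the requested format follows; `format_duplicates` is a hypothetical helper name, `list_b` is shortened to its first nine entries for brevity, and the first-seen output order assumes Python 3.7+ (where dicts keep insertion order).

```python
from collections import defaultdict

# shortened copy of the question's list_b (first nine entries only)
list_b = ['04/Sep/2016:00:00:03 -0400', '04/Sep/2016:00:00:04 -0400', '04/Sep/2016:00:00:05 -0400',
          '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:08 -0400',
          '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:11 -0400']


def format_duplicates(items):
    """Return one line per value that occurs more than once, in first-seen order."""
    indices = defaultdict(list)
    for index, value in enumerate(items):
        indices[value].append(index)  # single O(n) pass over the input
    lines = []
    for value, index_list in indices.items():  # insertion order in Python 3.7+
        if len(index_list) > 1:
            joined = ",".join(str(i) for i in index_list)  # [3, 4] -> "3,4"
            lines.append("%s Amount of duplicates: %d Index of duplicates: %s"
                         % (value, len(index_list), joined))
    return lines


for line in format_duplicates(list_b):
    print(line)
```

Because every string is visited once and dict lookups are constant time on average, this stays roughly linear in the list length, which matters for ~100,000 entries.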

Answer 2

ela_articles /= ela_active_students.to_f
ela_days /= ela_active_students.to_f
ela_growth /= ela_active_students.to_f
ela_at_above_now /= ela_active_students.to_f
ela_at_above_before /= ela_active_students.to_f

Answer 3

This should do it:

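mylist = ["a", "a", "b", "c", "b"]

for index, item in enumerate(mylist):
    rep_time = mylist.count(item)  # count() scans the whole list each time, so the loop is O(n^2)
    # prints every element (including ones that occur only once), one index per line
    print(item, " Amount of duplicates: ", rep_time, "| Index of duplicates: ", index)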

Tested on Python 3; it works fine.
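Calling `mylist.count(item)` for every element makes Answer 3 quadratic, which can get slow for ~100,000 strings, and it prints singletons and one index per line rather than grouping. A minimal sketch that swaps in `collections.Counter` to get all counts in one pass, then collects indices only for values that occur more than once; `duplicate_indices` is a hypothetical helper name, and the output order assumes Python 3.7+ dict ordering.

```python
from collections import Counter

mylist = ["a", "a", "b", "c", "b"]


def duplicate_indices(items):
    """Map each value that occurs more than once to the list of its indices."""
    counts = Counter(items)  # one O(n) pass instead of list.count() per element
    result = {}
    for index, item in enumerate(items):
        if counts[item] > 1:  # skip values that occur only once
            result.setdefault(item, []).append(index)
    return result


for item, indices in duplicate_indices(mylist).items():
    print(item, "Amount of duplicates:", len(indices), "| Index of duplicates:", indices)
```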

Answer 4

list_b = ['04/Sep/2016:00:00:03 -0400', '04/Sep/2016:00:00:04 -0400', '04/Sep/2016:00:00:05 -0400',
          '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:08 -0400',
          '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:11 -0400',
          '04/Sep/2016:00:00:15 -0400', '04/Sep/2016:00:00:19 -0400', '04/Sep/2016:00:00:20 -0400',
          '04/Sep/2016:00:00:23 -0400', '04/Sep/2016:00:00:25 -0400', '04/Sep/2016:00:00:26 -0400']
results = {}
for i in range(len(list_b)):
    if list_b[i] not in results:
        # first occurrence: record the string, its total count and its first index
        results[list_b[i]] = {'string': list_b[i], 'count': list_b.count(list_b[i]), 'index': [i]}
    else:
        # later occurrence: only the index needs recording
        results[list_b[i]]['index'].append(i)
for result in results:
    if len(results[result]['index']) > 1:
        print(results[result]['string'], 'Amount of duplicates:', results[result]['count'],
              'Index of Duplicates:', ",".join(map(str, results[result]['index'])))

Output

04/Sep/2016:00:00:06 -0400 Amount of duplicates: 2 Index of Duplicates: 3,4
04/Sep/2016:00:00:08 -0400 Amount of duplicates: 3 Index of Duplicates: 5,6,7
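Answer 4 calls `list_b.count()` once per distinct string, even though the count always equals the length of the collected index list. If, as in `list_b` (log timestamps in time order), equal strings are always adjacent, `itertools.groupby` can collect each run of duplicates in one pass with no dictionary. This is a sketch under that adjacency assumption only: on an unsorted list, non-adjacent duplicates fall into separate runs and are missed, so the dictionary approaches above are the safer general choice. `sorted_duplicates` is a hypothetical helper name, and `list_b` is shortened to its first nine entries.

```python
from itertools import groupby
from operator import itemgetter

# shortened copy of list_b; equal timestamps sit next to each other
list_b = ['04/Sep/2016:00:00:03 -0400', '04/Sep/2016:00:00:04 -0400', '04/Sep/2016:00:00:05 -0400',
          '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:08 -0400',
          '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:11 -0400']


def sorted_duplicates(items):
    """Yield (value, indices) for each run of equal adjacent values longer than one.

    Assumes equal values are adjacent; otherwise duplicates split into separate runs.
    """
    # enumerate yields (index, value) pairs; group consecutive pairs by their value
    for value, run in groupby(enumerate(items), key=itemgetter(1)):
        indices = [index for index, _ in run]
        if len(indices) > 1:
            yield value, indices


for value, indices in sorted_duplicates(list_b):
    print(value, 'Amount of duplicates:', len(indices),
          'Index of Duplicates:', ",".join(map(str, indices)))
```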

Print each group of duplicate strings from a large list, with its count and indices

4 answers