
Question

I'm trying to write this function, but I can't work out how to stop it from counting the same duplicate more than once. Can someone help me?

In:

def count_duplicates(seq):
    '''takes as argument a sequence and
    returns the number of duplicate elements'''
    fir = 0
    sec = 1
    count = 0
    while fir < len(seq):
        while sec < len(seq):
            if seq[fir] == seq[sec]:
                count = count + 1
            sec = sec + 1
        fir = fir + 1
        sec = fir + 1
    return count

Out:

count_duplicates([-1,2,4,2,0,4,4])
4

This fails here: the function returns 4, but the output should be 3.

Answer 1

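You can simply create a set from the list, which automatically removes the duplicates, and then take the difference between the length of the original list and the length of the set. Like this: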

def count_duplicates(seq): 

    '''takes as argument a sequence and
    returns the number of duplicate elements'''

    return len(seq) - len(set(seq))

res = count_duplicates([-1,2,4,2,0,4,4])
print(res)  # -> 3
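If you also want to see which values are repeated, here is a sketch using `collections.Counter` from the standard library (an alternative, not part of this answer; the name `duplicate_breakdown` is hypothetical). Each value contributes its number of occurrences minus one, so the total matches `len(seq) - len(set(seq))`.

```python
from collections import Counter

def duplicate_breakdown(seq):
    """Hypothetical helper: map each repeated value to its number of extra occurrences."""
    counts = Counter(seq)  # value -> number of occurrences
    return {value: n - 1 for value, n in counts.items() if n > 1}

breakdown = duplicate_breakdown([-1, 2, 4, 2, 0, 4, 4])
print(breakdown)                # {2: 1, 4: 2}
print(sum(breakdown.values()))  # 3
```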

If you are not allowed to use, or don't want to use, any built-in shortcuts (for whatever reason), you can take the long way around:

def count_duplicates2(seq): 

    '''takes as argument a sequence and
    returns the number of duplicate elements'''

    counter = 0
    seen = set()
    for elm in seq:
        if elm in seen:
            counter += 1
        else:
            seen.add(elm)
    return counter

res = count_duplicates2([-1,2,4,2,0,4,4])
print(res)  # -> 3
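Both versions above rely on a set, so every element must be hashable; a list of lists, for example, raises `TypeError`. A minimal sketch for that case (the name `count_duplicates_unhashable` is hypothetical) compares each element only against the elements before it, which costs O(n²) instead of O(n):

```python
def count_duplicates_unhashable(seq):
    """Hypothetical helper: count elements equal to some earlier element.

    Uses == comparisons only, so it also works for unhashable items such as lists.
    """
    counter = 0
    for i, elm in enumerate(seq):
        # seq[:i] holds everything before position i; a match means elm is a repeat
        if elm in seq[:i]:
            counter += 1
    return counter

print(count_duplicates_unhashable([-1, 2, 4, 2, 0, 4, 4]))  # 3
print(count_duplicates_unhashable([[1], [2], [1], [1]]))    # 2
```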

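Finally, as far as your code is concerned, @AlanB does a nice job of outlining what is wrong with it here. I chose not to bother correcting your code, since in my view that is covered by his answer. You obviously have some programming background, but convoluted while loops are just not the way things are done in Python.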

Answer 2

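Ev. Kounis's solution is the simplest and, in my humble opinion, the one you should use. However, if you want to stick with your own code, here is why it doesn't work: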

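With the convoluted while loops you are basically saying "for every item in the list, increment count whenever a duplicate is found", which is essentially what you want. However, because 4 appears three times, the second and third 4 are also compared with each other, so count is incremented one extra time.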
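To see where the extra increment comes from: comparing every pair (fir, sec) with fir < sec counts pairs of equal elements, so a value occurring k times contributes k(k-1)/2, whereas the expected answer counts k - 1 extra occurrences. For the three 4s (k = 3) that is 3 instead of 2. The sketch below computes both totals with `collections.Counter` (the helper names `pair_count` and `extra_count` are illustrative, not from the question):

```python
from collections import Counter

def pair_count(seq):
    """What the nested loops compute: index pairs (i, j), i < j, holding equal values."""
    return sum(k * (k - 1) // 2 for k in Counter(seq).values())

def extra_count(seq):
    """What the question asks for: occurrences beyond the first of each value."""
    return sum(k - 1 for k in Counter(seq).values())

seq = [-1, 2, 4, 2, 0, 4, 4]
print(pair_count(seq))   # 4
print(extra_count(seq))  # 3
```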

seq = [-1,2,4,2,0,4,4]
count = 0
print("Pairs of duplicates: ")
for fir, item1 in enumerate(seq):
    for sec, item2 in enumerate(seq):
        # compare each pair of positions once (fir < sec)
        if fir < sec and item1 == item2:
            count += 1
            print((fir, sec))

print("Number of duplicates: ", count)

This outputs:

Pairs of duplicates: 
(1, 3)
(2, 5)
(2, 6)
(5, 6)
Number of duplicates:  4

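The (5, 6) pair is the incorrect one: the 4s at indices 5 and 6 have already been counted as duplicates of the 4 at index 2.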

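To fix this, simply add a condition to your if statement so that a value that has already been processed as seq[fir] is not counted again: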

seq = [-1,2,4,2,0,4,4]
count = 0
duplicates = []  # values already processed as item1
print("Pairs of duplicates: ")
for fir, item1 in enumerate(seq):
    for sec, item2 in enumerate(seq):
        if fir < sec and item1 == item2 and item1 not in duplicates:
            count += 1
            print((fir, sec))

    duplicates.append(item1)

print("Number of duplicates: ", count)

This outputs the desired result:

Pairs of duplicates: 
(1, 3)
(2, 5)
(2, 6)
Number of duplicates:  3

But again, doing

len(seq) - len(set(seq))

is much simpler and works just as well.

Edit:

I realized that my examples above don't use while loops. Here is the same fix written with your while loops:

def count_duplicates(seq):
    fir = 0
    sec = 0
    count = 0
    duplicates = []  # values already processed as seq[fir]
    print("Pairs of duplicates: ")
    while fir < len(seq):
        while sec < len(seq):
            if fir < sec and seq[fir] == seq[sec] and seq[fir] not in duplicates:
                count += 1
                print((fir, sec))
            sec += 1
        duplicates.append(seq[fir])
        fir += 1
        sec = 0
    return count


c = count_duplicates([-1,2,4,2,0,4,4])
print("Number of duplicates: ", c)
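Another option, not taken from either answer: keep the question's original while loops and stop the inner loop at the first later match with `break`. Each position is then counted at most once, namely when an equal element follows it, which gives exactly k - 1 for a value occurring k times. A minimal sketch:

```python
def count_duplicates(seq):
    '''takes as argument a sequence and
    returns the number of duplicate elements'''
    fir = 0
    count = 0
    while fir < len(seq):
        sec = fir + 1
        while sec < len(seq):
            if seq[fir] == seq[sec]:
                count = count + 1
                break  # one later match is enough; don't count further pairs
            sec = sec + 1
        fir = fir + 1
    return count

print(count_duplicates([-1, 2, 4, 2, 0, 4, 4]))  # 3
```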

Answer 3

An approach using pandas. This method is suited to large lists with duplicates.

import pandas as pd

data = [-1,2,4,2,0,4,4]
df = pd.DataFrame({'data': data})  # Loading the data as a DataFrame
df1 = df.duplicated()  # True for every row that repeats an earlier row
print(df[~df1])  # Printing non-duplicated values

   data
0    -1
1     2
2     4
4     0

print(df[~df1].count())  # Counting the non-duplicated values

data    4
dtype: int64
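The snippet above counts the non-duplicated values (4). To get the number of duplicates directly, here is a sketch assuming the same data: `Series.duplicated()` with its default `keep='first'` marks every repeat of an earlier value as `True`, so summing the mask gives the count.

```python
import pandas as pd

data = [-1, 2, 4, 2, 0, 4, 4]
s = pd.Series(data)

mask = s.duplicated()           # True at indices 3, 5 and 6
n_duplicates = int(mask.sum())  # True counts as 1 when summed
print(n_duplicates)             # 3
```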

Counting the number of duplicates in a list