I am trying to write this function, but I can't work out how to stop it from counting the same duplicate more than once. Can anyone help me?
In:
def count_duplicates(seq):
    '''takes as argument a sequence and
    returns the number of duplicate elements'''
    fir = 0
    sec = 1
    count = 0
    while fir < len(seq):
        while sec < len(seq):
            if seq[fir] == seq[sec]:
                count = count + 1
            sec = sec + 1
        fir = fir + 1
        sec = fir + 1
    return count
Out:
count_duplicates([-1,2,4,2,0,4,4])
It fails here: the call returns 4, but the output should be 3.
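Answer 0 (score: 3)
You can simply create a set from the list, which automatically removes the duplicates, and then take the difference between the length of the original list and the length of the set.
Like this: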
def count_duplicates(seq):
    '''takes as argument a sequence and
    returns the number of duplicate elements'''
    return len(seq) - len(set(seq))

res = count_duplicates([-1,2,4,2,0,4,4])
print(res)  # -> 3
If you are not allowed to, or do not want to, use any built-in shortcuts (for whatever reason), you can take the long way:
def count_duplicates2(seq):
    '''takes as argument a sequence and
    returns the number of duplicate elements'''
    counter = 0
    seen = set()
    for elm in seq:
        if elm in seen:
            counter += 1
        else:
            seen.add(elm)
    return counter

res = count_duplicates2([-1,2,4,2,0,4,4])
print(res)  # -> 3
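Finally, as far as your own code is concerned, @AlanB gives a good overview of its problems in his answer. I chose not to correct your code myself; see his answer for that. You clearly have some programming background, but complicated while loops are just not the way things are done in Python.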
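If you also want to know which values are repeated, not just the total, a possible variation (not part of the answer above) is to tally the list with collections.Counter from the standard library. The function name count_duplicates_by_value is made up for this sketch, and it assumes the elements are hashable, just like the set-based versions.

```python
from collections import Counter

def count_duplicates_by_value(seq):
    """Map each repeated value to its number of extra occurrences.

    Hypothetical helper for illustration; assumes hashable elements.
    """
    counts = Counter(seq)
    # A value that appears n times contributes n - 1 duplicates.
    return {value: n - 1 for value, n in counts.items() if n > 1}

extras = count_duplicates_by_value([-1, 2, 4, 2, 0, 4, 4])
print(extras)                # {2: 1, 4: 2}
print(sum(extras.values()))  # 3
```

The sum of the values is the same number that len(seq) - len(set(seq)) gives.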
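Answer 1 (score: 1)
Ev. Kounis's solution is the simplest one and, in my humble opinion, the one you should use. However, if you want to stick with your own code, here is why it doesn't work:
With your nested while loops you are basically saying: "for each item in the list, every time a duplicate of it is found, increment count", which is basically what you want. However, because the list contains three 4s (that is, two duplicates of 4), count is incremented one extra time.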
seq = [-1,2,4,2,0,4,4]
fir = 0
sec = 0
count = 0
print("Pairs of duplicates: ")
for fir, item1 in enumerate(seq):
    for sec, item2 in enumerate(seq):
        # Every pair of equal items is counted, including pairs of later copies
        if fir < sec and seq[fir] == seq[sec]:
            count += 1
            print((fir, sec))
print("Number of duplicates: ", count)
Which outputs:
Pairs of duplicates:
(1, 3)
(2, 5)
(2, 6)
(5, 6)
Number of duplicates: 4
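The (5, 6) pair is the incorrect one: the 4 at index 6 has already been counted in the (2, 6) pair.
To fix this, just add a condition to your if statement that skips any value that has already been checked: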
seq = [-1,2,4,2,0,4,4]
fir = 0
sec = 0
count = 0
duplicates = []  # values whose duplicates have already been counted
print("Pairs of duplicates: ")
for fir, item1 in enumerate(seq):
    for sec, item2 in enumerate(seq):
        if fir < sec and seq[fir] == seq[sec] and seq[fir] not in duplicates:
            count += 1
            print((fir, sec))
    # Mark the value only after the inner loop, so all of its copies get counted
    duplicates.append(seq[fir])
print("Number of duplicates: ", count)
Which outputs the desired result:
Pairs of duplicates:
(1, 3)
(2, 5)
(2, 6)
Number of duplicates: 3
But again, doing len(seq) - len(set(seq)) is much simpler and works just as well.
I realized that I didn't use while loops in the examples above, so here is the same fix written with while loops:
def count_duplicates(seq):
    fir = 0
    sec = 0
    count = 0
    duplicates = []  # values whose duplicates have already been counted
    print("Pairs of duplicates: ")
    while fir < len(seq):
        while sec < len(seq):
            if fir < sec and seq[fir] == seq[sec] and seq[fir] not in duplicates:
                count += 1
                print((fir, sec))
            sec += 1
        duplicates.append(seq[fir])
        fir += 1
        sec = 0  # restart the inner index for the next item
    return count

c = count_duplicates([-1,2,4,2,0,4,4])
print("Number of duplicates: ", c)
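For completeness, here is another way to keep the pairwise comparison without an extra bookkeeping list. This is only a sketch, not the code from the question or from the answers: for each element, look back through the earlier elements and stop at the first match, so every repeated element is counted exactly once. The name count_duplicates_no_set is invented for this example. Because it only relies on ==, it also works when the items are not hashable (for example, lists), where set(seq) would raise a TypeError.

```python
def count_duplicates_no_set(seq):
    """Count the elements that are equal to some earlier element.

    Hypothetical helper for illustration; uses only == comparisons.
    """
    count = 0
    for sec in range(1, len(seq)):
        for fir in range(sec):
            if seq[fir] == seq[sec]:
                count += 1
                break  # one earlier match is enough, so seq[sec] is counted once
    return count

print(count_duplicates_no_set([-1, 2, 4, 2, 0, 4, 4]))  # 3
print(count_duplicates_no_set([[1], [1], [2]]))          # 1
```

It is still quadratic in the length of the list, so for hashable items the set-based versions remain the better choice.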
Answer 2 (score: 0)
An approach using pandas. This method is suited to large lists that contain duplicates.
import pandas as pd
data = [-1,2,4,2,0,4,4]
df = pd.DataFrame({'data': data})  # Loading the data as a DataFrame
df1 = df.duplicated()  # assumed missing line: True for each repeat of an earlier row
print(df[df1 == False])  # Printing non-duplicated values
   data
0    -1
1     2
2     4
4     0
print(df[df1 == False].count())  # Taking count of non-duplicate values
data    4
dtype: int64