Here is the sample code I am running to remove duplicates from a sorted, comma-separated list. However, it fails to remove some of the duplicates:
import sys
beginning = 1
prev = 0
f = open(sys.argv[1])
for line in f:
    lst = line.split(",")
    for num in lst:
        if beginning == 1:
            sys.stdout.write("if case ")
            sys.stdout.write(num)
            beginning = 0
            prev = num
        else:
            if num == prev:
                continue
            else:
                sys.stdout.write("else case ")
                sys.stdout.write(",")
                sys.stdout.write(num)
                prev = num
    beginning = 1
I have tried many times to figure out what is wrong. The same logic works fine in Java.
Answer 0 (score: 1)
There is no need for all of that when you can use set().
Example:
>>> my_list = [1,4,2,3,4,4,3,1,1,5,6,4,3,2]
>>> set(my_list)
set([1, 2, 3, 4, 5, 6])
>>>
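set() removes all duplicate items from your list and leaves you with one of each. Note that a set does not preserve the original order of the items.
Read more here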
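To apply this idea to the comma-separated lines from the question, here is a minimal sketch, assuming Python 3.7 or later. The function name `dedupe_line` is hypothetical. It strips whitespace from every field first, because iterating over a file keeps the trailing newline on each line, so the last field would be "4\n" rather than "4"; that may be why the loop in the question misses some duplicates. It uses `dict.fromkeys` instead of `set()` so that the first-seen order of the values is kept.

```python
def dedupe_line(line):
    """Return the comma-separated values of `line` with duplicates removed,
    keeping the first occurrence of each value in its original position."""
    # Strip spaces and the trailing newline so "4", " 4" and "4\n" compare equal.
    values = [field.strip() for field in line.split(",")]
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7).
    return ",".join(dict.fromkeys(values))


print(dedupe_line("1,1,2, 2,3,3\n"))  # prints 1,2,3
```

Calling this once per line of the input file would give one de-duplicated output line per input line.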
Answer 1 (score: 0)
Given a file k.txt:
k.txt
1, 2, 3, 4, 5, 6, 4, 2, 3, 2, 1, 4, 6, 7, 4, 3, 4, 8, 9, 0, 0, 0
You can do the following:
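import numpy as np
# Split the file contents on commas and convert each field to an integer.
# int() ignores surrounding spaces and the trailing newline, so "1" and " 1"
# are treated as the same value (as raw strings they would count as different).
with open('k.txt', 'r') as f:
    a = [int(field) for field in f.read().split(',')]
np.unique(a) # returns the unique values, sorted for you :)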
Why is this better than set? Well:
Given a large array:
>>> from time import time
>>> a = np.random.randint(0,100,size=(100000))
>>> b = time(); set(a); print time()-b
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
0.0197851657867
>>> b = time(); np.unique(a); print time()-b
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
0.00981211662292
-> faster run time (roughly 2x in this single run) :D