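I want to get the tuples whose second item, data[1] (the patient counter), is duplicated within a list of tuples. For the samples below I do not need to consider data[0] or data[2].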
import itertools
def getDuplicateinTuple(dataInput):
    seen={}
    return [seen.setdefault(t[0], t) for t in dataInput if t[0] not in seen]
data=[('2013 Jul 5 06:56:07:', 'PATIENT:COUNTER1'),
('2013 Jul 5 06:56:07:', 'PATIENT:COUNTER2'),
('2013 Jul 5 06:56:07:', 'PATIENT:COUNTER3'),
('2013 Jul 5 06:56:07:', 'PATIENT:COUNTER4'),
('2013 Jul 5 06:57:11:', 'PATIENT:COUNTER1'),
('2013 Jul 5 06:56:11:', 'PATIENT:COUNTER5')]
data1=[('2013 Jul 5 04:26:40:', 'PATIENT:COUNTER1', 'COUNTER INFO: : 500 '),
('2013 Jul 5 04:26:40:', 'PATIENT:COUNTER2', 'COUNTER INFO: : 500 '),
('2013 Jul 5 04:26:40:', 'PATIENT:COUNTER3', 'COUNTER INFO: : 100 '),
('2013 Jul 5 04:26:40:', 'PATIENT:COUNTER4', 'COUNTER INFO: : 100 ')]
s=getDuplicateinTuple(data)
print s
s1=getDuplicateinTuple(data1)
print s1
The expected output is:
[('2013 Jul 5 06:56:07:', 'PATIENT:COUNTER1'), ('2013 Jul 5 06:57:11:', 'PATIENT:COUNTER1')]
The actual output is:
[('2013 Jul 5 06:56:07:', 'PATIENT:COUNTER1'), ('2013 Jul 5 06:57:11:', 'PATIENT:COUNTER1'), ('2013 Jul 5 06:56:11:', 'PATIENT:COUNTER5')]
If I run it on data1, the expected output is:
[]
But the current output is:
[('2013 Jul 5 04:26:40:', 'PATIENT:COUNTER1', 'COUNTER INFO: : 500 ')]
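This could be done simply by comparing the lists. Is there a better, recommended way to achieve this?
I have seen some good Stack Overflow posts on this topic: Find and list duplicates in a list?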
Answer 0 (score: 2)
from collections import defaultdict
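def getDuplicateinTuple(dataInput):
    # Group the tuples by their second item (the patient counter).
    d = defaultdict(list)
    for t in dataInput:
        item1 = t[1]
        d[item1].append(t)
    # Keep only the groups that contain more than one tuple.
    return [t for ts in d.values() if len(ts) > 1 for t in ts]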
data = [
    ('2013 Jul 5 06:56:07:', 'PATIENT:COUNTER1'),
    ('2013 Jul 5 06:56:07:', 'PATIENT:COUNTER2'),
    ('2013 Jul 5 06:56:07:', 'PATIENT:COUNTER3'),
    ('2013 Jul 5 06:56:07:', 'PATIENT:COUNTER4'),
    ('2013 Jul 5 06:57:11:', 'PATIENT:COUNTER1'),
    ('2013 Jul 5 06:56:11:', 'PATIENT:COUNTER5')
]
data1 = [
    ('2013 Jul 5 04:26:40:', 'PATIENT:COUNTER1', 'COUNTER INFO: : 500 '),
    ('2013 Jul 5 04:26:40:', 'PATIENT:COUNTER2', 'COUNTER INFO: : 500 '),
    ('2013 Jul 5 04:26:40:', 'PATIENT:COUNTER3', 'COUNTER INFO: : 100 '),
    ('2013 Jul 5 04:26:40:', 'PATIENT:COUNTER4', 'COUNTER INFO: : 100 ')
]
print(getDuplicateinTuple(data))
# => [('2013 Jul 5 06:56:07:', 'PATIENT:COUNTER1'),
#     ('2013 Jul 5 06:57:11:', 'PATIENT:COUNTER1')]
print(getDuplicateinTuple(data1))
# => []
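The same grouping idea can also be sketched with collections.Counter, which leaves the matching tuples in their original input order. This is only an illustrative variant, not part of the answer above; the function name find_duplicates_by_field and its index parameter are hypothetical names introduced here.

```python
from collections import Counter


def find_duplicates_by_field(rows, index=1):
    """Return the rows whose value at `index` occurs more than once.

    Hypothetical helper for illustration; rows keep their input order.
    """
    # First pass: count how often each value at `index` appears.
    counts = Counter(row[index] for row in rows)
    # Second pass: keep only the rows whose value was seen more than once.
    return [row for row in rows if counts[row[index]] > 1]


data = [
    ('2013 Jul 5 06:56:07:', 'PATIENT:COUNTER1'),
    ('2013 Jul 5 06:56:07:', 'PATIENT:COUNTER2'),
    ('2013 Jul 5 06:57:11:', 'PATIENT:COUNTER1'),
]
print(find_duplicates_by_field(data))
```

Because only row[index] is inspected, the sketch should work for both the two-item and three-item tuples from the question.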
Answer 1 (score: 0)
You can create a (default) dictionary that groups the occurrences by counter, then filter out the counters that occur only once:
from collections import defaultdict
d = defaultdict(list)
for timestamp, counter in data:
    d[counter].append(timestamp)
for counter, timestamps in d.items():
    if len(timestamps) > 1:
        print([(t, counter) for t in timestamps])
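Note that unpacking each tuple as timestamp, counter only fits two-item tuples, so the loop above would raise a ValueError on the three-item tuples in data1. A minimal sketch that indexes the fields instead, assuming the counter is always at position 1 and the timestamp at position 0 (the function name group_duplicate_timestamps is hypothetical):

```python
from collections import defaultdict


def group_duplicate_timestamps(rows):
    """Map each repeated counter (row[1]) to the list of its timestamps (row[0]).

    Hypothetical helper for illustration; extra fields in a row are ignored.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[1]].append(row[0])
    # Drop the counters that were seen only once.
    return {counter: stamps for counter, stamps in grouped.items() if len(stamps) > 1}


data1 = [
    ('2013 Jul 5 04:26:40:', 'PATIENT:COUNTER1', 'COUNTER INFO: : 500 '),
    ('2013 Jul 5 04:26:40:', 'PATIENT:COUNTER2', 'COUNTER INFO: : 500 '),
]
print(group_duplicate_timestamps(data1))  # no counter repeats, so this prints {}
```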