I have a waveform object, defined as follows:
class wfm:
    """Class defining a waveform characterized by:
        - A name
        - An electrode configuration
        - An amplitude (mA)
        - A pulse width (microseconds)"""

    def __init__(self, name, config, amp, width=300):
        self.name = name
        self.config = config
        self.amp = amp
        self.width = width

    def __eq__(self, other):
        return (type(other) is self.__class__
                and other.name == self.name
                and other.config == self.config
                and other.amp == self.amp
                and other.width == self.width)

    def __ne__(self, other):
        return not self.__eq__(other)
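By parsing, I get a list named waveforms that contains 770 wfm instances. There are a lot of duplicates, and I need to remove them.
My idea was to get the IDs of the equivalent objects, store the largest ID in a list, and then loop over all the waveforms from the end, popping each duplicate.
Code: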
duplicate_ID = []
for i in range(len(waveforms)):
    for j in range(i+1, len(waveforms)):
        if waveforms[i] == waveforms[j]:
            duplicate_ID.append(waveforms.index(waveforms[j]))
            print('{} eq {}'.format(i, j))
duplicate_ID = list(set(duplicate_ID))  # If I don't do that; 17k IDs
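It turns out (judging from the prints) that I have duplicates that do not appear in the ID list. For example, 750 is a duplicate of 763 (the print says so, and a test confirms it), but neither of these two IDs appears in my duplicate list.
I'm pretty sure there is a better solution than this method (which doesn't work yet), and I'd be happy to hear it. Thanks for your help!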
EDIT: A more complex case
I have a more complex scenario. I have 2 classes, wfm (see above) and stim:
class stim:
    """Class defining the waveform used for a stimulation by:
        - Duration (milliseconds)
        - Frequency (Hz)
        - Pattern of waveforms"""

    def __init__(self, dur, f, pattern):
        self.duration = dur
        self.fq = f
        self.pattern = pattern

    def __eq__(self, other):
        return (type(other) is self.__class__
                and other.duration == self.duration
                and other.fq == self.fq
                and other.pattern == self.pattern)

    def __ne__(self, other):
        return not self.__eq__(other)
I parse my files to fill a dict named paradigm. It looks like this:
paradigm[file name STR] = (list of waveforms, list of stimulations)
# example:
paradigm['myfile.xml'] = ([wfm1, ..., wfm10], [stim1, ..., stim5])
Again, I want to remove the duplicates, i.e. I only want to keep data that is not repeated.
Example:
file1 has 10 waveforms and file2 has the same 10 waveforms.
file1 has stim1 and stim2; file2 has stim3, stim4 and stim5.
stim1 and stim3 are the same; so since the waveforms are also the same, I want to keep:
file1: 10 waveforms and stim1 and stim2
file2: 10 waveforms and stim4 and stim5
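This dependency is a bit messy in my head, so I'm having trouble finding a suitable way to store the waveforms and stimulations so that they can be compared easily. If you have any ideas, I'd be glad to hear them. Thanks!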
Answer 0 (score: 1)
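The .index method uses the .__eq__ method you overloaded. So waveforms.index(waveforms[j]) will always find the first instance in the list of a waveform with the same attributes as waveforms[j], which is not necessarily position j: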
w1 = wfm('a', {'test_param': 4}, 3, 2.0)
w2 = wfm('b', {'test_param': 4}, 3, 2.0)
w3 = wfm('a', {'test_param': 4}, 3, 2.0)
w1 == w3 # True
w2 == w3 # False
waveforms = [w1, w2, w3]
waveforms.index(waveforms[2]) == waveforms.index(waveforms[0]) == 0 # True
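The loop from the question can be patched with a small change: record j directly instead of calling .index. The following is only a sketch. The wfm class is a trimmed copy of the one in the question, and duplicate_indices is a hypothetical helper name.

```python
class wfm:
    """Trimmed copy of the question's class, for illustration only."""

    def __init__(self, name, config, amp, width=300):
        self.name = name
        self.config = config
        self.amp = amp
        self.width = width

    def __eq__(self, other):
        # Same four attributes as in the question, compared via vars().
        return type(other) is self.__class__ and vars(other) == vars(self)

    def __ne__(self, other):
        return not self.__eq__(other)


def duplicate_indices(waveforms):
    """Return the sorted indices of every item that equals an earlier item."""
    duplicates = set()
    for i in range(len(waveforms)):
        if i in duplicates:
            continue  # already known to be a copy of an earlier item
        for j in range(i + 1, len(waveforms)):
            if waveforms[i] == waveforms[j]:
                duplicates.add(j)  # record j itself, not waveforms.index(...)
    return sorted(duplicates)


waveforms = [wfm('a', {'p': 4}, 3), wfm('b', {'p': 4}, 3),
             wfm('a', {'p': 4}, 3), wfm('b', {'p': 4}, 3)]
dup = duplicate_indices(waveforms)
for idx in reversed(dup):  # pop from the end so earlier indices stay valid
    waveforms.pop(idx)
```

This keeps the double loop, so it is still quadratic; it only fixes which index gets recorded.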
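If you do this immutably (building a new list instead of modifying the original), you don't need to store list indices at all: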
key = lambda w: hash(str(vars(w)))
dupes = set()
unique = [dupes.add(key(w)) or w for w in waveforms if key(w) not in dupes]
unique == [w1, w2] # True
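If you need to modify the list in place instead, collect the indices of the duplicates and pop them starting from the end:
key = lambda w: hash(str(vars(w)))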
seen = set()
idxs = [i if key(w) in seen else seen.add(key(w)) for i, w in enumerate(waveforms)]
for idx in filter(None, idxs[::-1]):
waveforms.pop(idx)
waveforms == [w1, w2] # True
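The key used above depends on the string representation of vars(w), so two equal configs built in a different key order would produce different keys. As an alternative sketch, a tuple key can be built explicitly. This assumes config is a flat dict with hashable values, as in the example above; wfm_key and unique_by are hypothetical helper names, and the wfm class is reduced to its constructor.

```python
class wfm:
    """Minimal stand-in for the question's class (constructor only)."""

    def __init__(self, name, config, amp, width=300):
        self.name = name
        self.config = config
        self.amp = amp
        self.width = width


def wfm_key(w):
    """Hashable key; assumes w.config is a flat dict with hashable values."""
    return (w.name, frozenset(w.config.items()), w.amp, w.width)


def unique_by(items, key):
    """Keep the first item seen for each key, preserving the input order."""
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


w1 = wfm('a', {'test_param': 4}, 3, 2.0)
w2 = wfm('b', {'test_param': 4}, 3, 2.0)
w3 = wfm('a', {'test_param': 4}, 3, 2.0)
unique = unique_by([w1, w2, w3], wfm_key)
```

Storing the tuple itself in the set, rather than its hash, also avoids treating two different waveforms as equal if their hashes happen to collide.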
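It is good practice to consider big-O complexity when writing algorithms (although optimization should come at the cost of readability only when it is needed). In this case, these solutions are both more readable and more efficient.
Your initial solution is O(n^2) because of the double loop.
Both solutions provided detect the duplicates in a single O(n) pass (note that in the in-place version each waveforms.pop(idx) is itself linear in the length of the list).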
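For the more complex case in the question's edit, the same seen-set idea might be extended across files. The sketch below rests on several assumptions: a stimulation counts as a duplicate only when an earlier file had the same set of waveforms and an equal stimulation, files are processed in dict insertion order, and the objects are hashable. Wfm, Stim and dedupe_stims are hypothetical stand-ins (namedtuples with a string config and a tuple pattern), not the classes from the question.

```python
from collections import namedtuple

# Hypothetical hashable stand-ins for the question's wfm and stim classes.
Wfm = namedtuple('Wfm', 'name config amp width')
Stim = namedtuple('Stim', 'duration fq pattern')


def dedupe_stims(paradigm):
    """Drop stimulations already seen earlier with the same waveform set."""
    seen = set()
    result = {}
    for fname, (waveforms, stims) in paradigm.items():
        wf_key = frozenset(waveforms)  # order-insensitive waveform set
        kept = []
        for s in stims:
            k = (wf_key, s)
            if k not in seen:
                seen.add(k)
                kept.append(s)
        result[fname] = (waveforms, kept)
    return result


wfs = [Wfm('w1', 'E1-E2', 1.0, 300), Wfm('w2', 'E1-E3', 2.0, 300)]
stim_a = Stim(500, 50, ('w1', 'w2'))
stim_b = Stim(500, 100, ('w1',))
stim_c = Stim(250, 50, ('w2',))
paradigm = {
    'file1.xml': (wfs, [stim_a, stim_b]),
    'file2.xml': (list(wfs), [stim_a, stim_c]),
}
cleaned = dedupe_stims(paradigm)
```

Here file1.xml keeps both of its stimulations, while file2.xml loses stim_a because file1.xml already holds it with the same waveforms. Both files keep their waveform lists. A repeated stimulation inside a single file is dropped as well.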