I have a dictionary like this:
exons = {'NM_015665': [(0, 225), (356, 441), (563, 645), (793, 861)], etc...}
and another file with positions like this:
isoform pos
NM_015665 449
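What I want to do is print the range that is closest to the position given in the file, and then print the number within that range that is closest to the position. For this case, I want to print (356, 441) and then 441. I have managed to print the closest number within a range, but the code below only considers a fixed window of values around the listed numbers (5 on either side, 10 in total). Is there a way to account for the fact that the gaps between ranges differ in size?
Here is my code so far: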
import csv

with open('splicing_reinitialized.txt') as f:
    reader = csv.DictReader(f, delimiter="\t")
    for row in reader:
        pos = row['pos']
        name = row['isoform']
        ppos1 = int(pos)
        if name in exons:
            y = exons[name]
            for i, (low, high) in enumerate(exons[name]):
                # only matches when the position is within 5 of the range
                if low - 5 <= ppos1 <= high + 5:
                    values = (low, high)
                    closest = min((low, high), key=lambda x: abs(x - ppos1))
Answer 0 (score: 1)
I would rewrite this as a minimum-distance search:
if name in exons:
    y = exons[name]
    minDist = float('inf')  # larger than any real distance
    minIdx = None
    minNum = None
    for i, (low, high) in enumerate(y):
        dlow = abs(low - ppos1)
        dhigh = abs(high - ppos1)
        dist = min(dlow, dhigh)
        if dist < minDist:
            minDist = dist
            minIdx = i
            minNum = 0 if dlow < dhigh else 1  # index of the nearer endpoint
    print(y[minIdx])
    print(y[minIdx][minNum])
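This drops the fixed search window and simply looks for the range whose nearest endpoint is at the minimum distance.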
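The same minimum-distance idea can also be sketched with the built-in `min` and a key function. This is only an illustration: the helper name `closest_range_and_endpoint` is hypothetical, and the `exons` dictionary is reduced to the single entry shown in the question.

```python
def closest_range_and_endpoint(ranges, pos):
    """Return (range, endpoint) for the range whose nearer endpoint is closest to pos."""
    # The distance from pos to a range is the distance to its nearer endpoint.
    best = min(ranges, key=lambda r: min(abs(r[0] - pos), abs(r[1] - pos)))
    low, high = best
    # On a tie the upper endpoint wins, matching the loop version above.
    endpoint = low if abs(low - pos) < abs(high - pos) else high
    return best, endpoint


# Sample data from the question (only the one entry that was shown).
exons = {'NM_015665': [(0, 225), (356, 441), (563, 645), (793, 861)]}

best, endpoint = closest_range_and_endpoint(exons['NM_015665'], 449)
print(best)      # (356, 441)
print(endpoint)  # 441
```

Note that `min` raises `ValueError` on an empty list, so an isoform with no ranges would need to be handled separately.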
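Answer 1 (score: 1)
A functional alternative :). It may be faster. It is light on RAM because it works with lazy iterators, and the functional style makes it easy to parallelize. I hope you find it interesting enough to study.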
from itertools import repeat  # Python 3: map, zip and filter are already lazy (imap, izip, ifilter in Python 2)

def closest_point(position, interval):
    """:rtype: tuple[int, int]"""  # closest interval point, distance to it
    position_in_interval = interval[0] <= position <= interval[1]
    closest = min([(border, abs(position - border)) for border in interval], key=lambda x: x[1])
    return closest if not position_in_interval else (closest[0], 0)  # distance is 0 if position is inside an interval

def closest_interval(exons, pos):
    """:rtype: tuple[tuple[int, int], tuple[int, int]]"""
    # note: the filter drops intervals whose distance is 0, i.e. those that contain pos
    return min(filter(lambda x: x[1][1], zip(exons, map(closest_point, repeat(pos, len(exons)), exons))),
               key=lambda x: x[1][1])

print(closest_interval(exons['NM_015665'], 449))
This prints:
((356, 441), (441, 8))
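The first tuple is the range. In the second tuple, the first integer is the closest point of the interval and the second integer is the distance to it.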
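One detail worth knowing: the filtering step in `closest_interval` discards every interval whose distance is 0, so a position that falls inside an exon is never reported for that exon. If that is not what you want, a variant without the filter might look like the sketch below. The name `closest_interval_inclusive` is hypothetical, and the sketch assumes a position inside an interval should count as distance 0.

```python
def closest_point(position, interval):
    """Return (closest border, distance); the distance is 0 inside the interval."""
    low, high = interval
    border = min(interval, key=lambda b: abs(position - b))
    distance = 0 if low <= position <= high else abs(position - border)
    return border, distance


def closest_interval_inclusive(intervals, pos):
    """Hypothetical variant: intervals that contain pos are kept, with distance 0."""
    return min(((iv, closest_point(pos, iv)) for iv in intervals),
               key=lambda pair: pair[1][1])


# Sample data from the question (only the one entry that was shown).
exons = {'NM_015665': [(0, 225), (356, 441), (563, 645), (793, 861)]}

# Position between two exons: same result as the original function.
interval, (point, distance) = closest_interval_inclusive(exons['NM_015665'], 449)
print(interval, point, distance)  # (356, 441) 441 8

# Position inside an exon: that exon is returned with distance 0.
print(closest_interval_inclusive(exons['NM_015665'], 400))  # ((356, 441), (441, 0))
```

The nested result unpacks directly, as in the first call, which makes it easy to print the range and the closest endpoint on separate lines.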