I'm trying to find a way to get the key in a dictionary that is closest to a given string. For example:
data = {'1a': 'This is 1a', '1d': 'This is 1d', '1f': 'This is 1f', '1e': 'This is 1e'}
find_nearest(data, '1b')
#This would return key '1a'
I found other examples, but most of them deal with numbers. For example:
data[num] if num in data else data[min(data.keys(), key=lambda k: abs(k-num))]
I did manage to find one piece of code that looked promising:
from sortedcontainers import SortedDict
sd = SortedDict((key, value) for key, value in data)
# Bisect for the index of the desired key.
index = sd.bisect(200)
# With that index, lookup the key.
key = sd.iloc[index]
# You can also look ahead or behind to find the nearest key.
behind = sd.iloc[index - 1]
ahead = sd.iloc[index + 1]
So I tried it. Here is my code:
from sortedcontainers import SortedDict
data = {'1a': 'This is 1a', '1d': 'This is 1d', '1f': 'This is 1f', '1e': 'This is 1e'}
sd = SortedDict((key,value) for key,value in data.items())
index = sd.bisect('1b')
key = sd.iloc[index]
print(key)
But when I run this code, it returns:
1d #Instead of '1a'
I've tried various ways to make this work, but I can't seem to get it right. Does anyone know a fast and efficient way to do this?
Answer 0 (score: 4)
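When you bisect and the algorithm doesn't find an exact match, what you get back is an insertion point: the key at that index is the nearest one to the right, and the key at index - 1 is the nearest one to the left. It looks like bisect is an alias for bisect_right. You could use bisect_left instead, but the two only differ when the key is already present, so for a missing key such as '1b' you still need to look at index - 1 to reach '1a'.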
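To make the insertion-point behaviour concrete, here is a minimal sketch that uses the standard library's bisect module on a plain sorted list as a stand-in for the SortedDict. It assumes SortedDict.bisect follows the same convention, which is consistent with the '1d' result reported in the question.

```python
import bisect

# Sorted keys from the question; '1b' is not among them.
keys = sorted(['1a', '1d', '1f', '1e'])  # ['1a', '1d', '1e', '1f']

# For a missing key, both variants return the same insertion point.
left = bisect.bisect_left(keys, '1b')
right = bisect.bisect_right(keys, '1b')

# The key at the insertion point is the right-hand neighbour,
# and the key just before it is the left-hand neighbour.
right_neighbour = keys[right]
left_neighbour = keys[right - 1]

# The two variants only differ when the key is already present.
present_left = bisect.bisect_left(keys, '1d')
present_right = bisect.bisect_right(keys, '1d')

print(left, right, left_neighbour, right_neighbour)
print(present_left, present_right)
```

Whichever variant is used, both neighbours still have to be compared under some measure of closeness before one can be called the nearest.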
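Of course, that isn't necessarily closer (you haven't really defined what you mean by closer). In fact, even something like difflib.SequenceMatcher.ratio() may not help with this example, because it only looks at the ratio of matching to non-matching elements.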
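As a quick illustration of why a similarity ratio doesn't separate these keys, the sketch below scores every key from the question against '1b'. Each key shares only the leading '1' with the target, so the scores tie.

```python
from difflib import SequenceMatcher

keys = ['1a', '1d', '1e', '1f']
target = '1b'

# ratio() is 2 * matches / total length of both strings; every key
# matches the target on the '1' only, so every score is identical.
ratios = {key: SequenceMatcher(None, target, key).ratio() for key in keys}
print(ratios)
```

Since all four keys score the same, the ratio alone gives no reason to prefer '1a' over '1f'.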
You could try something along these lines:
def find_closest(sd, expected):
    index = sd.bisect(expected)
    # bisect returns the insertion point: the nearest key on the left
    # is at index - 1 and the nearest key on the right is at index.
    if index == 0:
        # nothing to the left, so the first key is the only candidate
        return sd.iloc[0]
    closest_lower = sd.iloc[index - 1]
    try:
        closest_upper = sd.iloc[index]
    except IndexError:
        return closest_lower

    # assumption -- Your keys are hex values.
    # this assumption could be completely wrong, but demonstrates
    # how to think of defining a measure of "closeness"
    expected_as_int = int(expected, 16)
    def distance(val):
        return abs(int(val, 16) - expected_as_int)

    return min([closest_lower, closest_upper], key=distance)
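For anyone who wants to try the idea without installing sortedcontainers, here is a rough equivalent on a plain sorted list using the standard library's bisect module. The name find_closest_hex is made up for this sketch, and it keeps the same unverified assumption that the keys are hex strings. It also assumes the keys all have the same length and case, so that string order and numeric order agree.

```python
import bisect

def find_closest_hex(sorted_keys, expected):
    """Return the key in sorted_keys nearest to expected, read as hex."""
    if not sorted_keys:
        raise ValueError("no keys to search")
    index = bisect.bisect_left(sorted_keys, expected)
    # Candidates are the neighbours on either side of the insertion point;
    # the slice shrinks to a single key at either end of the list.
    candidates = sorted_keys[max(index - 1, 0):index + 1]
    target = int(expected, 16)
    return min(candidates, key=lambda key: abs(int(key, 16) - target))

keys = sorted(['1a', '1d', '1f', '1e'])
print(find_closest_hex(keys, '1b'))  # 1a
```

Swapping the lambda for a different distance function is the only change needed if the keys turn out not to be hex.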
Answer 1 (score: 2)
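The way I'd do this is to iterate over the keys in sorted order and look for the one with the smallest "difference". Because the keys are sorted, as soon as the difference stops decreasing you know you've found it.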
def closestKey(data, val):
    lastKey = None
    lastDif = None
    for key in sorted(data.keys()):
        dif = difference(key, val) #need to figure out difference()
        if lastDif is not None and dif > lastDif:
            return lastKey
        lastDif = dif
        lastKey = key
    # the difference never went back up, so the last key is the closest
    return lastKey
This doesn't handle the case where two keys are equidistant, if that matters.
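One possible way to fill in difference(), purely as an illustration: read both strings as base-36 numbers (digits 0-9, then letters a-z) and take the absolute gap. That choice is an assumption and not something the question specifies. It only lines up with sorted order when the keys are lowercase alphanumerics of equal length. The snake_case names below are just for this sketch.

```python
def difference(key, val):
    # Hypothetical measure: treat both strings as base-36 numbers
    # and use the absolute gap between them.
    return abs(int(key, 36) - int(val, 36))

def closest_key(data, val):
    last_key = None
    last_dif = None
    for key in sorted(data):
        dif = difference(key, val)
        if last_dif is not None and dif > last_dif:
            # The gap started growing again, so the previous key was nearest.
            return last_key
        last_dif = dif
        last_key = key
    # Ran out of keys while still getting closer: the last key is nearest.
    return last_key

data = {'1a': 'This is 1a', '1d': 'This is 1d', '1f': 'This is 1f', '1e': 'This is 1e'}
print(closest_key(data, '1b'))  # 1a
```

With an empty dictionary this returns None, which the caller would need to handle.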
Answer 2 (score: 0)
Thanks to @mgilson, whose answer gave me the idea I needed, I was able to do what I set out to do. Here is my code for anyone who is interested:
from sortedcontainers import SortedDict

data = {'1a': 'This is 1a', '1d': 'This is 1d', '1g': 'This is 1g', '1h': 'This is 1h'}

def find_closest(sd, expected):
    index = sd.bisect(expected)
    try:
        indexAhead = sd.iloc[index]
    except IndexError:
        # expected sorts after every key, so fall back to the last key
        indexAhead = sd.iloc[len(sd.keys()) - 1]
    if indexAhead == expected:
        return expected
    else:
        indexBehindNum = 0
        # index - 1 would wrap around to the last key when index is 0,
        # so only look behind when a previous key actually exists
        if index > 0:
            indexBehind = sd.iloc[index - 1]
            for char in indexBehind:
                indexBehindNum += ord(char)
        if not indexBehindNum:
            return indexAhead
        else:
            # compare the keys by the sum of their character codes
            expectedTotalNum = 0
            indexAheadNum = 0
            for char in expected:
                expectedTotalNum += ord(char)
            for char in indexAhead:
                indexAheadNum += ord(char)
            differenceAhead = indexAheadNum - expectedTotalNum
            differenceBehind = indexBehindNum - expectedTotalNum
            Closest = min([differenceAhead, differenceBehind], key=abs)
            if Closest == differenceAhead:
                return indexAhead
            else:
                return indexBehind

sd = SortedDict((key, value) for key, value in data.items())
print(find_closest(sd, '1b'))  # This will return '1a'!
I'm not sure this is the fastest or most efficient approach, but I'll keep looking for other ways to do it.