Question

I'm trying to find the key in a dictionary that is closest to a given string. For example:

data = {'1a': 'This is 1a', '1d': 'This is 1d', '1f': 'This is 1f', '1e': 'This is 1e'}
find_nearest(data, '1b')
#This would return key '1a'

I've found other examples, but most of them deal with numbers. For example:

data[num] if num in data else data[min(data.keys(), key=lambda k: abs(k-num))]

I did find one piece of code that looked promising:

from sortedcontainers import SortedDict
sd = SortedDict((key, value) for key, value in data)

# Bisect for the index of the desired key.
index = sd.bisect(200)

# With that index, lookup the key.
key = sd.iloc[index]

# You can also look ahead or behind to find the nearest key.
behind = sd.iloc[index - 1]
ahead = sd.iloc[index + 1]

So I tried it. This is my code:

from sortedcontainers import SortedDict
data = {'1a': 'This is 1a', '1d': 'This is 1d', '1f': 'This is 1f', '1e': 'This is 1e'}
sd = SortedDict((key,value) for key,value in data.items())

index = sd.bisect('1b')

key = sd.iloc[index]
print(key)

But when I run this code, it returns:

1d #Instead of '1a'

I've tried various ways to get the code working, but I can't seem to get it right. Does anyone know a fast and efficient way to do this?

Answer 1

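When you bisect and the algorithm doesn't find an exact match, it has two options: it can return the index of the item to the left or the index of the item to the right. It looks like bisect is an alias for bisect_right. You could use bisect_left instead... Note, though, that the two only differ when the key is already present; for a missing key both return the same insertion point, and the nearest lower key sits at index - 1.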
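
As a rough illustration of how the two variants behave, here is a small sketch that uses the standard library's bisect module on a plain sorted list of the question's keys. The list stands in for the SortedDict only so the snippet runs without extra dependencies; the variable names are made up for the example.

```python
from bisect import bisect_left, bisect_right

keys = sorted(['1a', '1d', '1f', '1e'])  # ['1a', '1d', '1e', '1f']

# Missing key: both functions return the same insertion point.
missing_left = bisect_left(keys, '1b')
missing_right = bisect_right(keys, '1b')

# Existing key: bisect_left points at it, bisect_right points just past it.
present_left = bisect_left(keys, '1d')
present_right = bisect_right(keys, '1d')

# The nearest lower key sits just before the insertion point.
lower_neighbor = keys[missing_right - 1]

print(missing_left, missing_right)  # 1 1
print(present_left, present_right)  # 1 2
print(lower_neighbor)               # 1a
```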

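Of course, that isn't necessarily closer (you haven't really defined what you mean by "closer"). In fact, even something like difflib.SequenceMatcher.ratio() might not help for this example, since it only looks at the ratio of matching to non-matching elements.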
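
To see why the ratio is not much help here, this minimal sketch scores the question's keys against '1b'. Every key shares only the leading '1' with the target, so they all get the same score and no single key stands out.

```python
from difflib import SequenceMatcher

keys = ['1a', '1d', '1e', '1f']
target = '1b'

# ratio() is 2 * matches / total length; each key matches only the '1'.
ratios = {key: SequenceMatcher(None, target, key).ratio() for key in keys}

print(ratios)  # every key scores 0.5
```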

You could try something like this:

def find_closest(sd, expected):
    index = sd.bisect(expected)
    if index == 0:
        # Nothing sorts before `expected`, so the first key is the closest.
        return sd.iloc[0]
    closest_lower = sd.iloc[index - 1]
    try:
        closest_upper = sd.iloc[index]
    except IndexError:
        return closest_lower

    # assumption -- Your keys are hex values.
    # this assumption could be completely wrong, but demonstrates
    # how to think of defining a measure of "closeness"
    expected_as_int = int(expected, 16)
    def distance(val):
        return abs(int(val, 16) - expected_as_int)

    return min([closest_lower, closest_upper], key=distance)

Answer 2

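The way I would do this is to iterate over the keys in sorted order and look for the one with the smallest "difference". Because the keys are sorted, once the difference stops decreasing you know you have found it.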

def closestKey(data, val):
    lastKey = None
    lastDif = None
    for key in sorted(data.keys()):
        dif = difference(key, val)  # difference() still needs to be defined
        if lastDif is not None and dif > lastDif:
            return lastKey
        lastDif = dif
        lastKey = key
    # The loop ran out of keys, so the last key is the closest.
    return lastKey
This doesn't handle the case where two keys are equidistant, if that matters.
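
One possible way to fill in difference() is sketched below. It assumes the keys are short alphanumeric strings of equal length, so reading them as base-36 integers preserves their sort order. That is only a guess at what "closeness" should mean here, and the snake_case names are hypothetical rather than taken from the answer.

```python
def difference(key, val):
    # Assumption: keys are equal-length alphanumeric strings, so their
    # base-36 integer values sort in the same order as the strings.
    return abs(int(key, 36) - int(val, 36))

def closest_key(data, val):
    last_key = None
    last_dif = None
    for key in sorted(data):
        dif = difference(key, val)
        # Keys are sorted, so once the difference grows we have passed the best one.
        if last_dif is not None and dif > last_dif:
            return last_key
        last_dif, last_key = dif, key
    # Ran out of keys: the last one is the closest.
    return last_key

data = {'1a': 'This is 1a', '1d': 'This is 1d', '1f': 'This is 1f', '1e': 'This is 1e'}
print(closest_key(data, '1b'))  # 1a
```

With this strict comparison, two equidistant keys resolve to the later one in sort order.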

Answer 3

Thanks to @mgilson, whose answer gave me the idea I needed, I was able to do what I wanted. Here is my code for anyone who is interested:

from sortedcontainers import SortedDict
data = {'1a': 'This is 1a', '1d': 'This is 1d', '1g': 'This is 1g', '1h': 'This is 1h'}

def find_closest(sd, expected):
    # Exact match: nothing to compare.
    if expected in sd:
        return expected
    index = sd.bisect(expected)
    # No key sorts before `expected`: the first key is the nearest.
    if index == 0:
        return sd.iloc[0]
    # No key sorts after `expected`: the last key is the nearest.
    if index == len(sd):
        return sd.iloc[-1]
    indexBehind = sd.iloc[index - 1]
    indexAhead = sd.iloc[index]

    # Closeness measure: compare the sums of the character codes.
    def charSum(s):
        return sum(ord(char) for char in s)

    differenceAhead = charSum(indexAhead) - charSum(expected)
    differenceBehind = charSum(indexBehind) - charSum(expected)
    # On a tie, the key ahead wins.
    if abs(differenceAhead) <= abs(differenceBehind):
        return indexAhead
    return indexBehind

sd = SortedDict((key,value) for key,value in data.items())

print(find_closest(sd, '1b'))#This will return '1a'!

I'm not sure whether this is the fastest or most efficient approach, but I'll keep looking for other ways.

Find the closest key in a dictionary using a string?
