Question

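I have a situation where I need to find the value whose key is closest to the key I request. It is something like a nearest-key map, one that defines a distance between keys.

For example, if the map has the keys {A, C, M, Z}, a request for D would return the value of C.

Any ideas?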

Answer 1

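Most tree data structures use some kind of ordering to store and look up keys. Many such implementations can locate the key nearest to the one you probe with (usually either the nearest below or the nearest above). For example, Java's TreeMap implements such a data structure, and you can ask it for the nearest key below your lookup key or the nearest key above it (lowerKey and higherKey; floorKey and ceilingKey are the variants that also accept an exact match).

If you can compute distances (that is not always easy: Java's interface only requires you to know whether any given key is "below" or "above" any other given key), you can ask for both the nearest key above and the nearest key below, and then work out for yourself which one is closer.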
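As a rough illustration of the "ask for both neighbours, then compare" idea, here is a minimal Java sketch. It assumes Character keys and uses the difference of character codes as the distance; the class name NearestKeyLookup and the method nearest are made up for this example and are not part of TreeMap.

```java
import java.util.Map;
import java.util.TreeMap;

final class NearestKeyLookup {
    // Returns the entry whose key is nearest to `key`, or null if the map is empty.
    // Distance is the difference of character codes (an assumption of this sketch).
    // On a tie, the lower key wins.
    static <V> Map.Entry<Character, V> nearest(TreeMap<Character, V> map, char key) {
        Map.Entry<Character, V> below = map.floorEntry(key);   // greatest key <= key
        Map.Entry<Character, V> above = map.ceilingEntry(key); // least key >= key
        if (below == null) return above;
        if (above == null) return below;
        int distBelow = key - below.getKey();
        int distAbove = above.getKey() - key;
        return distAbove < distBelow ? above : below;
    }

    public static void main(String[] args) {
        TreeMap<Character, String> map = new TreeMap<>();
        map.put('A', "value of A");
        map.put('C', "value of C");
        map.put('M', "value of M");
        map.put('Z', "value of Z");
        System.out.println(nearest(map, 'D').getValue()); // prints "value of C"
    }
}
```

For key types without an obvious numeric difference, the two subtractions would have to be replaced by whatever distance function fits your keys.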

Answer 2

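What is the dimensionality of your data? If it is just one-dimensional, a sorted array will do: a binary search will either find an exact match or show which two keys the search key lies between, and a simple test will then tell you which of the two is closer.

If you need to find not only the nearest key but also an associated value, maintain an identically ordered array of values: the index of the retrieved key in the key array is then the index of the value in the value array.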
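A minimal Java sketch of the two parallel arrays might look like the following. It assumes char keys with character-code difference as the distance; SortedArrayNearest, nearestIndex, and the sample values are invented for illustration.

```java
import java.util.Arrays;

final class SortedArrayNearest {
    // Returns the index of the key nearest to `target` in the sorted array `keys`,
    // or -1 if the array is empty. On a tie, the smaller key wins.
    static int nearestIndex(char[] keys, char target) {
        if (keys.length == 0) return -1;
        int pos = Arrays.binarySearch(keys, target);
        if (pos >= 0) return pos;                       // exact match
        int insertion = -pos - 1;                       // index of the first key > target
        if (insertion == 0) return 0;                   // target is below every key
        if (insertion == keys.length) return keys.length - 1; // target is above every key
        int distBelow = target - keys[insertion - 1];
        int distAbove = keys[insertion] - target;
        return distAbove < distBelow ? insertion : insertion - 1;
    }

    public static void main(String[] args) {
        char[] keys = {'A', 'C', 'M', 'Z'};                      // must be sorted
        String[] values = {"alpha", "charlie", "mike", "zulu"};  // values[i] belongs to keys[i]
        int i = nearestIndex(keys, 'D');
        System.out.println(keys[i] + " => " + values[i]);       // prints "C => charlie"
    }
}
```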

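Of course, there are many alternative approaches. Which one to use depends on many other factors, such as memory consumption, whether you need to insert values, whether you control the insertion order, deletions, threading issues, and so on.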

Answer 3

BK-trees are exactly what you want. Here is a good article on implementing them.

Here is a Scala implementation:

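// A BK-tree node. computeDistance must be a metric (it has to obey the triangle inequality).
class BKTree[T](computeDistance: (T, T) => Int, node: T) {
  // Children, keyed by their distance to this node.
  val subnodes = scala.collection.mutable.HashMap.empty[Int,BKTree[T]]

  // Returns every element within `distance` of `what`.
  def query(what: T, distance: Int): List[T] = {
    val currentDistance = computeDistance(node, what)
    val minDistance = currentDistance - distance
    val maxDistance = currentDistance + distance
    // Only subtrees whose edge distance lies in [minDistance, maxDistance] can hold a match.
    val eligibleNodes = (
      subnodes.keys.toList
      filter (key => minDistance to maxDistance contains key)
      map subnodes
    )
    val partialResult = eligibleNodes flatMap (_.query(what, distance))
    if (currentDistance <= distance) node :: partialResult else partialResult
  }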

  def insert(what: T): Boolean = if (node == what) false else (
    subnodes.get(computeDistance(node, what)) 
    map (_.insert(what)) 
    getOrElse {
      subnodes(computeDistance(node, what)) = new BKTree(computeDistance, what)
      true
    }
  )

  override def toString = node.toString+"("+subnodes.toString+")"
}

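object Test {
  def main(args: Array[String]): Unit = {
    // The distance function is charDistance, defined below.
    val root = new BKTree[Char](charDistance, 'A')
    root.insert('C')
    root.insert('M')
    root.insert('Z')
    println(findClosest(root, 'D'))
  }
  // Distance between two characters: the absolute difference of their codes.
  def charDistance(a: Char, b: Char): Int = (a - b).abs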
  def findClosest[T](root: BKTree[T], what: T): List[T] = {
    var distance = 0
    var closest = root.query(what, distance)
    while(closest.isEmpty) {
      distance += 1
      closest = root.query(what, distance)
    }
    closest
  }
}

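I'll admit there is some dirtiness and ugliness in it, and that the insert algorithm is too clever. Also, it only works well for small distances; otherwise you end up searching the tree over and over. Here is an alternative implementation that does a better job: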

class BKTree[T](computeDistance: (T, T) => Int, node: T) {
  val subnodes = scala.collection.mutable.HashMap.empty[Int,BKTree[T]]

  def query(what: T, distance: Int): List[T] = {
    val currentDistance = computeDistance(node, what)
    val minDistance = currentDistance - distance
    val maxDistance = currentDistance + distance
    val eligibleNodes = (
      subnodes.keys.toList
      filter (key => minDistance to maxDistance contains key)
      map subnodes
    )
    val partialResult = eligibleNodes flatMap (_.query(what, distance))
    if (currentDistance <= distance) node :: partialResult else partialResult
  }

  // Returns the best distance found so far together with every element at that distance.
  private def find(what: T, bestDistance: Int): (Int,List[T]) = {
    val currentDistance = computeDistance(node, what)
    val presentSolution = if (currentDistance <= bestDistance) List(node) else Nil
    val best = currentDistance min bestDistance
    subnodes.keys.foldLeft((best, presentSolution))(
      (acc, key) => {
        val (currentBest, currentSolution) = acc
        // Skip subtrees whose edge distance is too large to hold anything as close as currentBest.
        val (possibleBest, possibleSolution) =
          if (key <= currentDistance + currentBest)
            subnodes(key).find(what, currentBest)
          else
            (0, Nil)
        (possibleBest, possibleSolution) match {
          case (_, Nil) => acc                                                   // nothing found in this subtree
          case (better, solution) if better < currentBest => (better, solution)  // strictly closer: replace
          case (_, solution) => (currentBest, currentSolution ::: solution)      // equally close: keep both
        }
      }
    )
  }

  def findClosest(what: T): List[T] = find(what, computeDistance(node, what))._2

  def insert(what: T): Boolean = if (node == what) false else (
    subnodes.get(computeDistance(node, what)) 
    map (_.insert(what)) 
    getOrElse {
      subnodes(computeDistance(node, what)) = new BKTree(computeDistance, what)
      true
    }
  )

  override def toString = node.toString+"("+subnodes.toString+")"
}

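object Test {
  def main(args: Array[String]): Unit = {
    // The distance function is charDistance, defined below.
    val root = new BKTree[Char](charDistance, 'A')
    root.insert('C')
    root.insert('E')
    root.insert('M')
    root.insert('Z')
    println(root.findClosest('D'))
  }
  // Distance between two characters: the absolute difference of their codes.
  def charDistance(a: Char, b: Char): Int = (a - b).abs
}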

Answer 4

With C++ and the STL containers (std::map), you can use the following template function:

#include <iostream>
#include <map>

//! Returns an iterator to the entry whose key is nearest to item_to_find,
//! using "operator -" of type T as the distance metric.
//! If two keys are equidistant from item_to_find, the one that sorts earlier is returned.
//! Returns map_for_search.end() if the map is empty.
//! The map is taken by reference so that the returned iterator stays valid.

template <class T,class U> typename std::map<T,U>::iterator find_nearest(std::map<T,U>& map_for_search,const T& item_to_find)
{
  typename std::map<T,U>::iterator itlow,itprev;
  itlow=map_for_search.lower_bound(item_to_find);
//item_to_find sorts at or before the first key (this also covers an empty map)
  if (itlow==map_for_search.begin())
    return itlow;
  itprev=itlow;
  --itprev;
//item_to_find sorts after the last key
  if (itlow==map_for_search.end())
    return itprev;
//item_to_find is itself a key in the map
  if (itlow->first==item_to_find)
    return itlow;

  return (itlow->first-item_to_find < item_to_find-itprev->first)?itlow:itprev; // on a tie, the lower key wins
//note that "operator -" is used here as the distance metric
}

int main ()
{
  std::map<char,int> mymap;
  std::map<char,int>::iterator nearest;
  //fill map with some information
  mymap['B']=20;
  mymap['C']=40;
  mymap['M']=60;
  mymap['Z']=80;
  char ch='D'; //C should be returned
  nearest=find_nearest<char,int>(mymap,ch);
  std::cout << nearest->first << " => " << nearest->second << '\n';
  ch='Z'; //Z should be returned
  nearest=find_nearest<char,int>(mymap,ch);
  std::cout << nearest->first << " => " << nearest->second << '\n';
  ch='A'; //B should be returned
  nearest=find_nearest<char,int>(mymap,ch);
  std::cout << nearest->first << " => " << nearest->second << '\n';
  ch='H'; // equidistant to C and M -> C is returned
  nearest=find_nearest<char,int>(mymap,ch);
  std::cout << nearest->first << " => " << nearest->second << '\n';
  return 0;
}

Output:

C => 40
Z => 80
B => 20
C => 40

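This assumes that operator - is used as the function that evaluates distance. If class T is your own class whose objects serve as the keys in the map, you should implement that operator. You can also change the code to use a dedicated static member function of class T (say, distance) instead of operator -. In that case, replace the final return statement with: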

return (T::distance(itlow->first,item_to_find) < T::distance(item_to_find,itprev->first))?itlow:itprev;

where distance should be something like

static distance_type distance(const some_type& first, const some_type& second) { /* ... */ } // declared inside some_type

and distance_type should support comparison with operator <.

Answer 5

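You could implement something like this as a tree. A simple approach is to assign every node in the tree a bit string. Each level of the tree is stored as one bit, so all the parent information is encoded in the node's bit string. You can then easily locate arbitrary nodes, and find their parents and children. This is how Morton ordering works, for example. It has the additional advantage that you can compute the distance between nodes by simple binary subtraction.

If there are multiple links between data values, then your data structure is a graph rather than a tree. In that case you need a slightly more sophisticated indexing system. Distributed hash tables do this sort of thing. They typically have a way of computing the distance between any two nodes in the index space. For example, the Kademlia algorithm (used by BitTorrent) uses XOR distance applied to bit-string IDs. This lets BitTorrent clients look up IDs in a chain, converging on the unknown target location. You could use a similar approach to find the node or nodes closest to your target node.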
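To show what XOR distance means in practice, here is a small Java sketch. It assumes 64-bit IDs and simply scans a list for the ID closest to a target; a real DHT would walk its routing table rather than scan every ID. XorNearest, xorDistance, and closest are names invented for this example.

```java
final class XorNearest {
    // XOR distance between two IDs, to be read as an unsigned 64-bit number.
    static long xorDistance(long a, long b) {
        return a ^ b;
    }

    // Returns the ID in `ids` with the smallest XOR distance to `target`.
    static long closest(long[] ids, long target) {
        if (ids.length == 0) throw new IllegalArgumentException("ids must not be empty");
        long best = ids[0];
        for (long id : ids) {
            // compareUnsigned keeps IDs with the top bit set from looking "negative".
            if (Long.compareUnsigned(xorDistance(id, target), xorDistance(best, target)) < 0) {
                best = id;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        long[] ids = {0b0001L, 0b0100L, 0b0111L, 0b1100L};
        // Distances to 0b0101 are 0b0100, 0b0001, 0b0010 and 0b1001 respectively.
        System.out.println(Long.toBinaryString(closest(ids, 0b0101L))); // prints "100"
    }
}
```

Note that XOR distance is not the same as numeric difference: two IDs are close when they share a long common bit prefix.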

Answer 6

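If your keys are strings and your similarity function is Levenshtein distance, then you can use finite-state machines:

Your map is a trie, built as a finite-state machine (by taking the union of all key/value pairs and determinizing). Then compose your input query with a simple finite-state transducer that encodes Levenshtein distance, and compose the result with your trie. Finally, use the Viterbi algorithm to extract the shortest path.

You can implement all of this with only a few function calls using a finite-state toolkit.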
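For comparison, here is what the lookup is supposed to return, written as a brute-force Java sketch. This is not the finite-state approach described above: it just computes the Levenshtein distance to every key with the usual dynamic-programming table and keeps the smallest. It can serve as a baseline for small maps or as a reference when testing a finite-state version. EditDistanceNearest, levenshtein, and nearestKey are names invented for this example.

```java
import java.util.List;

final class EditDistanceNearest {
    // Levenshtein distance computed with two rolling rows of the DP table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the key with the smallest distance to `query`, or null if there are no keys.
    // On a tie, the key seen first wins.
    static String nearestKey(Iterable<String> keys, String query) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String key : keys) {
            int d = levenshtein(key, query);
            if (d < bestDistance) {
                bestDistance = d;
                best = key;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(nearestKey(List.of("apple", "mango", "zebra"), "appel")); // prints "apple"
    }
}
```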

Answer 7

In Scala, here is a technique for finding the closest Int key <= the key you are looking for:

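import scala.collection.immutable.SortedMap

val sMap = SortedMap(1 -> "A", 2 -> "B", 3 -> "C")
// On Scala 2.13 and later, rangeTo is the non-deprecated spelling of to.
sMap.to(4).lastOption.get // Returns (3, "C"), the entry with the greatest key <= 4
sMap.to(-1) // Returns an empty map, so lastOption.get on it would throw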

Is there a nearest-key map data structure?
