Question

I am trying to find which category C a double x belongs to. My categories are defined as string names and double values in a file like this

A 1.0
B 2.5
C 7.0

and should be interpreted like this

"A": 0 < x <= 1.0
"B": 1.0 < x <= 2.5
"C": 2.5 < x <= 7.0

(The input can have arbitrary length and may have to be sorted by value.) I just need a function like this

std::string findCategory(categories_t categories, double x) {
    ...insert magic here
}

so for this example I would expect

findCategory(categories, 0.5) == "A"
findCategory(categories, 1.9) == "B"
findCategory(categories, 6.0) == "C"

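So my question is a) how to write the function and b) what is the best choice for categories_t (using the STL in pre-C++11). I have made several attempts, all of which were... less than successful.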

Answer 1

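One option is to use a std::map with doubles as keys, where each value is the name assigned to the range whose upper endpoint is that key. For example, given your file, you would have a map like this: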

std::map<double, std::string> lookup;
lookup[1.0] = "A";
lookup[2.5] = "B";
lookup[7.0] = "C";

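Then, you can use the std::map::lower_bound function to get, for a given point, the key/value pair whose key (upper endpoint) is the first key in the map that is at least as large as the point in question. For example, with the map above, lookup.lower_bound(1.37) returns an iterator whose value is "B", and lookup.lower_bound(2.56) returns an iterator whose value is "C". These lookups are fast; they take O(log n) time for a map with n elements.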

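In the above, I'm assuming that the values you're looking up are all nonnegative. If negative values are allowed, you can add a quick test to check whether the value is negative before doing any lookup. That way, you eliminate spurious results.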
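Putting the map lookup and the range check together, a minimal pre-C++11 sketch might look like the following. The loadCategories helper, the choice to return an empty string for out-of-range values, and the rejection of x <= 0 (taken from the question's "0 < x" bound) are assumptions made for illustration, not part of the original answer.

```cpp
#include <istream>
#include <map>
#include <sstream>  // std::istringstream is handy for trying this without a file
#include <string>

// Maps the upper endpoint of each range to the category name.
typedef std::map<double, std::string> categories_t;

// Hypothetical helper: reads "name upper_bound" pairs such as "A 1.0".
// The map keeps its keys sorted, so the input order does not matter.
categories_t loadCategories(std::istream& in) {
    categories_t categories;
    std::string name;
    double upper;
    while (in >> name >> upper) {
        categories[upper] = name;
    }
    return categories;
}

// Returns the category whose range contains x, or "" if there is none.
std::string findCategory(const categories_t& categories, double x) {
    if (x <= 0.0) {
        return "";  // assumption: the first range starts just above 0
    }
    // First entry whose key (upper endpoint) is >= x.
    categories_t::const_iterator it = categories.lower_bound(x);
    if (it == categories.end()) {
        return "";  // assumption: x lies above the largest upper endpoint
    }
    return it->second;
}
```

Passing the map by const reference avoids copying it on every call, which the by-value signature in the question would do.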

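For what it's worth, if you happen to know the distribution of the lookups (say, they're uniformly distributed), it's possible to build a special data structure called an optimal binary search tree that will give better access times than std::map. Also, depending on your application, there may be even faster options. For example, if you're doing this because you want to randomly choose one of the outcomes with differing probabilities, then I would suggest looking at this article on the alias method, which lets you generate random values in O(1) time.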
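For the random-selection case only, here is a rough sketch of one common formulation of the alias method (often attributed to Vose). It is not taken from the linked article; the names AliasTable, buildAliasTable, and sampleAlias are hypothetical, and the caller is assumed to supply two uniform random numbers in [0, 1) from whatever generator is in use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct AliasTable {
    std::vector<double> prob;        // chance of keeping column i's own outcome
    std::vector<std::size_t> alias;  // outcome used otherwise
};

// Builds the table in O(n). Assumes weights is non-empty, every weight is
// non-negative, and at least one weight is positive.
AliasTable buildAliasTable(const std::vector<double>& weights) {
    assert(!weights.empty());
    const std::size_t n = weights.size();
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) total += weights[i];
    assert(total > 0.0);

    // Scale so that the average weight is exactly 1.
    std::vector<double> scaled(n);
    std::vector<std::size_t> small, large;
    for (std::size_t i = 0; i < n; ++i) {
        scaled[i] = weights[i] * n / total;
        (scaled[i] < 1.0 ? small : large).push_back(i);
    }

    AliasTable table;
    table.prob.assign(n, 1.0);
    table.alias.resize(n);
    for (std::size_t i = 0; i < n; ++i) table.alias[i] = i;

    // Pair each under-full column with an over-full one.
    while (!small.empty() && !large.empty()) {
        const std::size_t s = small.back(); small.pop_back();
        const std::size_t l = large.back(); large.pop_back();
        table.prob[s] = scaled[s];
        table.alias[s] = l;
        scaled[l] = (scaled[l] + scaled[s]) - 1.0;
        (scaled[l] < 1.0 ? small : large).push_back(l);
    }
    // Columns left over keep probability 1.0 and alias themselves.
    return table;
}

// Draws one outcome in O(1) from two uniform numbers u1, u2 in [0, 1).
std::size_t sampleAlias(const AliasTable& table, double u1, double u2) {
    std::size_t column = static_cast<std::size_t>(u1 * table.prob.size());
    if (column >= table.prob.size()) column = table.prob.size() - 1;  // guard against u1 == 1
    return u2 < table.prob[column] ? column : table.alias[column];
}
```

The returned index would then be mapped to a category name, for example through a parallel vector of strings.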

Hope this helps!

Answer 2

You can use the pair type and the 'lower_bound' algorithm from <algorithm>: http://www.cplusplus.com/reference/algorithm/lower_bound/

Let's define your categories in terms of the upper edge: typedef pair<double,string> category_t;

Then just make a vector of those edges and search it using binary search. See the full example below.

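#include <string>
#include <vector>
#include <algorithm>
#include <iostream>
#include <utility>   // std::pair

using namespace std;
typedef pair<double,string> category_t;   // (upper edge, bin name)

// Returns the name of the first bin whose upper edge is >= x,
// or "" if x lies above every edge. The vector must be sorted by upper edge.
std::string findCategory(const vector<category_t> &categories, double x) {
   // The probe (x,"") never compares greater than a bin with edge x,
   // because the empty string is the smallest possible name.
   vector<category_t>::const_iterator it=std::lower_bound(categories.begin(), categories.end(),category_t(x,""));
   if(it==categories.end()){
      return "";
   }
   return it->second;
}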

int main (){

   vector< category_t > edges;
   edges.push_back(category_t(0,"bin n with upper edge at 0 (underflow)"));
   edges.push_back(category_t(1,"bin A with upper edge at 1"));
   edges.push_back(category_t(2.5,"bin B with upper edge at 2.5"));
   edges.push_back(category_t(7,"bin C with upper edge at 7"));
   edges.push_back(category_t(8,"bin D with upper edge at 8"));
   edges.push_back(category_t(9,"bin E with upper edge at 9"));
   edges.push_back(category_t(10,"bin F with upper edge at 10"));

   vector< double > examples ;
   examples.push_back(1);
   examples.push_back(3.3);
   examples.push_back(7.4);
   examples.push_back(-5);
   examples.push_back(15);

   for( vector< double >::const_iterator eit =examples.begin();eit!=examples.end();++eit)
      cout << "value "<< *eit << " : " << findCategory(edges,*eit) << endl;   
}

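The comparison works the way we want because the double comes first in the pair, and pairs are compared by their first component first and by their second component only on a tie. Otherwise, we would define a comparison predicate as described on the page linked above.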
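To illustrate the predicate case, here is a small sketch that assumes the name is stored first, so the default ordering would no longer compare edges. The names Category, ByUpperEdge, sortByUpperEdge, and findCategoryByEdge are made up for this example. The same predicate also handles the sorting step that the question mentions.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record with the name first, so operator< on a pair
// would sort by name rather than by edge.
struct Category {
    std::string name;
    double upper;
    Category(const std::string& n, double u) : name(n), upper(u) {}
};

// Comparison predicate: orders categories by upper edge only.
struct ByUpperEdge {
    bool operator()(const Category& a, const Category& b) const {
        return a.upper < b.upper;
    }
};

// Sorts the categories so that binary search is valid.
void sortByUpperEdge(std::vector<Category>& categories) {
    std::sort(categories.begin(), categories.end(), ByUpperEdge());
}

// Returns the first category whose upper edge is >= x, or "" if none.
// Requires the vector to be sorted with ByUpperEdge. There is no lower
// bound check here: any x at or below the first edge gets the first category.
std::string findCategoryByEdge(const std::vector<Category>& categories, double x) {
    std::vector<Category>::const_iterator it =
        std::lower_bound(categories.begin(), categories.end(),
                         Category("", x), ByUpperEdge());
    if (it == categories.end()) {
        return "";
    }
    return it->name;
}
```

Using one predicate over two Category objects, with a throwaway probe object for x, avoids relying on mixed-type comparators, which some older standard library implementations did not accept.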

Finding which bin a value falls into
