Question

Suppose we have a data structure that is a key-value map in which each key is itself a key-value map. For example:

map<map<string, string>, string>

Now suppose we want to query for all top-level key/value pairs whose key contains some given subset of key-value pairs. Example:

map = { { "k1" : "v1", "k2" : "v2" } : "value1",
  { "k1" : "v3", "k2" : "v4" } : "value2",
  { "k1" : "v1", "k2" : "v5" } : "value3"
}

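Our query is "give me all key-values whose key contains { "k1" : "v1" }", which would return the first and third values. Similarly, the query { "k1" : "v3", "k2" : "v4" } would return all key-values whose keys contain both k1=v3 and k2=v4, yielding the second value. Obviously we could scan the entire map on every query, but I'm looking for something more efficient than that.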

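I've looked around but couldn't find an efficient, easy-to-use solution for C++. Boost multi_index doesn't seem to offer this kind of flexibility when querying on subsets of key-value pairs.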

Some databases can create indexes that answer exactly this kind of query. For example, Postgres has GIN indexes (Generalized Inverted Indexes), which let you ask:

SELECT * FROM table WHERE some_json_column @> '{"k1":"v1","k2":"v2"}'
-- returns all rows that have both k1=v1 and k2=v2

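However, I'm looking for a C++ solution that doesn't use a database. Is there a library or data structure that can do something similar? If not, any pointers for a custom implementation?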

Answer 1

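You can use std::includes to check whether a key map contains all the key-value pairs of a query map. I'm not sure how to avoid checking every key map, though. Maybe other answers have a better idea.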

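#include <algorithm> // std::includes
#include <vector>

// Return iterators to every entry of the outer map whose key map contains
// all key-value pairs of the sorted query range [q_fst, q_lst).
template <typename MapOfMapsIt, typename QueryMapIt>
std::vector<MapOfMapsIt> query_keymap_contains(
    MapOfMapsIt mom_fst,
    MapOfMapsIt mom_lst,
    QueryMapIt q_fst,
    QueryMapIt q_lst)
{
    std::vector<MapOfMapsIt> out;
    for(; mom_fst != mom_lst; ++mom_fst)
    {
        const auto& key_map = mom_fst->first; // reference: avoids copying every key map
        if(std::includes(key_map.begin(), key_map.end(), q_fst, q_lst))
            out.push_back(mom_fst);
    }
    return out;
}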

Usage:

typedef std::map<std::string, std::string> StrMap;
typedef std::map<StrMap, std::string> MapKeyMaps;
MapKeyMaps m = {{{{"k1", "v1"}, {"k2", "v2"}}, "value1"},
                {{{"k1", "v3"}, {"k2", "v4"}}, "value2"},
                {{{"k1", "v1"}, {"k2", "v5"}}, "value3"}};
StrMap q1 = {{"k1", "v1"}};
StrMap q2 = {{"k1", "v3"}, {"k2", "v4"}};
auto res1 = query_keymap_contains(m.begin(), m.end(), q1.begin(), q1.end());
auto res2 = query_keymap_contains(m.begin(), m.end(), q2.begin(), q2.end());
std::cout << "Query1:    ";
for(auto i : res1) std::cout << i->second << " ";
std::cout << "\nQuery2:    ";
for(auto i : res2) std::cout << i->second << " ";

Output:

Query1:    value1 value3 
Query2:    value2


Answer 2

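I would stick with the database index analogy. In that analogy, an index lookup does not use a generic k = v style search; it only uses a tuple holding the values of the elements (usually columns) that make up the index. The database then falls back to a scan to check any other k = v parameters that are not part of the index.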

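Following the analogy, you would have a fixed number of keys, which can be represented as an array or as a (fixed-size) string. The good news is that it is then trivial to define a global order on the keys, and thanks to the std::map::upper_bound method, it is also simple to find an iterator immediately after a partial key.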

So getting a full key is immediate: just extract it with find, at, or operator[]. Getting all elements for a partial key is still simple:

Find an iterator just above the partial key with upper_bound
Iterate forward while the elements match the partial key

But this requires changing the initial type to std::map<std::array<string, N>, string>.

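You could build an API on top of this container that takes a std::map<string, string> as input, extracts the actual full or partial key from it, and then iterates as described above, keeping only the elements that also match the k,v pairs not covered by the index.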
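A minimal sketch of the composite-key idea, assuming exactly two fixed inner keys (k1 and k2) whose values are stored in that order in a std::array. The names Key, Table, find_full and find_prefix are hypothetical. Instead of upper_bound, this sketch pads the unspecified components with "" (the smallest std::string) and calls lower_bound, which lands on the first entry that starts with the given prefix; the matching entries then form one contiguous run.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical layout: the value of k1 goes in slot 0, the value of k2 in slot 1.
using Key = std::array<std::string, 2>;
using Table = std::map<Key, std::string>;

// Full key: a single O(log n) lookup. Returns nullptr when the key is absent.
const std::string* find_full(const Table& t, const Key& key) {
    auto it = t.find(key);
    return it == t.end() ? nullptr : &it->second;
}

// Partial key: only the first prefix_len components of `key` are significant.
std::vector<std::string> find_prefix(const Table& t, const Key& key, std::size_t prefix_len) {
    assert(prefix_len <= key.size());
    Key start = key;
    for (std::size_t i = prefix_len; i < start.size(); ++i)
        start[i].clear();  // "" sorts before every other string
    std::vector<std::string> out;
    for (auto it = t.lower_bound(start); it != t.end(); ++it) {
        bool matches = true;
        for (std::size_t i = 0; i < prefix_len; ++i)
            if (it->first[i] != key[i]) { matches = false; break; }
        if (!matches) break;  // left the contiguous block for this prefix
        out.push_back(it->second);
    }
    return out;
}
```

With the question's data, find_prefix(t, Key{"v1", ""}, 1) walks only the k1=v1 block. A query on k2 alone is not a prefix, so, as in a database, it would still need a full scan.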

Answer 3

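You can do this with a single (partial) pass over each element, using the ordered query and returning as early as possible. Taking inspiration from std::set_difference, we want to know whether query is a subset of data, which lets us select the entries of the outer map.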

#include <algorithm> // std::copy_if

// Is the sorted range [first1, last1) a subset of the sorted range [first2, last2)?
template<class InputIt1, class InputIt2>
bool is_subset(InputIt1 first1, InputIt1 last1, InputIt2 first2, InputIt2 last2)
{
    while (first1 != last1) {
        if (first2 == last2) return false; // reached the end of data with query elements still remaining

        if (*first1 < *first2) {
            return false; // didn't find this query element
        } else {
            if (! (*first2 < *first1)) {
                ++first1; // found this query element
            }
            ++first2;
        }
    }
    return true; // reached the end of query
}

// Find every element of the "map of maps" [first2, last2) whose key contains the sorted range [first1, last1) as a subset
template<class InputIt1, class InputIt2, class OutputIt>
OutputIt query_data(InputIt1 first1, InputIt1 last1, InputIt2 first2, InputIt2 last2, OutputIt d_first)
{
    auto item_matches = [=](auto & inner){ return is_subset(first1, last1, inner.first.begin(), inner.first.end()); };
    return std::copy_if(first2, last2, d_first, item_matches);
}
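For sorted ranges such as the contents of a std::map, is_subset(first1, last1, first2, last2) answers the same question as the standard std::includes(first2, last2, first1, last1) (note the swapped argument order). A minimal sketch, assuming the question's std::map<std::map<std::string, std::string>, std::string> layout, that performs the same selection with std::includes and collects the matches into a new map via std::copy_if and std::inserter; query_with_includes is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <string>

using Inner = std::map<std::string, std::string>;
using M = std::map<Inner, std::string>;

// Copy every outer entry whose key map contains all pairs of q into a new map.
M query_with_includes(const M& m, const Inner& q) {
    M out;
    std::copy_if(m.begin(), m.end(), std::inserter(out, out.end()),
                 [&q](const M::value_type& entry) {
                     // Is the sorted range q contained in the sorted key map?
                     return std::includes(entry.first.begin(), entry.first.end(),
                                          q.begin(), q.end());
                 });
    return out;
}
```

Because the result is itself a std::map, the matches keep the outer map's key order, and an empty query selects every entry.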

Answer 4

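I believe the efficiency of the different approaches will depend on the actual data. However, I would consider "caching" iterators to the outer map's elements for each particular "kX","vY" pair, like this: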

using M = std::map<std::map<std::string, std::string>, std::string>;
M m = {
   { { { "k1", "v1" }, { "k2", "v2" } }, "value1" },
   { { { "k1", "v3" }, { "k2", "v4" } }, "value2" },
   { { { "k1", "v1" }, { "k2", "v5" } }, "value3" }
};

std::map<M::key_type::value_type, std::vector<M::iterator>> cache;
for (auto it = m.begin(); it != m.end(); ++it)
   for (const auto& kv : it->first)
      cache[kv].push_back(it);

Now you basically need to take all the searched "kX","vY" pairs and find the intersection of their cached iterators:

std::vector<M::key_type::value_type> find_list = { { "k1", "v1" }, { "k2", "v5" } };
std::vector<M::iterator> found;
if (find_list.size() > 0) {
   auto it = find_list.begin();
   std::copy(cache[*it].begin(), cache[*it].end(), std::back_inserter(found));
   while (++it != find_list.end()) {
      const auto& temp = cache[*it];
      found.erase(std::remove_if(found.begin(), found.end(),
            [&temp](const auto& e){ return std::find(temp.begin(), temp.end(), e) == temp.end(); } ),
         found.end());
   }
}

Final output:

for (const auto& it : found)
   std::cout << it->second << std::endl;

In this case, this gives value3.

Live demo: https://wandbox.org/permlink/S9Zp8yofSvjfLokc.

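Note that the intersection step is expensive, because the cached iterators are not sorted. If you use pointers instead, you can sort the vectors, or store the pointers in a map, which makes finding the intersection faster, e.g. with std::set_intersection.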
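A minimal sketch of the sorted-pointer variant, which is essentially an inverted index like the GIN example in the question. It assumes the outer map is not modified while the index is in use (pointers to std::map elements stay valid until that element is erased). Index, build_index and query are hypothetical names. Each posting list is sorted with std::less, which, unlike the built-in <, is guaranteed to give a strict total order even on unrelated pointers, so the lists can be combined with std::set_intersection.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Inner = std::map<std::string, std::string>;
using M = std::map<Inner, std::string>;
using Entry = M::value_type;  // std::pair<const Inner, std::string>

// Hypothetical index: each (k, v) pair maps to a sorted "posting list" of
// pointers to the outer-map entries whose key map contains that pair.
using Index = std::map<std::pair<std::string, std::string>, std::vector<const Entry*>>;

Index build_index(const M& m) {
    Index index;
    for (const Entry& e : m)
        for (const auto& kv : e.first)
            index[{kv.first, kv.second}].push_back(&e);
    for (auto& posting : index)
        std::sort(posting.second.begin(), posting.second.end(), std::less<const Entry*>{});
    return index;
}

// Values of all entries whose key map contains every pair in q.
// This sketch treats an empty query as matching nothing.
std::vector<std::string> query(const Index& index, const Inner& q) {
    std::vector<const Entry*> found;
    bool first = true;
    for (const auto& kv : q) {
        auto it = index.find({kv.first, kv.second});
        if (it == index.end()) return {};  // this pair occurs in no key map
        if (first) {
            found = it->second;
            first = false;
            continue;
        }
        std::vector<const Entry*> narrowed;
        std::set_intersection(found.begin(), found.end(),
                              it->second.begin(), it->second.end(),
                              std::back_inserter(narrowed), std::less<const Entry*>{});
        found.swap(narrowed);
        if (found.empty()) break;  // nothing can match any more
    }
    std::vector<std::string> values;
    for (const Entry* e : found) values.push_back(e->second);
    return values;
}
```

Each intersection is linear in the lengths of the two lists, instead of the quadratic std::find loop. Results come back in pointer order, not in the outer map's key order.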

Answer 5

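std::map is implemented as a balanced binary tree with O(log n) lookup. What you need instead is std::unordered_map, which is implemented as a hash table with O(1) average lookup.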

Now let me rephrase what you said. You want:

Our query is "give me all key-values whose key contains { "k1" : "v1" }", which returns the first and third values.

That translates to:

If the given key-value pair is in the inner map, give me its value. Essentially, what you need is a double lookup, which is what std::unordered_map is good at.

Here is a code snippet that solves your problem using only the standard library (no special code needed):

#include <iostream>
#include <unordered_map>
#include <string>
#include <utility>

int main() {
  using elemType = std::pair<std::string, std::string>;
  using innerMap = std::unordered_map<std::string, std::string>;
  using myMap = std::unordered_map<std::string, innerMap>;

  auto table = myMap{ { "value1", { {"k1", "v1"}, {"k2", "v2"} } },
                      { "value2", { {"k1", "v3"}, {"k2", "v4"} } },
                      { "value3", { {"k1", "v1"}, {"k2", "v5"} } } };

  // First we set up a lambda
  auto printIfKeyValueFound = [](const myMap& tab, const elemType& query) {
    // O(n) over the outer table and O(1) average lookup for each entry: O(n) total
    for (const auto& el : tab) {
      auto it = el.second.find(query.first);
      if (it != el.second.end() && it->second == query.second) {
        std::cout << "Element found: " << el.first << "\n";
      }
    }
  };

  auto query = elemType{"k1", "v1"};

  printIfKeyValueFound(table, query);
}

Output: value3, value1 (std::unordered_map does not guarantee an iteration order, so the order may differ)

For a query of arbitrary size, you can write:

// First we set up a lambda
auto printIfKeyValueFound = [](const myMap& tab, const std::vector<elemType>& query) {
  // O(n) over the outer table times q lookups (O(1) average each): O(n*q) total
  for (const auto& el : tab) {
    bool found = true;
    for (const auto& queryEl : query) {
      auto it = el.second.find(queryEl.first);
      // A missing key is a mismatch too, not only a different value
      if (it == el.second.end() || it->second != queryEl.second) {
        found = false;
        break;
      }
    }
    if (found)
      std::cout << el.first << "\n";
  }
};


auto query = std::vector<elemType>{ {"k1", "v1"}, {"k2", "v2"} };
printIfKeyValueFound(table, query);

Output: value1

Partial lookup in a key-value map where the keys are themselves key-value maps
