Question

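I was given a project to research alternatives to the container we currently use for storing data, in order to make it more efficient.

The current design involves 4 nested maps: map< string, map< string, map< int, map< string, string> > > >

Let's name the data fields Company, Department, ID_of_employee, and Name.

Currently, retrieving an employee's Name given Company, Dept, and ID is O(log N); more precisely, it involves three lookups.

Space complexity is not a concern right now.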

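My initial options were:

Use nested pairs to represent Company, Dept, and ID, then use this nested pair as the key of a map. This does not seem easy to read.
Use a tuple or a struct instead of nested pairs, though I personally did not find that much different in terms of readability. After creating a new struct EmployeeKey with Company, Dept, and ID fields, I can use it as the key of a map. (I think I would have to write a custom comparison operator for it.)
Use a concatenated key built from Company+Dept+ID, converting the int to a string and joining the parts, then feed this key to a map<ConcatenatedKey, Data>.
Use Boost.MultiIndex. Even though this seems to be the best option, I dropped it because I found it a bit complicated.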
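
For illustration, the struct-key option could be sketched roughly as follows. The field names and the helper find_name are assumptions made for this example, and std::tie is just one way to get the strict ordering that std::map needs.

```cpp
#include <map>
#include <string>
#include <tuple>

// Hypothetical key type; the field names are illustrative only.
struct EmployeeKey {
    std::string company;
    std::string dept;
    int id;
};

// Lexicographic ordering over (company, dept, id), as required by std::map.
inline bool operator<(const EmployeeKey& a, const EmployeeKey& b) {
    return std::tie(a.company, a.dept, a.id) < std::tie(b.company, b.dept, b.id);
}

using EmployeeMap = std::map<EmployeeKey, std::string>;

// Returns the employee name, or an empty string when the key is absent.
inline std::string find_name(const EmployeeMap& m, const EmployeeKey& key) {
    auto it = m.find(key);
    return it != m.end() ? it->second : std::string();
}
```

With this layout a single find replaces the chain of nested lookups, and the key stays readable at the call site.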

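To give some necessary background: this container is mostly used to retrieve the final nested data, which is why I settled on the concatenated-key approach. My question is basically: are there any caveats to using such a concatenated string? Is it bad design, or something we should avoid?

As I understand it, this will improve lookup time, which stays logarithmic but takes one lookup instead of four, so it seems like an improvement.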
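
One caveat that may be worth checking with a concatenated key is ambiguity: without a separator, different (Company, Dept, ID) triples can produce the same string. The sketch below contrasts a naive join with one that uses a separator. The names make_key_naive and make_key and the choice of '\x1f' as separator are assumptions for illustration, and the separator only helps if it can never appear in a company or department name.

```cpp
#include <string>

// Naive concatenation: the boundaries between fields are lost.
inline std::string make_key_naive(const std::string& company,
                                  const std::string& dept, int id) {
    return company + dept + std::to_string(id);
}

// Joins the fields with a separator (ASCII unit separator, assumed to be
// absent from all company and department names).
inline std::string make_key(const std::string& company,
                            const std::string& dept, int id) {
    const char sep = '\x1f';
    return company + sep + dept + sep + std::to_string(id);
}
```

Building the string on every lookup also costs an allocation, and the map then orders entries by string comparison, so an ID of 10 sorts before an ID of 9.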

Answer 1

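Since std::map<> is a red-black tree, which is still a binary tree, lookups are not that fast compared to a hash map, especially when the number of entries is large.

Assuming the hashes are well distributed, using std::unordered_map<> (a hash map) will give better performance. I recommend fnv or MurmurHash3, since they have very good value distribution.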
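
As a rough sketch of what an FNV-style hash looks like, here is the 64-bit FNV-1a variant applied to the bytes of a string. The functor Fnv1aHash and the alias NameTable are names assumed for this example; they only show one way to plug such a hash into std::unordered_map.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// 64-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime.
inline std::uint64_t fnv1a_64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Hypothetical functor that lets std::unordered_map use FNV-1a for strings.
struct Fnv1aHash {
    std::size_t operator()(const std::string& s) const {
        return static_cast<std::size_t>(fnv1a_64(s));
    }
};

using NameTable = std::unordered_map<std::string, std::string, Fnv1aHash>;
```

Whether this beats the standard library's own std::hash depends on the implementation and the data, so it is worth measuring before switching.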

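Now, about nested containers: you should never, ever do such a thing! Overall performance can be quite bad, and memory usage will certainly be very large, since it is essentially a four-dimensional RB tree:

Let's put this into context: you have 20 companies, each company has 5 departments, each department has 12 EmployeeIDs, and each EmployeeID maps to a map of <Name, some_string> (that last bit seems a little redundant, don't you think?).

Every Company leaf node is a std::map => 20 std::map instances
Every Department leaf node is a std::map => 20 + 20 * 5 = 120 std::map instances
Every EmployeeID leaf node is a std::map => 120 + 20 * 5 * 12 = 1320 std::map instances
Every Name leaf node is a std::map => 1320 + 20 * 5 * 12 * 1 = 2520 std::map instances

So you see, nested containers are very dangerous, because even with a small data set you end up with a huge number of container instances. Performance suffers badly, especially when the objects are destroyed or new elements are inserted.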
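
As a rough check of how many containers such a structure creates, the sketch below builds the 20 x 5 x 12 example with the question's nested type and counts the std::map objects that actually exist. The type aliases and the helpers build_sample and count_map_instances are names assumed for this example. Because the innermost values are plain strings rather than maps, this count comes out lower than the running total above, but it is still well over a thousand separate trees for only 1200 employees.

```cpp
#include <cstddef>
#include <map>
#include <string>

using Names       = std::map<std::string, std::string>;
using Employees   = std::map<int, Names>;
using Departments = std::map<std::string, Employees>;
using Companies   = std::map<std::string, Departments>;

// Fills the nested structure with one "Name" entry per employee ID.
inline Companies build_sample(int companies, int departments, int employees) {
    Companies data;
    for (int c = 0; c < companies; ++c)
        for (int d = 0; d < departments; ++d)
            for (int e = 0; e < employees; ++e)
                data["company" + std::to_string(c)]
                    ["dept" + std::to_string(d)][e]["Name"] = "employee";
    return data;
}

// Counts every std::map object reachable from the outermost map.
inline std::size_t count_map_instances(const Companies& data) {
    std::size_t count = 1;                    // the outermost map itself
    for (const auto& company : data) {
        ++count;                              // one Departments map per company
        for (const auto& dept : company.second) {
            ++count;                          // one Employees map per department
            count += dept.second.size();      // one Names map per employee ID
        }
    }
    return count;
}
```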

My suggestion: use an EmployeeKey struct combined with std::unordered_map. This gives you good lookup speed and only a single std::unordered_map instance.

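#include <cstddef>
#include <functional>
#include <string>

struct EmployeeKey
{
    int         CompanyID;  // if you want speed, CompanyID is the way to go
    std::string Department;
    int         EmployeeID;
    std::string Name;

    // Two keys are equal only if every field matches.
    bool operator==(const EmployeeKey& key) const {
        return CompanyID == key.CompanyID && Department == key.Department
            && EmployeeID == key.EmployeeID && Name == key.Name;
    }
};

namespace std {  // std::hash must be specialized inside namespace std
template<> struct hash<EmployeeKey> {
    size_t operator()(const EmployeeKey& key) const {
        // One possible hash combine (boost::hash_combine-style mixing).
        size_t seed = hash<int>()(key.CompanyID);
        auto mix = [&seed](size_t h) { seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2); };
        mix(hash<string>()(key.Department));
        mix(hash<int>()(key.EmployeeID));
        mix(hash<string>()(key.Name));
        return seed;
    }
};
}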

This should be enough to get you started. The final layout looks like this:

std::unordered_map<EmployeeKey, std::string> EmployeeData;
// usage:
auto it = EmployeeData.find(selectedEmployee);
if (it != EmployeeData.end())
    it->second = "Good employee";

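If you really must 'speed up' your lookups by any means, keep in mind that if CompanyID is an integer in [0 .. N], you can use a std::vector to get a fast first-level index into the right company: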

std::vector<std::unordered_map<EmployeeKey2, std::string>> EmployeeData;
// usage:
auto& companyMap = EmployeeData[EBuyNLargeCorp]; // or [selectedCompany(0..N)]
auto it = companyMap.find(selectedEmployee);
if (it != companyMap.end())
    it->second = "Good employee!";

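Here EmployeeKey2 lacks the CompanyID field, and selectedCompany becomes the index into the vector. But this is something you would only do for a really critical performance gain.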
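
A minimal sketch of this vector-indexed layout might look like the following. EmployeeKey2 is not defined in the answer, so its fields here are an assumption (this version also leaves Name out of the key), and EmployeeKey2Hash, CompanyTable, and lookup are hypothetical names introduced for the example.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical key without CompanyID; the company is the vector index instead.
struct EmployeeKey2 {
    std::string Department;
    int         EmployeeID;

    bool operator==(const EmployeeKey2& o) const {
        return Department == o.Department && EmployeeID == o.EmployeeID;
    }
};

// Combines the two field hashes (boost::hash_combine-style mixing).
struct EmployeeKey2Hash {
    std::size_t operator()(const EmployeeKey2& k) const {
        std::size_t seed = std::hash<std::string>()(k.Department);
        seed ^= std::hash<int>()(k.EmployeeID) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        return seed;
    }
};

using CompanyTable = std::unordered_map<EmployeeKey2, std::string, EmployeeKey2Hash>;

// Returns a pointer to the stored string, or nullptr when the company index
// is out of range or the key is not present.
inline const std::string* lookup(const std::vector<CompanyTable>& data,
                                 std::size_t company, const EmployeeKey2& key) {
    if (company >= data.size()) return nullptr;
    auto it = data[company].find(key);
    return it != data[company].end() ? &it->second : nullptr;
}
```

The bounds check matters here because indexing a std::vector with an invalid company ID is undefined behavior, unlike a failed map lookup.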

Answer 2

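It looks like you forgot to use the right tool for the problem. You are trying to emulate a database with maps. A simpler solution is to use a real database; SQLite3 is easy to integrate, since it works with a single file.

You will be able to query many different kinds of information efficiently. You can even inspect the database with external tools.

If you still don't want to use a DB, think of each table as a vector where the id is the index. The last table is then a map of tuples of id values, but I don't recommend this, because getting different kinds of information will be harder.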
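
One possible reading of the "tables as vectors" idea is sketched below. The Tables struct, its member names, and the helper describe are assumptions made for this example, with the last table modeled as a map from employee id to a (company id, department id) tuple.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

struct Tables {
    std::vector<std::string> companies;    // index = company id
    std::vector<std::string> departments;  // index = department id
    std::vector<std::string> employees;    // index = employee id
    // employee id -> (company id, department id)
    std::map<std::size_t, std::tuple<std::size_t, std::size_t>> assignment;
};

// Returns "company/department/name" for an employee, or an empty string
// when the employee has no assignment.
inline std::string describe(const Tables& t, std::size_t emp_id) {
    auto it = t.assignment.find(emp_id);
    if (it == t.assignment.end()) return std::string();
    return t.companies.at(std::get<0>(it->second)) + "/" +
           t.departments.at(std::get<1>(it->second)) + "/" +
           t.employees.at(emp_id);
}
```

Any query other than "look up by employee id" needs its own loop or extra index here, which is the drawback mentioned above.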

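Below is a DB example. Note that I haven't written SQL in years, so there may be a better design. You could also add a table registering the valid departments of each company, and add constraints to catch an employee being registered to a department that the company does not have.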

SQL Fiddle

SQLite (SQL.js) Schema Setup:

CREATE TABLE Company(
     id integer primary key autoincrement, 
     name varchar(20) not null unique
);

INSERT INTO Company (name) values ('google');
INSERT INTO Company (name) values ('facebook');

CREATE TABLE Department(
     id integer primary key autoincrement, 
     name varchar(20) not null unique
);

INSERT INTO Department (name) values ('research');
INSERT INTO Department (name) values ('development');
INSERT INTO Department (name) values ('marketing');
INSERT INTO Department (name) values ('hell');

CREATE TABLE Employee
(
     social_id integer primary key, 
     name varchar(20) not null
);

INSERT INTO Employee values (1, 'mark');
INSERT INTO Employee values (2, 'john');
INSERT INTO Employee values (3, 'david');

CREATE TABLE Assigment(
     emp_id  primary key references Employee(social_id), /* employee have only one job */
     comp_id not null references Company(id), 
     dep_id  not null references Department(id)
);

INSERT INTO Assigment select 1,c.id,d.id from Company c join Department d where (c.name='google' and d.name='hell');
INSERT INTO Assigment select 2,c.id,d.id from Company c join Department d where (c.name='google' and d.name='marketing');
INSERT INTO Assigment select 3,c.id,d.id from Company c join Department d where (c.name='facebook' and d.name='research');

Query 1:

SELECT c.name,d.name,e.name
    FROM assigment a JOIN company c ON a.comp_id=c.id
    JOIN department d ON d.id=dep_id
    JOIN employee e ON a.emp_id=social_id

Results:

|     name |      name |  name |
|----------|-----------|-------|
|   google |      hell |  mark |
|   google | marketing |  john |
| facebook |  research | david |

Pros and cons of using a concatenated key instead of nested map containers in C++
