Question

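I am planning to build a very large database. A previous client of mine had a database of more than 100M rows. So let's assume we have a table A with 100M rows and several small tables with about 250 rows each.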

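I'd like to know which approach is generally faster (I know it depends on many factors):

1. Joining the small tables to the big table by ID
2. Storing the small-table values directly in the big table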

For example:

Option 1:

id  |   data1   |   data2   |   data3   |   table1_foreign_key  |   table2_foreign_key  |   table3_foreign_key
--------------------------------------------------------------------------------------------------------------
1   |   test    |   test    |   test    |   12                  |   34                  |   22
2   |   test    |   test    |   test    |   34                  |   67                  |   63
3   |   test    |   test    |   test    |   43                  |   34                  |   18
4   |   test    |   test    |   test    |   23                  |   21                  |   22
5   |   test    |   test    |   test    |   22                  |   34                  |   22
6   |   test    |   test    |   test    |   22                  |   34                  |   13
7   |   test    |   test    |   test    |   23                  |   54                  |   12
8   |   test    |   test    |   test    |   11                  |   57                  |   43
9   |   test    |   test    |   test    |   3                   |   34                  |   22

Here I would JOIN all of these small tables to the big table by ID. The small tables would hold things like cities, countries, devices, etc.
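A minimal sketch of option 1, assuming Python's built-in sqlite3 as a stand-in for MySQL. The table and column names (hits, cities, countries, devices, city_id, ...) are hypothetical; the point is only that the big table stores small integer keys and every read that needs the text pays one JOIN per lookup table.

```python
import sqlite3

# Option 1 sketch: the big table stores integer keys; the ~250-row lookup
# tables hold the text. All names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE devices   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE hits (
    id         INTEGER PRIMARY KEY,
    data1      TEXT,
    city_id    INTEGER REFERENCES cities(id),
    country_id INTEGER REFERENCES countries(id),
    device_id  INTEGER REFERENCES devices(id)
);
CREATE INDEX hits_city ON hits(city_id);
""")
conn.executemany("INSERT INTO cities VALUES (?, ?)",
                 [(12, "Oklahoma"), (34, "New York")])
conn.executemany("INSERT INTO countries VALUES (?, ?)", [(1, "US")])
conn.executemany("INSERT INTO devices VALUES (?, ?)",
                 [(1, "mobile"), (2, "desktop")])
conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
                 [(1, "test", 12, 1, 1), (2, "test", 34, 1, 2)])

# Reading the text values back needs one JOIN per lookup table.
rows = conn.execute("""
SELECT h.id, c.name, co.name, d.name
FROM hits h
JOIN cities c     ON c.id  = h.city_id
JOIN countries co ON co.id = h.country_id
JOIN devices d    ON d.id  = h.device_id
ORDER BY h.id
""").fetchall()
print(rows)  # [(1, 'Oklahoma', 'US', 'mobile'), (2, 'New York', 'US', 'desktop')]
```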

Option 2:

id  |   data1   |   data2   |   data3   |   table1_foreign_key  |   table2_foreign_key  |   table3_foreign_key
--------------------------------------------------------------------------------------------------------------
1   |   test    |   test    |   test    |   Oklahoma            |   sample_text         |   sample_text
2   |   test    |   test    |   test    |   New York            |   sample_text         |   sample_text
3   |   test    |   test    |   test    |   New York            |   sample_text         |   sample_text
4   |   test    |   test    |   test    |   New York            |   sample_text         |   sample_text
5   |   test    |   test    |   test    |   Washington          |   sample_text         |   sample_text
6   |   test    |   test    |   test    |   Michigan            |   sample_text         |   sample_text
7   |   test    |   test    |   test    |   Oklahoma            |   sample_text         |   sample_text
8   |   test    |   test    |   test    |   Kansas              |   sample_text         |   sample_text
9   |   test    |   test    |   test    |   Dallas              |   sample_text         |   sample_text

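In the second option there is no JOIN; instead the values are stored in the main big table itself. Each of these columns is expected to hold roughly 2-20 characters.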
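For comparison, a sketch of option 2 under the same assumptions (sqlite3 standing in for MySQL, hypothetical names): the text sits in the row itself, so filtering and grouping need no JOIN, at the price of repeating the same strings on every row.

```python
import sqlite3

# Option 2 sketch: text values stored directly in the big table.
# All names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE hits_flat (
    id      INTEGER PRIMARY KEY,
    data1   TEXT,
    city    TEXT,   -- roughly 2-20 characters, per the question
    country TEXT,
    device  TEXT
)""")
conn.execute("CREATE INDEX hits_flat_city ON hits_flat(city)")
conn.executemany("INSERT INTO hits_flat VALUES (?, ?, ?, ?, ?)",
                 [(1, "test", "Oklahoma", "US", "mobile"),
                  (2, "test", "New York", "US", "desktop"),
                  (3, "test", "New York", "US", "mobile")])

# Grouping reads the value straight from each row; no lookup tables involved.
counts = conn.execute(
    "SELECT city, COUNT(*) FROM hits_flat GROUP BY city ORDER BY city"
).fetchall()
print(counts)  # [('New York', 2), ('Oklahoma', 1)]
```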

Question:

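Given the same environment and proper indexes, which of the options above is likely to be faster? Which approach would you recommend? (My client wants to store clicks and click-related data in this database and table.)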

Answer 1

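Because it's a "one-to-many" relationship, I would store them in separate tables. The SQL Server query optimizer (under the hood) can resolve 250 records quickly enough that it shouldn't be a problem. Also, depending on the length of the values in the smaller tables, you save storage by not repeating them hundreds of millions of times. However, if reporting performance is critical, you could store them in a "flattened" table, like a data-warehouse structure, with no joins. That would certainly be faster, but you would sacrifice storage space and a well-structured relational database.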
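To put a rough number on the storage point, here is a back-of-the-envelope sketch. The assumptions are mine, not the answerer's: about 11 bytes per text value (the middle of the 2-20 range) plus a 1-byte length prefix, versus a 1-byte key (MySQL's TINYINT UNSIGNED holds 0-255, enough for 250 rows). It ignores row headers, page overhead, indexes and compression, all of which change the real figure.

```python
def extra_bytes(rows, columns, avg_text_bytes, key_bytes):
    """Rough extra storage of option 2 over option 1 for the repeated columns.

    Assumes each text value costs avg_text_bytes plus a 1-byte length prefix,
    and each option-1 key costs key_bytes. Ignores all other overhead.
    """
    per_value = (avg_text_bytes + 1) - key_bytes
    return rows * columns * per_value

# 100M rows, 3 lookup-style columns, ~11-byte values, 1-byte keys.
extra = extra_bytes(100_000_000, 3, 11, 1)
print(extra)                    # 3300000000
print(round(extra / 2**30, 2))  # 3.07  (GiB)
```

So under these assumptions option 2 costs on the order of a few extra GiB of raw column data at 100M rows; whether that matters depends on your hardware and on how much of the table must stay in memory.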

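All that said, I would go with option 1. But you should easily be able to copy the data into a new table in the option 2 format, run your queries against both, and measure the performance yourself. I expect the difference to be small, especially given how small your lookup tables are.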
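A minimal template for that measurement, assuming in-memory SQLite as a stand-in for the real server and a hypothetical single lookup column (city). The timings it prints are only meaningful for this toy setup; the same shape of test should be rerun on the actual engine with realistic row counts and indexes.

```python
import random
import sqlite3
import time

# Benchmark template: SQLite in memory stands in for the real server, so the
# numbers will NOT match MySQL/SQL Server at 100M rows. Names are hypothetical.
N_ROWS, N_CITIES = 200_000, 250
random.seed(0)
names = [f"city_{i:03d}" for i in range(N_CITIES)]
keys = [random.randrange(N_CITIES) for _ in range(N_ROWS)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE hits (id INTEGER PRIMARY KEY, city_id INTEGER);
CREATE TABLE hits_flat (id INTEGER PRIMARY KEY, city TEXT);
""")
conn.executemany("INSERT INTO cities VALUES (?, ?)", enumerate(names))
conn.executemany("INSERT INTO hits (city_id) VALUES (?)", ((k,) for k in keys))
conn.executemany("INSERT INTO hits_flat (city) VALUES (?)",
                 ((names[k],) for k in keys))

def timed(sql):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    result = conn.execute(sql).fetchall()
    return result, time.perf_counter() - start

joined, t_join = timed("""SELECT c.name, COUNT(*) FROM hits h
    JOIN cities c ON c.id = h.city_id GROUP BY c.name ORDER BY c.name""")
flat, t_flat = timed(
    "SELECT city, COUNT(*) FROM hits_flat GROUP BY city ORDER BY city")

assert joined == flat  # both layouts must answer the same question
print(f"option 1 (JOIN): {t_join:.3f}s   option 2 (flat): {t_flat:.3f}s")
```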

Answer 2

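Generally speaking, the second approach is definitely faster: basically, locating records (which is what each JOIN adds) tends to be more expensive than retrieving them.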

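Two caveats, though: first, obviously, you give up enforcement of (relational) data consistency; second, your particular case may not fit the "generally speaking".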
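A small sketch of what "giving up consistency enforcement" means in practice, assuming SQLite (where foreign-key checks must be switched on per connection with PRAGMA foreign_keys) and hypothetical table names: with a foreign key, an unknown city ID is rejected; with a plain text column, a misspelled city name is accepted silently.

```python
import sqlite3

# Consistency sketch; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE hits (id INTEGER PRIMARY KEY,
                   city_id INTEGER NOT NULL REFERENCES cities(id));
CREATE TABLE hits_flat (id INTEGER PRIMARY KEY, city TEXT NOT NULL);
""")
conn.execute("INSERT INTO cities VALUES (1, 'Michigan')")

# Option 1: a key that does not exist in the lookup table is rejected.
try:
    conn.execute("INSERT INTO hits (city_id) VALUES (99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Option 2: a misspelled value is stored without complaint.
conn.execute("INSERT INTO hits_flat (city) VALUES ('Mitchigan')")
print(fk_rejected, conn.execute("SELECT city FROM hits_flat").fetchall())
# True [('Mitchigan',)]
```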

That said, denormalization like this is widely used nowadays, especially in so-called "NoSQL" solutions, but applied deliberately it works for an RDBMS as well.
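One common way to apply it deliberately (a technique I'm naming here, not something the answer spells out) is to keep the normalized tables as the source of truth and periodically rebuild a flat reporting copy from them. A sketch with SQLite and hypothetical names; in MySQL the same idea can be expressed with CREATE TABLE ... AS SELECT or INSERT ... SELECT.

```python
import sqlite3

# Deliberate denormalization sketch: normalized source of truth plus a
# derived flat reporting table. All names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE devices (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE hits (id INTEGER PRIMARY KEY, device_id INTEGER NOT NULL);
INSERT INTO devices VALUES (1, 'mobile'), (2, 'desktop');
INSERT INTO hits (device_id) VALUES (1), (2), (1);

-- Rebuild the reporting copy: the JOIN is paid once here, not per report query.
DROP TABLE IF EXISTS hits_report;
CREATE TABLE hits_report AS
    SELECT h.id, d.name AS device
    FROM hits h JOIN devices d ON d.id = h.device_id;
""")

# Report queries then read the flat copy without joining.
report = conn.execute(
    "SELECT device, COUNT(*) FROM hits_report GROUP BY device ORDER BY device"
).fetchall()
print(report)  # [('desktop', 1), ('mobile', 2)]
```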

I suggest you:

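1) Work out the likely use cases for the database, especially how much the related data will change, not just the query side

2) Set up a PoC that implements both approaches & prove it with numbers.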
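Point 1 matters because updates behave very differently in the two layouts. A sketch, assuming SQLite and hypothetical names: renaming one city touches a single lookup row in option 1 but every matching row of the big table in option 2 (1,000 rows here, potentially millions at 100M rows).

```python
import sqlite3

# Update-cost sketch: renaming one value in each layout. Names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE hits (id INTEGER PRIMARY KEY, city_id INTEGER NOT NULL);
CREATE TABLE hits_flat (id INTEGER PRIMARY KEY, city TEXT NOT NULL);
""")
conn.execute("INSERT INTO cities VALUES (1, 'New York')")
conn.executemany("INSERT INTO hits (city_id) VALUES (?)", [(1,)] * 1000)
conn.executemany("INSERT INTO hits_flat (city) VALUES (?)",
                 [("New York",)] * 1000)

# Option 1: only the single lookup row changes.
normalized = conn.execute(
    "UPDATE cities SET name = 'New York City' WHERE id = 1").rowcount
# Option 2: every row carrying the old value must be rewritten.
flat = conn.execute(
    "UPDATE hits_flat SET city = 'New York City' WHERE city = 'New York'"
).rowcount
print(normalized, flat)  # 1 1000
```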

MySQL JOIN on 1 big table and multiple small tables
