Question

I built some logic that generates random sentences. For this I have a database table of trigrams with ~1,000,000 entries.

The current logic is:

get a first word
fetch the next entry based on that first word
continue until an entry matches the end flag

In PHP it looks like this:

while($i < 30 && $last['three'] != '[end]') {
  $last = getDBentry($mysqli, $last);
  if($last['three'] != '[end]') {
    $string .= ' ' . $last['three'];
  }
  $i++;
}

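I capped it at a maximum of 30 words, but even with only 10 words it takes about 15 seconds. Is there a best practice or a good way to handle this data more efficiently?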

Edit

function getDBentry($mysqli, $last, $single = false) {
...
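// Escape the key so that words containing quotes cannot break the query
$key = $mysqli->real_escape_string($last['two'] . $last['three']);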

if($single) {
    $sql = "SELECT * FROM trigrams WHERE gramkey = '$key'";
} else {
    $sql = "SELECT * FROM trigrams WHERE gramkey = '$key' AND amount > 1";
}

$matches = array();

if ($result = $mysqli->query($sql)) { 
    if($result->num_rows === 0 && $single) {
        die('error no result');
    }

    if($result->num_rows === 0) {
        return getDBentry($mysqli, $last, true);
    }

    while($obj = $result->fetch_object()){ 
        array_push($matches, array('one' => $obj->one, 'two'=>$obj->two, 'three'=>$obj->three, 'amount'=>$obj->amount, 'gramkey'=>$obj->gramkey));
    } 
} else {
    die('error');
}

...

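I've only included the parts that are relevant to the topic.

The table structure is

id, gramkey, one, two, three, amount - where one, two and three are single words, and gramkey is one and two concatenated into a single string for easier access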

Answer 1

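As AlexBlex mentioned in the comments, the solution can be found in the MySQL documentation.

Adding an index on the gramkey column gave an absolutely insane performance boost: from 15 seconds down to 0.1 seconds.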
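
For reference, an index like this can be added with a single statement. This is a minimal sketch, not necessarily the exact command that was run: the index name `gramkey` and the 255-character prefix are taken from the table definition shown in this answer. The prefix is presumably there because a full utf8 `varchar(256)` can exceed InnoDB's index key length limit on older row formats.

```sql
-- Add a secondary index on the lookup column so that
-- WHERE gramkey = '...' no longer has to scan all ~1,000,000 rows.
ALTER TABLE trigrams
  ADD INDEX gramkey (gramkey(255));
```

Without the index, every one of the up to 30 lookups in the loop has to read the whole table; with it, each lookup becomes an index search on `gramkey`.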

Edit: SHOW CREATE TABLE

CREATE TABLE `trigrams` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`gramkey` varchar(256) COLLATE utf8_unicode_ci NOT NULL,
`one` varchar(256) COLLATE utf8_unicode_ci NOT NULL,
`two` varchar(256) COLLATE utf8_unicode_ci NOT NULL,
`three` varchar(256) COLLATE utf8_unicode_ci NOT NULL,
`amount` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `gramkey` (`gramkey`(255))
) ENGINE=InnoDB AUTO_INCREMENT=1055131 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
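
To check that lookups actually use the index, one option is to prefix a query with `EXPLAIN`. The key value `'thequick'` below is a made-up example, not data from the original table.

```sql
-- 'thequick' is a hypothetical gramkey (the words "the" + "quick" joined)
EXPLAIN SELECT * FROM trigrams
WHERE gramkey = 'thequick' AND amount > 1;
```

If the index is being used, the `key` column of the output should show `gramkey`, and the `rows` estimate should be far below the size of the table.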
