Question

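I have an old static website with about 50 HTML pages. How can I implement search on it that is not too slow, and which approach should I look at? I wrote a PHP script that simply searches for the text in the files, but it is very slow. Is there a way to build an index of the pages, or something similar?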

<?php

ini_set('max_execution_time', 900);

if(!isset($_GET['s'])) {
    die('You must define a search term!');
}

$search_in = array('html', 'htm');
$search_dir = '.';
$countWords = 15;

$files = list_files($search_dir);
$search_results = array();
foreach($files as $file){
    $contents = file_get_contents($file);
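    // Escape the search term so regex metacharacters in user input are matched literally.
    $term = preg_quote($_GET['s'], '/');
    preg_match_all("/<p>(.*?)".$term."(.*?)<\/p>/i", $contents, $matches, PREG_SET_ORDER);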
    foreach($matches as $match){
        $match[1] = trim_result($match[1]);
        $match[2] = trim_result($match[2], true);
        $match[1] .= '<span style="background: #ffff00;">';
        $match[2] = '</span>'.$match[2];

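        // Fall back to the file name when the page has no <title>.
        $title = preg_match("/<title>(.*?)<\/title>/is", $contents, $matches2) ? $matches2[1] : $file;
        // HTML-escape the user-supplied term before echoing it back into the page.
        $search_results[] = array($file, $match[1].htmlspecialchars($_GET['s']).$match[2], $title);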
    }
}

?>

    <html>
    <head>
        <title>Search results</title>
    </head>
    <body>
    <?php foreach($search_results as $result) :?>
        <div>
            <h3><a href="<?php echo $result[0]; ?>"><?php echo $result[2]; ?></a></h3>
            <p><?php echo $result[1]; ?></p>
        </div>
    <?php endforeach; ?>
    </body>
    </html>

<?php
function list_files($dir){
    global $search_in;

    $result = array();
    if(is_dir($dir)){
        if($dh = opendir($dir)){
            while (($file = readdir($dh)) !== false) {
                if(!($file == '.' || $file == '..')){
                    $file = $dir.'/'.$file;
                    if(is_dir($file) && $file != './.' && $file != './..'){
                        $result = array_merge($result, list_files($file));
                    }
                    else if(!is_dir($file)){
                        if(in_array(get_file_extension($file), $search_in)){
                            $result[] = $file;
                        }
                    }
                }
            }
        }
    }
    return $result;
}

function get_file_extension($filename){
    $result = '';
    $parts = explode('.', $filename);
    if(is_array($parts) && count($parts) > 1){
        $result = end($parts);
    }
    return $result;
}

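function trim_result($text, $start = false){
    global $countWords; // set at the top of the script; without this it is undefined inside the function
    $words = explode(' ', strip_tags($text)); // split() was removed in PHP 7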
    if($start){
        $words = array_slice($words, 0, $countWords);
    }
    else{
        $start = count($words) - $countWords;
        $words = array_slice($words, ($start < 0 ? 0 : $start), $countWords);
    }
    return implode(' ', $words);
}

?>

Answer 1

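The best way to speed up the search is:

Parse all the files with a DOM parser and extract their content.

Write that content to an SQLite database (with only 50 pages you do not need MySQL).

Then run the live search with simple SQL statements.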
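
The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the answerer's code: it assumes the `dom` and `pdo_sqlite` extensions are enabled, and the table name `pages` and the helper names `build_index` and `search_index` are made up for the example.

```php
<?php
// Sketch only: the table name and helper names are assumptions, not part of the original answer.

// Steps 1 and 2: parse each HTML file with DOMDocument and store its text in SQLite.
function build_index(PDO $db, array $files): void {
    $db->exec('CREATE TABLE IF NOT EXISTS pages (path TEXT PRIMARY KEY, title TEXT, body TEXT)');
    $insert = $db->prepare('INSERT OR REPLACE INTO pages (path, title, body) VALUES (?, ?, ?)');
    foreach ($files as $file) {
        $html = file_get_contents($file);
        if ($html === false || trim($html) === '') {
            continue; // nothing to index
        }
        $doc = new DOMDocument();
        // Old static pages are rarely valid HTML, so suppress parser warnings.
        @$doc->loadHTML($html);
        $titles = $doc->getElementsByTagName('title');
        $title = $titles->length > 0 ? trim($titles->item(0)->textContent) : $file;
        $bodies = $doc->getElementsByTagName('body');
        $body = $bodies->length > 0 ? $bodies->item(0)->textContent : '';
        // Collapse runs of whitespace so the stored text stays compact.
        $body = trim(preg_replace('/\s+/', ' ', $body));
        $insert->execute([$file, $title, $body]);
    }
}

// Step 3: answer a query with one SQL statement instead of re-reading every file.
function search_index(PDO $db, string $term): array {
    // Escape LIKE wildcards in the user's input so they are matched literally.
    $like = '%' . addcslashes($term, '%_\\') . '%';
    $stmt = $db->prepare("SELECT path, title FROM pages WHERE body LIKE ? ESCAPE '\\' ORDER BY path");
    $stmt->execute([$like]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

With this split, something like `build_index(new PDO('sqlite:search.sqlite'), list_files('.'))` would run once whenever the pages change (reusing `list_files` from the question; the database file name is hypothetical), and each search request would only call `search_index`.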

Answer 2

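This is not a problem you want to solve with a script that does all the work at request time.

You need to pre-parse the content into a form that can be searched quickly.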

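A simple approach is to parse everything into a text or JSON file. You can then load that file, search it for your string, and handle the result accordingly.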
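
A minimal sketch of the JSON-file variant might look like this. The helper names `write_json_index` and `search_json_index` and the index layout are assumptions made for illustration; the snippet logic is byte-based, so it may cut multibyte characters.

```php
<?php
// Sketch only: helper names and the JSON layout are assumptions, not part of the original answer.

// Run once (and again whenever pages change): extract the text of every page into one JSON file.
function write_json_index(array $files, string $indexPath): void {
    $index = [];
    foreach ($files as $file) {
        $html = file_get_contents($file);
        $title = preg_match('/<title>(.*?)<\/title>/is', $html, $m) ? trim($m[1]) : $file;
        // Drop script/style blocks first: strip_tags() would otherwise keep their contents as text.
        $html = preg_replace('/<(script|style)\b.*?<\/\1>/is', ' ', $html);
        $text = trim(preg_replace('/\s+/', ' ', strip_tags($html)));
        $index[] = ['path' => $file, 'title' => $title, 'text' => $text];
    }
    // Substitute invalid UTF-8 instead of failing on pages saved in a legacy encoding.
    file_put_contents($indexPath, json_encode($index, JSON_INVALID_UTF8_SUBSTITUTE));
}

// Run per request: load the prebuilt file and scan it in memory.
function search_json_index(string $indexPath, string $term): array {
    if ($term === '') {
        return [];
    }
    $index = json_decode(file_get_contents($indexPath), true) ?: [];
    $hits = [];
    foreach ($index as $page) {
        $pos = stripos($page['text'], $term);
        if ($pos !== false) {
            // Keep a short snippet around the first match for display.
            $from = max(0, $pos - 40);
            $hits[] = [
                'path' => $page['path'],
                'title' => $page['title'],
                'snippet' => substr($page['text'], $from, strlen($term) + 80),
            ];
        }
    }
    return $hits;
}
```

For 50 pages the whole index fits comfortably in memory, so each request costs one file read and one linear scan rather than 50 file reads and regex passes.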

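A more elegant approach is to store the information in an SQL database (MySQL, SQLite, SQL Server, etc.) or a NoSQL database (Mongo, Cassandra, etc.) and run queries against it.

Probably the best solution is to use Solr for proper search. It will give the best results (and leaves plenty of room for fine-tuning), but it may be overkill for your needs.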

How do I implement search on a static website?
