How to search GCIDE XML using PHP

Time: 2012-05-24 15:34:47

Tags: php xml

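I downloaded GCIDE (the GNU Project's publication of CIDE, the Collaborative International Dictionary of English) from the website.

The package contains a number of XML files. I am running PHP and Apache on a Windows PC. How can I use PHP to search these XML files for words and their definitions?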

2 Answers:

Answer 0: (score: 7)

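Your project piqued my interest, and I thought I might find it useful myself, so I did some research and found the code below on this page. I ran this PHP and now have a fully functional dictionary in my database!

Here is everything I did to get it up and running (I extracted the XML files into a folder named XML, inside the folder that contains the files below).

SQL for the table - gcide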

CREATE TABLE `gcide` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `word` varchar(255) DEFAULT NULL,
  `definition` text,
  `pos` varchar(50) DEFAULT NULL,
  `fld` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `word` (`word`)
) ENGINE=MyISAM

PHP for gcide XML Import - import_gcide_xml.php

<?php
    // Connect with PDO; the mysql_* functions were removed in PHP 7.
    $db = new PDO('mysql:host=localhost;dbname=fiddle', 'root', '');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // Start from an empty table on every import
    $db->exec('TRUNCATE TABLE gcide');

    // Prepared once, executed per entry; bound parameters replace addslashes()
    $insert = $db->prepare('INSERT INTO gcide (word, definition, pos, fld) VALUES (?, ?, ?, ?)');

    // Return the cleaned text of the first <$tag> element in $block, or '' if there is none
    function extract_tag($tag, $block) {
        if (!preg_match("/<$tag>(.*?)<\/$tag>/", $block, $match)) {
            return '';
        }
        // strip nested markup, turn hyphens into spaces, keep only letters, digits and whitespace
        $text = str_replace('-', ' ', strip_tags($match[1]));
        return preg_replace('/[^a-zA-Z0-9\s]/', '', $text);
    }

    // xml/gcide_a.xml through xml/gcide_z.xml
    foreach (range('a', 'z') as $letter) {
        $xmlfile = "xml/gcide_$letter.xml";
        // original file contents
        $original_file = @file_get_contents($xmlfile);
        // if file_get_contents fails to open the file, skip it
        if (!$original_file) {
            continue;
        }
        // find headword/definition blocks in the file contents
        preg_match_all("/<hw>(.*?)<\/hw>(.*?)<def>(.*?)<\/def>/", $original_file, $results);
        // traverse blocks array
        foreach ($results[0] as $block) {
            $word       = extract_tag('hw', $block);
            $definition = extract_tag('def', $block);
            $pos        = extract_tag('pos', $block);
            $fld        = extract_tag('fld', $block);

            $insert->execute(array($word, $definition, $pos, $fld));

            echo $word . " " . $definition . "\n";
        }
    }
    echo 'Done!';
?>
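
To check what the import loop extracts before pointing it at a database, the same regular expressions can be run on a small sample. This is only a sketch: the function name `parse_gcide_entries` is hypothetical, and the sample line is a made-up fragment shaped the way the import regex expects, not text copied from the GCIDE files.

```php
<?php
// Hypothetical helper (not part of the original script): returns one
// array per <hw>...</hw>...<def>...</def> block found in $xml.
function parse_gcide_entries($xml) {
    // Same cleaning as the import script: strip tags, hyphens to spaces,
    // then drop everything except letters, digits and whitespace.
    $clean = function ($text) {
        $text = str_replace('-', ' ', strip_tags($text));
        return preg_replace('/[^a-zA-Z0-9\s]/', '', $text);
    };
    // Cleaned text of the first <$tag> element in $block, or '' if none.
    $first = function ($tag, $block) use ($clean) {
        return preg_match("/<$tag>(.*?)<\/$tag>/", $block, $m) ? $clean($m[1]) : '';
    };

    preg_match_all('/<hw>(.*?)<\/hw>(.*?)<def>(.*?)<\/def>/', $xml, $results);
    $entries = array();
    foreach ($results[0] as $block) {
        $entries[] = array(
            'word'       => $first('hw', $block),
            'definition' => $first('def', $block),
            'pos'        => $first('pos', $block),
            'fld'        => $first('fld', $block),
        );
    }
    return $entries;
}

// Made-up sample line; the exact GCIDE markup is an assumption here.
$sample = '<p><hw>Well-known</hw> <pos>a.</pos> <def>Fully known; famous.</def></p>';
print_r(parse_gcide_entries($sample));
```

Because `.` does not match a newline unless the `s` modifier is set, a headword and definition that are split across several lines in the file are skipped by this pattern. Running the helper on a few real lines is a quick way to see whether that matters for your copy of the data.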

CSS for the search page - gcide.css

body{ font-family:Arial, Helvetica, sans-serif; }
#search_box { padding:4px; border:solid 1px #666666; margin-bottom:15px; width:300px; height:30px; font-size:18px;-moz-border-radius: 6px;-webkit-border-radius: 6px; }
#search_results { display:none;}
.word { font-weight:bold; }
.found { font-weight: bold; }
dl {    font-family:serif;}
dt {    font-weight:bold;}
dd {    font-weight:normal;}
.pos {    font-weight: normal;}
.fld {    margin-right:10px;}

HTML for the search page - index.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
        <title>PHP, jQuery search of GCIDE</title>
        <link href="gcide.css" rel="stylesheet" type="text/css"/>
        <link href="http://ajax.googleapis.com/ajax/libs/jqueryui/1.8/themes/ui-lightness/jquery-ui.css" rel="stylesheet" type="text/css"/>
        <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
        <script src="http://ajax.googleapis.com/ajax/libs/jqueryui/1.8/jquery-ui.min.js"></script>
        <script type="text/javascript">
            $(function() {
                $("#search_box").keyup(function() {
                    // getting the value that user typed
                    var searchString    = $("#search_box").val();
                    // forming the request data; jQuery URL-encodes the values of an object
                    var data            = { search: searchString };
                    // if searchString is not empty
                    if(searchString) {
                        // ajax call
                        $.ajax({
                            type: "POST",
                            url: "gcide_search.php",
                            data: data,
                            beforeSend: function(html) { // this happens before actual call
                                $("#results").html('');
                                $("#search_results").show();
                                $(".word").html(searchString);
                            },
                            success: function(html){ // this happens after we get results
                                $("#results").show();
                                $("#results").append(html);
                            }
                        });
                    }
                    return false;
                });
            });
        </script>
    </head>
    <body>
        <div class="ui-widget-content" style="padding:10px;">
            <input id="search_box" class='search_box' type="text" />
            <div id="search_results">Search results for <span class="word"></span></div>
            <dl id="results"></dl>
        </div>
    </body>
</html>

PHP for the jQuery search - gcide_search.php

<?php
    if (isset($_POST['search'])) {
        $db = new PDO("mysql:host=localhost;dbname=fiddle", "root", "");
        // never trust what user wrote! Bind the input as a parameter instead of
        // escaping it by hand (mysql_real_escape_string does not work with a PDO connection)
        $result = $db->prepare("SELECT * FROM gcide WHERE word LIKE ? ORDER BY word LIMIT 10");
        $end_result = '';
        if ($result && $result->execute(array($_POST['search'] . '%'))) {
            while ( $r = $result->fetch(PDO::FETCH_ASSOC) ) {
                $end_result                 .= '<dt>' . $r['word'];
                if($r['pos'])   $end_result .= ',&nbsp;<span class="pos">'.$r['pos'].'</span>';
                $end_result                 .= '</dt>';
                $end_result                 .= '<dd>';
                if($r['fld'])   $end_result .= '<span class="fld">('.$r['fld'].')</span>';
                $end_result                 .= $r['definition'];
                $end_result                 .= '</dd>';
            }
        }
        if(!$end_result) {
            $end_result = '<dt><div class="ui-state-highlight ui-corner-all" style="margin-top: 20px; padding: 0 .7em;">
            <p><span class="ui-icon ui-icon-info" style="float: left; margin-right: .3em;"></span>
            No results found.</p>
            </div></dt>';
        }
        echo $end_result;
    }
?>
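
One detail the prefix search leaves open is that a `%` or `_` typed by the user acts as a wildcard inside `LIKE`. Below is a minimal sketch of escaping them before the trailing `%` is appended. It assumes MySQL's default `LIKE` escape character (the backslash), and the function name `like_prefix` is hypothetical.

```php
<?php
// Hypothetical helper: turn raw user input into a LIKE prefix pattern.
function like_prefix($term) {
    // Escape backslash, % and _ so they match literally, then add the
    // trailing % that makes it a "starts with" search.
    return addcslashes($term, '\\%_') . '%';
}

echo like_prefix('50%_off'), "\n";
```

The returned string would be passed as the bound parameter of the `LIKE` query in place of the raw search term with `%` appended.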

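Answer 1: (score: 1)

I happened to stumble upon this PHP and AJAX example - it may point you in the right direction. With this much data, though, you might consider importing it into a database and using its search capabilities - that is what databases are designed for, and performance may be an issue when searching through that much plain text in XML files. See this answer for XML import. I also found this answer about importing GCIDE XML.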
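
For comparison, searching the files directly without a database might look like the sketch below. It reads one file line by line and keeps headwords that start with the search term. The function name `search_gcide_file` is hypothetical, and the sketch assumes each headword and its definition sit on the same line as `<hw>...</hw>` and `<def>...</def>`, which may not hold for every entry in the real files.

```php
<?php
// Hypothetical helper: scan one GCIDE XML file for headwords that start
// with $term (case-insensitive) and return up to $limit matches.
function search_gcide_file($path, $term, $limit = 10) {
    $found = array();
    $handle = @fopen($path, 'r');
    if (!$handle) {
        return $found; // missing or unreadable file: no results
    }
    while (count($found) < $limit && ($line = fgets($handle)) !== false) {
        // Assumption: headword and definition are on the same line.
        if (!preg_match('/<hw>(.*?)<\/hw>.*?<def>(.*?)<\/def>/', $line, $m)) {
            continue;
        }
        // Drop markup and pronunciation marks from the headword.
        $word = preg_replace('/[^a-zA-Z0-9\s]/', '', strip_tags($m[1]));
        if (stripos($word, $term) === 0) {
            $found[] = array('word' => $word, 'definition' => strip_tags($m[2]));
        }
    }
    fclose($handle);
    return $found;
}

// The files are named by letter, so only one of them needs to be read
// for a given prefix. The xml/ folder location is an assumption.
$term = 'abandon';
$file = 'xml/gcide_' . strtolower($term[0]) . '.xml';
print_r(search_gcide_file($file, $term));
```

Every request rescans the file from the start, which is the performance concern mentioned above; an indexed database column avoids that.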