我想将用户输入的FTS查询转换为MySQL的WHERE子句。所以功能将类似Gmail的搜索。因此用户可以输入:
from:me AND (to:john OR to:jenny) dinner
虽然我认为它不重要,但表结构将类似于:
Message
- id
- from
- to
- title
- description
- time_created
MessageComment
- id
- message_id
- comment
- time_created
由于这是一个常见问题,我认为可能已有解决方案。有没有?
P.S。有一个类似的问题,如here,但它适用于SQL Server。
答案 0 :(得分:3)
以下代码由Tokenizer,Token和QueryBuilder类组成。 它可能不是有史以来最优雅的解决方案,但它确实可以满足您的要求:
<?
// QueryBuilder Grammar:
// =====================
// SearchRule := SimpleSearchRule { KeyWord }
// SimpleSearchRule := Expression | SimpleSearchRule { 'OR' Expression }
// Expression := SimpleExpression | Expression { 'AND' SimpleExpression }
// SimpleExpression := '(' SimpleSearchRule ')' | FieldExpression
$input = 'from:me AND (to:john OR to:jenny) dinner party';
$fieldMapping = array(
'id' => 'id',
'from' => 'from',
'to' => 'to',
'title' => 'title',
'description' => 'description',
'time_created' => 'time_created'
);
$fullTextFields = array('title','description');
$qb = new QueryBuilder($fieldMapping, $fullTextFields);
try {
echo $qb->parseSearchRule($input);
} catch(Exception $error) {
echo 'Error occurred while parsing search query: <br/>'.$error->getMessage();
}
class Token {
const KEYWORD = 'KEYWORD',
OPEN_PAR='OPEN_PAR',
CLOSE_PAR='CLOSE_PAR',
FIELD='FIELD',
AND_OP='AND_OP',
OR_OP='OR_OP';
public $type;
public $chars;
public $position;
function __construct($type,$chars,$position) {
$this->type = $type;
$this->chars = $chars;
$this->position = $position;
}
function __toString() {
return 'Token[ type='.$this->type.', chars='.$this->chars.', position='.$this->position.' ]';
}
}
class Tokenizer {
private $tokens = array();
private $input;
private $currentPosition;
function __construct($input) {
$this->input = trim($input);
$this->currentPosition = 0;
}
/**
* @return Token
*/
function getToken() {
if(count($this->tokens)==0) {
$token = $this->nextToken();
if($token==null) {
return null;
}
array_push($this->tokens, $token);
}
return $this->tokens[0];
}
function consumeToken() {
$token = $this->getToken();
if($token==null) {
return null;
}
array_shift($this->tokens);
return $token;
}
protected function nextToken() {
$reservedCharacters = '\:\s\(\)';
$fieldExpr = '/^([^'.$reservedCharacters.']+)\:([^'.$reservedCharacters.']+)/';
$keyWord = '/^([^'.$reservedCharacters.']+)/';
$andOperator = '/^AND\s/';
$orOperator = '/^OR\s/';
// Remove whitespaces ..
$whiteSpaces = '/^\s+/';
$remaining = substr($this->input,$this->currentPosition);
if(preg_match($whiteSpaces, $remaining, $matches)) {
$this->currentPosition += strlen($matches[0]);
$remaining = substr($this->input,$this->currentPosition);
}
if($remaining=='') {
return null;
}
switch(substr($remaining,0,1)) {
case '(':
return new Token(Token::OPEN_PAR,'(',$this->currentPosition++);
case ')':
return new Token(Token::CLOSE_PAR,')',$this->currentPosition++);
}
if(preg_match($fieldExpr, $remaining, $matches)) {
$token = new Token(Token::FIELD, $matches[0], $this->currentPosition);
$this->currentPosition += strlen($matches[0]);
} else if(preg_match($andOperator, $remaining, $matches)) {
$token = new Token(Token::AND_OP, 'AND', $this->currentPosition);
$this->currentPosition += 3;
} else if(preg_match($orOperator, $remaining, $matches)) {
$token = new Token(Token::OR_OP, 'OR', $this->currentPosition);
$this->currentPosition += 2;
} else if(preg_match($keyWord, $remaining, $matches)) {
$token = new Token(Token::KEYWORD, $matches[0], $this->currentPosition);
$this->currentPosition += strlen($matches[0]);
} else throw new Exception('Unable to tokenize: '.$remaining);
return $token;
}
}
class QueryBuilder {
private $fieldMapping;
private $fulltextFields;
function __construct($fieldMapping, $fulltextFields) {
$this->fieldMapping = $fieldMapping;
$this->fulltextFields = $fulltextFields;
}
function parseSearchRule($input) {
$t = new Tokenizer($input);
$token = $t->getToken();
if($token==null) {
return '';
}
$token = $t->getToken();
if($token->type!=Token::KEYWORD) {
$searchRule = $this->parseSimpleSearchRule($t);
} else {
$searchRule = '';
}
$keywords = '';
while($token = $t->consumeToken()) {
if($token->type!=Token::KEYWORD) {
throw new Exception('Only keywords allowed at end of search rule.');
}
if($keywords!='') {
$keywords .= ' ';
}
$keywords .= $token->chars;
}
if($keywords!='') {
$matchClause = 'MATCH (`'.(implode('`,`',$this->fulltextFields)).'`) AGAINST (';
$keywords = $matchClause.'\''.mysql_real_escape_string($keywords).'\' IN BOOLEAN MODE)';
if($searchRule=='') {
$searchRule = $keywords;
} else {
$searchRule = '('.$searchRule.') AND ('.$keywords.')';
}
}
return $searchRule;
}
protected function parseSimpleSearchRule(Tokenizer $t) {
$expressions = array();
do {
$repeat = false;
$expressions[] = $this->parseExpression($t);
$token = $t->getToken();
if($token->type==Token::OR_OP) {
$t->consumeToken();
$repeat = true;
}
} while($repeat);
return implode(' OR ', $expressions);
}
protected function parseExpression(Tokenizer $t) {
$expressions = array();
do {
$repeat = false;
$expressions[] = $this->parseSimpleExpression($t);
$token = $t->getToken();
if($token->type==Token::AND_OP) {
$t->consumeToken();
$repeat = true;
}
} while($repeat);
return implode(' AND ', $expressions);
}
protected function parseSimpleExpression(Tokenizer $t) {
$token = $t->consumeToken();
if($token->type==Token::OPEN_PAR) {
$spr = $this->parseSimpleSearchRule($t);
$token = $t->consumeToken();
if($token==null || $token->type!=Token::CLOSE_PAR) {
throw new Exception('Expected closing parenthesis, found: '.$token->chars);
}
return '('.$spr.')';
} else if($token->type==Token::FIELD) {
$fieldVal = explode(':', $token->chars,2);
if(isset($this->fieldMapping[$fieldVal[0]])) {
return '`'.$this->fieldMapping[$fieldVal[0]].'` = \''.mysql_real_escape_string($fieldVal[1]).'\'';
}
throw new Exception('Unknown field selected: '.$token->chars);
} else {
throw new Exception('Expected opening parenthesis or field-expression, found: '.$token->chars);
}
}
}
?>
更合适的解决方案是首先构建一个解析树,然后在进行进一步分析之后将其转换为查询。
答案 1 :(得分:0)
您的问题分为两部分
第一个是相当困难的主题。快速搜索没有发现任何等同于你想要的东西。你可以单独使用那个
在问题1正确之前,不要打扰问题2。
而不是创建一个可以处理您提出的查询语法的解析器,例如: from:me AND (to:john OR to:jenny) dinner
也许一个简单的形式可能就是答案。提供用户搜索的选项列表。
通过这种方式,您可以启动并运行服务,并在未来的修订版攻击中,更难以解决如何创建解析器以执行您想要的操作。
在做第2部分时要非常小心,以防止SQL注入攻击。例如,不要直接从查询中获取表名,而是使用查找。
不是你想要的答案,但我不知道你是否会找到一个开箱即用的答案。更好地定义你的问题是线索。和谷歌是你的朋友。
DC
你可能想看......
http://code.google.com/p/xerxes-portal/source/browse/trunk/lib/Xerxes/QueryParser.php?r=1205
另外
http://www.cmsmadesimple.org/api/class_zend___search___lucene___search___query_parser.html
上面链接到2个非常不同的解析器实现(第二个链接打破了stackoverflow,所以我把它编码了)
DC