How to split a multi-value WHERE clause into multiple primitives for use with PHP

Date: 2018-01-01 03:30:55

Tags: php pdo

I'm trying to write a query builder for PDO prepared statements.

I have a WHERE statement as a string, like:

"title = 'home' and description = 'this is just an example'"
"id = 1 or title = 'home'"
"title = home"
etc...

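The WHERE statement may contain user-supplied data that needs to be sanitized, and from what I've read, using prepared statements is a widely used way to do that?

I need to split the where string to create a new string such as: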

$where = "title = :title AND description = :description";

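and an array such as:

$params = array(':title' => 'home', ':description' => 'this is just an example');

What I'm struggling with is that I don't know how many different filters will be passed in the original string.

Any help on how to achieve this would be much appreciated.

My function takes both of the split primitives above: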

function select($table, $fields = array(), $where = "", $params = array(), $limit = '', $fetchStyle = PDO::FETCH_ASSOC) {
    global $dbc, $dbq;

    if (empty($fields)) {
        $fields = "*";
    } else {
        $fields = implode(', ', $fields);
    }

    if (empty($where)) {
        $where = "1";
    }

    // default to no LIMIT clause so $limit_include is always defined
    $limit_include = '';
    if ($limit != '' && is_int($limit)) {
        $limit_include = "LIMIT $limit";
    }

    //create query
    $query = "SELECT $fields FROM $table WHERE $where $limit_include";

    //prepare statement
    $dbq = $dbc->prepare($query);
    $dbq->execute($params);

    return $dbq->fetchAll($fetchStyle);
}

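1 Answer:

Answer 0: (score: 1)

OK, I wrote a parser for you. But first, a few things.

This is not as trivial as it looks. Any time you let users put "stuff" directly into SQL you have to be very careful. The approach I used gives the data some degree of sanitation, because every "bit" has to match a regular expression to get through. None of those patterns allow quotes, backslashes, or anything else useful for SQL injection. The one exception is the regex for encapsulated strings (strings inside single quotes).

But I must stress that this does not guarantee it is impossible to pass SQL injection code through it. I say this because I spent very little time on it and tested it very little. The thing to remember is that any part of that query string is vulnerable to SQL injection, not just the values. If you allow users to pass something like: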

   "title = 'home' and description = 'this is just an example'"

they could pass this:

   ";DROP DATABASE"

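Now there are protections against running multiple queries, but my point is that a simple string replace or a simple regex is not enough. I also added a list of "banned" words. These words cannot be used unless they are enclosed in single quotes. They are common operations in MySQL that should not appear in a WHERE clause. Some examples are: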

  • DROP
  • DELETE
  • SHOW
  • ALTER

and so on. Because these are not handled in the switch statement in parseTokens, they are picked up by the default case, which throws an exception.
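To make the "banned unless quoted" rule concrete, here is a minimal standalone sketch. It is not how the parser itself does it (the parser relies on its token list); `findBannedKeyword` is a hypothetical helper that blanks out single-quoted strings first and then looks for banned words.

```php
<?php
// Hypothetical helper, not part of the parser: returns the first banned
// keyword found outside single-quoted strings, or null if there is none.
function findBannedKeyword(string $where): ?string
{
    // Blank out single-quoted strings so quoted words are ignored.
    $unquoted = preg_replace("/'(?:[^'\\\\]|\\\\.)*'/", "''", $where);

    // \b keeps column names such as "dropdown" from matching DROP.
    $banned = '/\b(SELECT|INSERT|UPDATE|DROP|DELETE|ALTER|SHOW)\b/i';
    if (preg_match($banned, $unquoted, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

var_dump(findBannedKeyword("title = 'home'"));           // NULL
var_dump(findBannedKeyword("title = 'drop' OR id = 1")); // NULL
var_dump(findBannedKeyword("id = 1; DROP TABLE users")); // string(4) "DROP"
```

A check like this only illustrates the rule; on its own it is no substitute for binding values through prepared statements.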

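There are also a lot of variations, and I tried to cover the most common ones. None of these appeared in your examples. I mean things like:

  • "title = 'home' OR title = 'user'": the same column used more than once (with different values)
  • "title IN('home','user', 'foo', 1, 3)": IN
  • "title IS NOT NULL": NULL checks
  • Other operators: you only had =. I included a T_COMPARISON regex that should match =, <, >, >=, <=, <>, != and LIKE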
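The tokenizing idea behind these cases can be shown in miniature. This is a simplified sketch, not the full parser: `tokenize` and its four token types are hypothetical names chosen for illustration. It joins the patterns into one regex with named groups and reads the token type from whichever group matched.

```php
<?php
// Simplified sketch: a hypothetical tokenize() with only four token types.
function tokenize(string $subject): array
{
    // Order matters: earlier patterns win when several could match,
    // so the two-character operators are listed before = < and >.
    $tokens = [
        'T_WHITESPACE' => '\s+',
        'T_STRING'     => "'[^']*'",
        'T_COMPARISON' => '>=|<=|<>|!=|=|<|>',
        'T_WORD'       => '[a-z_]+',
    ];

    $parts = [];
    foreach ($tokens as $name => $regex) {
        $parts[] = "(?P<$name>$regex)";
    }
    $pattern = '/' . implode('|', $parts) . '/i';

    $stream = [];
    preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        foreach (array_keys($tokens) as $name) {
            // The named group that actually matched is the token type.
            if (isset($match[$name]) && $match[$name] !== '') {
                $stream[] = ['type' => $name, 'content' => $match[$name]];
                break;
            }
        }
    }
    return $stream;
}

foreach (tokenize("age >= 'x'") as $tok) {
    echo $tok['type'], ': ', $tok['content'], "\n";
}
```

Note that this sketch silently skips characters no pattern matches; a catch-all token that raises an error is the safer choice for real input.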

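Now I'm sure I missed some, but these should give you examples of how to handle such things. That is one benefit of this approach: it is very powerful in that you can add a new token plus some code to handle it. So you can adapt it to your situation.

Because it uses a while loop, it should handle any number of column -> value sets.

So here is what I came up with (based on lexical analysis):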

//For debugging
error_reporting(-1);
ini_set('display_errors', 1);
echo "<pre>";

function parse($subject, $tokens)
{
    $types = array_keys($tokens);
    $patterns = [];
    $lexer_stream = [];
    $result = false;
    foreach ($tokens as $k=>$v){
        $patterns[] = "(?P<$k>$v)";
    }
    $pattern = "/".implode('|', $patterns)."/i";
    if (preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE)) {
        //print_r($matches);
        foreach ($matches[0] as $key => $value) {
            $match = [];
            foreach ($types as $type) {
                $match = $matches[$type][$key];
                if (is_array($match) && $match[1] != -1) {
                    break;
                }
            }
            $tok  = [
                'content' => $match[0],
                'type' => $type,
                'offset' => $match[1]
            ];
            $lexer_stream[] = $tok;
        }
        $result = parseTokens( $lexer_stream );
    }
    return $result;
}
function parseTokens( array &$lexer_stream ){

    $column = '';
    $params = [];
    $sql = '';

    while($current = current($lexer_stream)){
        $content = $current['content'];
        $type = $current['type'];
        switch($type){
            case 'T_WHITESPACE':
            case 'T_COMPARISON':
            case 'T_PAREN_OPEN':
            case 'T_PAREN_CLOSE':
            case 'T_COMMA':
            case 'T_SYMBOL':
                $sql .= $content;
                next($lexer_stream);
            break;
            case 'T_COLUMN':
                $column = $content;
                $sql .= $content;
                next($lexer_stream);
            break;
            case 'T_OPPERATOR':
            case 'T_NULL':
                $column = '';
                $sql .= $content;
                next($lexer_stream);
            break;
            case 'T_ENCAP_STRING': 
            case 'T_NUMBER':
                if(empty($column)){
                    throw new Exception('Parse error, value without a column name', 2001);
                }

                $value = trim($content,"'");

                $placeholder = createPlaceholder($column, $value, $params);

                $params[$placeholder] = $value;
                $sql .= $placeholder;
                next($lexer_stream);
            break;
            case 'T_IN':
                $sql .= $content;
                parseIN($column, $lexer_stream, $sql, $params);
            break;
            case 'T_EOF': return ['params' => $params, 'sql' => $sql];

            case 'T_UNKNOWN':
            case '':
            default:
                $content = htmlentities($content);
                print_r($current);
                throw new Exception("Unknown token $type value $content", 2000);
        }
    }
}

function createPlaceholder($column, $value, $params){
    $placeholder = ":{$column}";

    $i = 1;
    while(isset($params[$placeholder])){

        if($params[$placeholder] == $value){
            break;
        }

        $placeholder = ":{$column}_{$i}";
        ++$i;
    }

    return $placeholder;
}

function parseIN($column, &$lexer_stream, &$sql, &$params){
    next($lexer_stream);

    while($current = current($lexer_stream)){
        $content = $current['content'];
        $type = $current['type'];
        switch($type){
            case 'T_WHITESPACE':
            case 'T_COMMA':
                $sql .= $content;
                next($lexer_stream);
            break; 
            case 'T_ENCAP_STRING':
            case 'T_NUMBER':
                if(empty($column)){
                    throw new Exception('Parse error, value without a column name', 2001);
                }

                $value = trim($content,"'");

                $placeholder = createPlaceholder($column, $value, $params);

                $params[$placeholder] = $value;
                $sql .= $placeholder;
                next($lexer_stream);
            break;    
            case 'T_PAREN_CLOSE':
                $sql .= $content;
                next($lexer_stream);
                return;
            break;          
            case 'T_EOF': // end of input reached before the closing parenthesis
                throw new Exception("Unclosed call to IN()", 2003);

            case 'T_UNKNOWN':
            default:
                $content = htmlentities($content);
                print_r($current);
                throw new Exception("Unknown token $type value $content", 2000);
        }
    }
    throw new Exception("Unclosed call to IN()", 2003);
}

/**
 * token should be "name" => "regex"
 * 
 * Order is important: earlier patterns win. Keywords use \b so that
 * column names such as "order_id" or "dropdown" still match T_COLUMN.
 * 
 * @var array $tokens
 */
$tokens = [
    'T_WHITESPACE'      => '[\r\n\s\t]+',
    'T_ENCAP_STRING'    => '\'.*?(?<!\\\\)\'',
    'T_NUMBER'          => '\-?[0-9]+(?:\.[0-9]+)?',
    'T_BANNED'          => '\b(?:SELECT|INSERT|UPDATE|DROP|DELETE|ALTER|SHOW)\b',
    'T_COMPARISON'      => '\>=|\<=|\<\>|!=|=|\<|\>|\bLIKE\b',
    'T_OPPERATOR'       => '\b(?:AND|OR)\b',
    'T_NULL'            => 'IS NULL|IS NOT NULL',
    'T_IN'              => 'IN\s?\(',
    'T_COLUMN'          => '[a-z_]+',
    'T_COMMA'           => ',',
    'T_PAREN_OPEN'      => '\(',
    'T_PAREN_CLOSE'      => '\)',
    'T_SYMBOL'          => '[`]',
    'T_EOF'             => '\Z',
    'T_UNKNOWN'         => '.+?'
];

$tests = [
    "title = 'home' and description = 'this is just an example'",
    "title = 'home' OR title = 'user'",
    "id = 1 or title = 'home'",
    "title IN('home','user', 'foo', 1, 3)",
    "title IS NOT NULL",
];

/* the loop here is for testing only; in real use, call parse() once */
foreach ($tests as $test){   
    print_r(parse($test,$tokens));
    echo "\n".str_pad(" $test ", 80, "=", STR_PAD_BOTH)."\n";  
}

Output:

Array
(
    [params] => Array
        (
            [:title] => home
            [:description] => this is just an example
        )

    [sql] => title = :title and description = :description
)

========== title = 'home' and description = 'this is just an example' ==========
Array
(
    [params] => Array
        (
            [:title] => home
            [:title_1] => user
        )

    [sql] => title = :title OR title = :title_1
)

======================= title = 'home' OR title = 'user' =======================
Array
(
    [params] => Array
        (
            [:id] => 1
            [:title] => home
        )

    [sql] => id = :id or title = :title
)

=========================== id = 1 or title = 'home' ===========================
Array
(
    [params] => Array
        (
            [:title] => home
            [:title_1] => user
            [:title_2] => foo
            [:title_3] => 1
            [:title_4] => 3
        )

    [sql] => title IN(:title,:title_1, :title_2, :title_3, :title_4)
)

===================== title IN('home','user', 'foo', 1, 3) =====================
Array
(
    [params] => Array
        (
        )

    [sql] => title IS NOT NULL
)

============================== title IS NOT NULL ===============================

You can test it live here.
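For completeness, here is a rough sketch of feeding a result like the ones above into PDO. It assumes the pdo_sqlite extension is available and uses an in-memory database with a made-up `pages` table in place of your real connection; `$parsed` is a hard-coded copy of the array parse() returned for the second test string.

```php
<?php
// Assumption: pdo_sqlite is installed. The in-memory database and the
// "pages" table are stand-ins for the real connection and schema.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT)");
$pdo->exec("INSERT INTO pages (title) VALUES ('home'), ('user'), ('about')");

// Hard-coded copy of the parser output for
// "title = 'home' OR title = 'user'".
$parsed = [
    'params' => [':title' => 'home', ':title_1' => 'user'],
    'sql'    => 'title = :title OR title = :title_1',
];

// prepare() compiles the statement; execute() binds the values separately,
// so the values never become part of the SQL text.
$stmt = $pdo->prepare("SELECT id, title FROM pages WHERE {$parsed['sql']} ORDER BY id");
$stmt->execute($parsed['params']);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

foreach ($rows as $row) {
    echo $row['id'], ' ', $row['title'], "\n";
}
```

In your select() function the same two values would go in as $where and $params.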

Hope it works for you!