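Escaping elasticsearch special characters in PHP

Time: 2015-11-21 15:37:14

Tags: php regex elasticsearch

I want to create a function in PHP that escapes elasticsearch special characters by adding a \ before each of them. The special characters used by Elasticsearch are: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /

I am not very familiar with regular expressions. I found a piece of code that simply removes the special characters, but I would rather escape them, because they may be relevant. The code I use: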

$s_input = 'The next chars should be escaped: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ / Did it work?';
$search_query = preg_replace('/(\+|\-|\=|\&|\||\!|\(|\)|\{|\}|\[|\]|\^|\"|\~|\*|\<|\>|\?|\:|\\\\)/', '', $s_input);

Output:

The next chars should be escaped / Did it work

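So there are two problems: this code removes the special characters, whereas I want to escape them with a \. Also, this code does not escape the \. Does anyone know how to escape the Elasticsearch special characters?

5 answers:

Answer 0: (score: 5)

You can use preg_replace with a backreference, as stribizhev already noted (the simplest way):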

$string = "The next chars should be escaped: + - = && || > < ! ( ) { } [ ] ^ \" ~ * ? : \ / Did it work?"; 

function escapeElasticReservedChars($string) {
    $regex = "/[\\+\\-\\=\\&\\|\\!\\(\\)\\{\\}\\[\\]\\^\\\"\\~\\*\\<\\>\\?\\:\\\\\\/]/";
    return preg_replace($regex, addslashes('\\$0'), $string);
}
echo escapeElasticReservedChars($string);

Or use the preg_replace_callback function to achieve this. The callback receives the current match, so you can edit it before it is put back.

  

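  A callback that will be called and passed an array of matched elements in the subject string. The callback should return the replacement string.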

Here it is in action:

<?php 
$string = "The next chars should be escaped: + - = && || > < ! ( ) { } [ ] ^ \" ~ * ? : \ / Did it work?"; 

function escapeElasticSearchReservedChars($string) {
    $regex = "/[\\+\\-\\=\\&\\|\\!\\(\\)\\{\\}\\[\\]\\^\\\"\\~\\*\\<\\>\\?\\:\\\\\\/]/";
    $string = preg_replace_callback ($regex, 
        function ($matches) { 
            return "\\" . $matches[0]; 
        }, $string); 
    return $string;
}
echo escapeElasticSearchReservedChars($string);

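Output: The next chars should be escaped\: \+ \- \= \&\& \|\| \> \< \! \( \) \{ \} \[ \] \^ \" \~ \* \? \: \\ \/ Did it work\?

Answer 1: (score: 1)

None of the given answers seems to actually follow the documentation, so here is another answer that correctly encodes any untrusted input: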

/**
 * @param string $s untrusted user input
 * @return string safe string to be used in `query_string` argument to elasticsearch
 */
function escapeForElasticSearch($s)
{
    static $keys = array();
    static $values = array();
    if (!$keys)
    {
        # https://www.elastic.co/guide/en/elasticsearch/reference/5.5/query-dsl-query-string-query.html#_reserved_characters
        $replacements = array(
            "\\" => "\\\\", # must be done first to not double encode later backslashes!
            "+" => "\\+",
            "-" => "\\-",
            "=" => "\\=",
            "&" => "\\&",
            "|" => "\\|",
            ">" => "", # cannot be safely encoded
            "<" => "", # cannot be safely encoded
            "!" => "\\!",
            "(" => "\\(",
            ")" => "\\)",
            "{" => "\\{",
            "}" => "\\}",
            "[" => "\\[",
            "]" => "\\]",
            "^" => "\\^",
            "\"" => "\\\"",
            "~" => "\\~",
            "*" => "\\*",
            "?" => "\\?",
            ":" => "\\:",
            "/" => "\\/",
        );
        $keys = array_keys($replacements);
        $values = array_values($replacements);
    }
    return str_replace($keys, $values, $s);
}

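Note that & and | are not special characters on their own, but correctly handling odd numbers of them is harder than simply encoding every instance of these characters.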
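The ordering concern in the table above (backslash first) comes from str_replace() applying each pair in turn to the already-modified string. As a minimal sketch, assuming the same character list, strtr() with an array could be used instead, since it scans the input once and does not rescan text it has already replaced. The function name escapeWithStrtr and the sample input are hypothetical and not part of the answer.

```php
<?php
// Hypothetical helper: same mapping as the table above, applied with strtr().
function escapeWithStrtr(string $s): string
{
    // Reserved characters that receive a backslash prefix.
    $reserved = ['\\', '+', '-', '=', '&', '|', '!', '(', ')', '{', '}',
                 '[', ']', '^', '"', '~', '*', '?', ':', '/'];
    $map = [];
    foreach ($reserved as $char) {
        $map[$char] = '\\' . $char;
    }
    // < and > are removed, as in the answer above.
    $map['<'] = '';
    $map['>'] = '';

    // strtr() makes a single pass, so inserted backslashes are not re-escaped.
    return strtr($s, $map);
}

echo escapeWithStrtr('a+b \\ c<d && e'), "\n"; // a\+b \\ cd \&\& e
```

With this variant the order of the entries in the map no longer matters, which is the main reason to consider it.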

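Answer 2: (score: 1)

Full disclosure: I have never used elasticsearch, and my advice comes neither from personal experience nor from testing against elasticsearch. I produced this suggestion from my knowledge of regular expressions and string manipulation. If anyone finds a flaw, I would be glad to hear about it in a comment.

My snippet:

  • first removes all occurrences of < and > from the string, then
  • looks for a single occurrence of any character in the reserved character list, or for an ampersand or pipe immediately followed by the same character; every character that qualifies is escaped with a backslash.

Code: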

$string = "To be escaped: + - = && || > < ! ( ) { } [ ] ^ \" ~ * ? : \ / triple ||| and split '&<&'"; 

echo escapeElasticSearchReservedChars($string);

function escapeElasticSearchReservedChars(string $string): string
{
    return preg_replace(
        [
            '_[<>]+_',
            '_[-+=!(){}[\]^"~*?:\\/\\\\]|&(?=&)|\|(?=\|)_',
        ],
        [
            '',
            '\\\\$0',
        ],
        $string
    );
}

Output:

To be escaped\: \+ \- \= \&& \||   \! \( \) \{ \} \[ \] \^ \" \~ \* \? \: \\ \/ triple \|\|| and split '\&&'

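The reason for removing < and > first is so that nobody can defeat the design of the replacement by passing in |>|, which would otherwise prevent proper escaping of the two consecutive pipes that remain once the > is removed.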
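The effect of the ordering can be checked with a short sketch that runs the same two patterns in both orders. The variable names and the sample input are made up for illustration; the patterns are copied from the snippet above.

```php
<?php
// Patterns copied from the answer's snippet.
$strip  = '_[<>]+_';
$escape = '_[-+=!(){}[\]^"~*?:\\/\\\\]|&(?=&)|\|(?=\|)_';

$input = 'a |>| b';

// Order used by the answer: strip < and > first, then escape.
$stripFirst = preg_replace([$strip, $escape], ['', '\\\\$0'], $input);

// Reversed order: the > hides the pipe pair from the escape pattern.
$escapeFirst = preg_replace([$escape, $strip], ['\\\\$0', ''], $input);

echo $stripFirst, "\n";  // a \|| b
echo $escapeFirst, "\n"; // a || b
```

In the reversed order the two pipes only become adjacent after escaping has already run, so they reach the query unescaped.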

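Answer 3: (score: 0)

The simple way is to match with a single character class. The only question is what to use as the delimiter (for readability).

Using @ as the regex delimiter, it is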

Find: '@[-+=&|><!(){}[\]^"~*?:\\\/]@'
Replace: '\\\\$0'
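Written out as a PHP call, the simple version might look like the sketch below. In a single-quoted PHP string the replacement is spelled '\\\\$0' so that preg_replace() emits a literal backslash before the whole match. The variable names and the sample input are assumptions for illustration.

```php
<?php
// Single-quoted literals: the pattern ends its class with a backslash and a
// slash; the replacement is a literal backslash followed by the whole match.
$find    = '@[-+=&|><!(){}[\]^"~*?:\\\/]@';
$replace = '\\\\$0';

echo preg_replace($find, $replace, 'a+b (c:d) e/f'), "\n"; // a\+b \(c\:d\) e\/f
```

Using @ as the delimiter means the / inside the class needs no special treatment by the regex engine.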

But what if a character has already been escaped? What then?

The solution is to find only the ones that are not escaped.

Find: '@(?<!\\\)(?:\\\\\\\)*\K(?:[-+=&|><!(){}[\]^"~*?:/]|\\\(?!\\\))@'
Replace: '\\\\$0'

Formatted:

 (?<! \\ )                     # Not an escape behind 
 (?: \\ \\ )*                  # Possible even number of escapes
 \K                            # Don't include the previous escapes in match
 (?:
      [-+=&|><!(){}[\]^"~*?:/]      # Either 1 of these special characters
   |                              # or,
      \\                            # An escape character that is
      (?! \\ )                      # not followed by escape itself.
 )

Answer 4: (score: 0)

If anyone is looking for a slightly more verbose (but readable) solution:

public function escapeElasticsearchValue($searchValue)
{
    $searchValue = str_replace('\\', '\\\\', $searchValue);
    $searchValue = str_replace('*', '\\*', $searchValue);
    $searchValue = str_replace('?', '\\?', $searchValue);
    $searchValue = str_replace('+', '\\+', $searchValue);
    $searchValue = str_replace('-', '\\-', $searchValue);
    $searchValue = str_replace('&&', '\\&&', $searchValue);
    $searchValue = str_replace('||', '\\||', $searchValue);
    $searchValue = str_replace('!', '\\!', $searchValue);
    $searchValue = str_replace('(', '\\(', $searchValue);
    $searchValue = str_replace(')', '\\)', $searchValue);
    $searchValue = str_replace('{', '\\{', $searchValue);
    $searchValue = str_replace('}', '\\}', $searchValue);
    $searchValue = str_replace('[', '\\[', $searchValue);
    $searchValue = str_replace(']', '\\]', $searchValue);
    $searchValue = str_replace('^', '\\^', $searchValue);
    $searchValue = str_replace('~', '\\~', $searchValue);
    $searchValue = str_replace(':', '\\:', $searchValue);
    $searchValue = str_replace('"', '\\"', $searchValue);
    $searchValue = str_replace('=', '\\=', $searchValue);
    $searchValue = str_replace('/', '\\/', $searchValue);

    // < and > can’t be escaped at all. The only way to prevent them from
    // attempting to create a range query is to remove them from the query
    // string entirely
    $searchValue = str_replace('<', '', $searchValue);
    $searchValue = str_replace('>', '', $searchValue);

    return $searchValue;
}