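REGEX: split on commas that are not inside single quotes, allowing escaped quotes

Time: 2010-11-22 14:09:29

Tags: php regex

I'm looking for a regular expression to use with preg_match_all in PHP 5 that lets me split a string on commas, as long as the comma is not inside single quotes, while allowing escaped single quotes. Sample data would be: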

(some_array, 'some, string goes here','another_string','this string may contain "double quotes" but, it can\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')

This should produce the following matches:

(some_array

'some, string goes here'

'another_string'

'this string may contain "double quotes" but, it can\'t split, on escaped single quotes'

 anonquotedstring

 83448545

 1210597346 + '000'

 1241722133 + '000')

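I have tried many, many regexes. My current one looks like this, although it does not match 100% correctly. (It still splits on some commas inside single quotes.)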

"/'(.*?)(?<!(?<!\\\)\\\)'|[^,]+/"

4 answers:

Answer 0 (score: 7)

Have you tried str_getcsv? It does exactly what you need, without a regex.

$result = str_getcsv($str, ",", "'");

You can even implement this function in PHP versions older than 5.3 with this snippet from a comment in the documentation, which maps it onto fgetcsv:

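// Define str_getcsv() only when the running PHP version lacks it (before 5.3).
if (!function_exists('str_getcsv')) {

    function str_getcsv($input, $delimiter = ',', $enclosure = '"', $escape = null, $eol = null) {
        // Write the string to an in-memory stream so that fgetcsv() can read it.
        $temp = fopen("php://memory", "r+");
        fwrite($temp, $input);
        fseek($temp, 0);
        // $escape and $eol are accepted for signature compatibility but are not used.
        $r = fgetcsv($temp, 4096, $delimiter, $enclosure);
        fclose($temp);
        return $r;
    }

}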

Answer 1 (score: 2)

As of PHP 5.3, you can use str_getcsv to spare yourself the pain:

$data = str_getcsv($input, ",", "'");

For example:

$input=<<<STR
(some_array, 'some, string goes here','another_string','this string may contain "double quotes" but it can\'t split on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')
STR;

$data=str_getcsv($input, ",", "'");
print_r($data);

This outputs:

Array
(
    [0] => (some_array
    [1] => some, string goes here
    [2] => another_string
    [3] => this string may contain "double quotes" but it can\'t split on escaped single quotes
    [4] => anonquotedstring
    [5] => 83448545
    [6] => 1210597346 + '000'
    [7] => 1241722133 + '000')
)

Answer 2 (score: 2)

With a bit of lookbehind, you can get something close to what you want:

$test = "(some_array, 'some, string goes here','another_string','this string may contain \"double quotes\" but, it can\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')";
preg_match_all('`
(?:[^,\']|
   \'((?<=\\\\)\'|[^\'])*\')*
`x', $test, $result);
print_r($result);

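This gives you the following result (the empty entries appear because the pattern can also match an empty string):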

Array
(
    [0] => Array
        (
            [0] => (some_array
            [1] => 
            [2] =>  'some, string goes here'
            [3] => 
            [4] => 'another_string'
            [5] => 
            [6] => 'this string may contain "double quotes" but, it can\'t split, on escaped single quotes'
            [7] => 
            [8] =>  anonquotedstring
            [9] => 
            [10] =>  83448545
            [11] => 
            [12] =>  1210597346 + '000'
            [13] => 
            [14] =>  1241722133 + '000')
            [15] => 
        )

    [1] => Array
        (
            [0] => 
            [1] => 
            [2] => e
            [3] => 
            [4] => g
            [5] => 
            [6] => s
            [7] => 
            [8] => 
            [9] => 
            [10] => 
            [11] => 
            [12] => 0
            [13] => 
            [14] => 0
            [15] => 
        )

)
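The empty entries in that output can be avoided by requiring at least one character per match. What follows is only a minimal sketch, not the pattern from this answer: the regex is my own assumption, it treats a backslash as escaping whatever character follows it, and it skips empty fields (such as `a,,b`) entirely. Whitespace around each field is removed with trim.

```php
<?php
// Sample data from the question; the heredoc keeps \' as a literal backslash + quote.
$input = <<<STR
(some_array, 'some, string goes here','another_string','this string may contain "double quotes" but, it can\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')
STR;

// Assumed pattern (not from the answer). Each match is one or more of:
//   - a character that is not a comma, a single quote, or a backslash
//   - a backslash followed by any character
//   - a complete single-quoted section, inside which \' does not end the section
$pattern = '/(?:[^,\'\\\\]|\\\\.|\'(?:[^\'\\\\]|\\\\.)*\')+/s';

preg_match_all($pattern, $input, $matches);

// $matches[0] holds the full matches; trim the whitespace around each field.
$fields = array_map('trim', $matches[0]);
print_r($fields);
```

On the sample data this should yield eight fields, with the quotes and the escaped `\'` left in place.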

Answer 3 (score: 0)

I would use a CSV parser here; that is what they are for.

If you insist on a regex, you can use:

preg_match_all(
    '/\s*"    # either match " (optional preceding whitespace),
     (?:\\\\. # followed either by an escaped character
     |        # or
     [^"]     # any character except "
     )*       # any number of times,
    "\s*      # followed by " (and optional whitespace).
    |         # Or: do the same thing for single-quoted strings.
    \s*\'(?:\\\\.|[^\'])*\'\s*
    |         # Or:
    [^,]*     # match anything except commas (i.e. any remaining unquoted strings)
    /x', 
    $subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];

But as you can see, this is ugly and hard to maintain. Use the right tool for the job.
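If neither str_getcsv nor a regex fits, the same splitting rule can also be written by hand as a small character-by-character loop. This is only a sketch under my own assumptions: `split_outside_quotes` is a hypothetical helper name (not a PHP built-in and not from any answer here), a backslash is taken to escape the character after it, and the quotes are kept in the output.

```php
<?php
/**
 * Hypothetical helper: splits $input on commas that are outside
 * single-quoted sections. A backslash escapes the character after it.
 */
function split_outside_quotes($input)
{
    $fields   = array();
    $current  = '';
    $inQuotes = false;
    $length   = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if ($char === '\\' && $i + 1 < $length) {
            // Copy the backslash and the escaped character verbatim.
            $current .= $char . $input[++$i];
        } elseif ($char === "'") {
            // An unescaped quote opens or closes a quoted section.
            $inQuotes = !$inQuotes;
            $current .= $char;
        } elseif ($char === ',' && !$inQuotes) {
            // A comma outside quotes ends the current field.
            $fields[] = trim($current);
            $current  = '';
        } else {
            $current .= $char;
        }
    }

    $fields[] = trim($current); // the last field has no trailing comma
    return $fields;
}

// Shortened input in the style of the question's data.
$input = "(a, 'b, c', 'it\\'s, ok', 1 + '000')";
$parts = split_outside_quotes($input);
print_r($parts);
```

The loop is longer than a one-line regex, but each rule (escape, quote, separator) sits in its own branch, which makes it easier to adjust later.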