PHP preg_split(): don't split on spaces inside single quotes

Asked: 2018-04-17 10:45:10

Tags: php preg-split

I have this string:

$string = "My name is Emma and i have a dillemma, what's the distance between 'New York' and 'Athene' ?";

I am using this code to split the string on whitespace and on a few operators (=, <, >, !=, >=, <=, <>):

$split = preg_split('/\s+|(,|[<>!]?=|<>?|>)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

The result of this split is the following array:

Array
(
    [0] => My
    [1] => name
    [2] => is
    [3] => Emma
    [4] => and
    [5] => i
    [6] => have
    [7] => a
    [8] => dillemma
    [9] => ,
    [10] => what's
    [11] => the
    [12] => distance
    [13] => between
    [14] => 'New
    [15] => York'
    [16] => and
    [17] => 'Athene'
    [18] => ?
)

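My only remaining problem is that I want the spaces between the '' not to be split, and the '' themselves removed after splitting. In the example above you can see that 'New York' was split into: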

[14] => 'New
[15] => York'

The result I expect is:

[14] => New York

Likewise for 'Athene', which I want to be:

[16] => Athene

Basically, the array above should look like this:

Array
(
    [0] => My
    [1] => name
    [2] => is
    [3] => Emma
    [4] => and
    [5] => i
    [6] => have
    [7] => a
    [8] => dillemma
    [9] => ,
    [10] => what's
    [11] => the
    [12] => distance
    [13] => between
    [14] => New York
    [15] => and
    [16] => Athene
    [17] => ?
)

Yes, the distance between these two cities is 4,925 miles or 7,925 km :D

Thanks! :D

2 answers:

Answer 0 (score: 3)

Regex

(?:\'([^\']*[\'s]?)\'|\"([^\"]*)\")|[^\s,<>=!]+|(?:,|[<>!]?=|<>?|>)

You can view the matches here: https://regex101.com/r/LkHnHt/3

PHP code

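$text = "My name is Emma and i have a dillemma, what's the distance between 'New York' and 'Athene' ?";
// Match quoted phrases (content captured without the quotes), plain words, and operators.
preg_match_all('/(?:\'([^\']*[\'s]?)\'|\"([^\"]*)\")|[^\s,<>=!]+|(?:,|[<>!]?=|<>?|>)/', $text, $matches);
// Where group 1 captured a single-quoted phrase, overwrite the full match with the unquoted text.
foreach (array_filter($matches[1]) as $k => $v) {
    $matches[0][$k] = $v;
}
print_r($matches[0]);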

Result

Array
(
    [0] => My
    [1] => name
    [2] => is
    [3] => Emma
    [4] => and
    [5] => i
    [6] => have
    [7] => a
    [8] => dillemma
    [9] => ,
    [10] => what's
    [11] => the
    [12] => distance
    [13] => between
    [14] => New York
    [15] => and
    [16] => Athene
    [17] => ?
)

A second result, for an operator-heavy input that is not shown in this answer (the output is consistent with the first test string used in the other answer):

Array
(
    [0] => age
    [1] => <
    [2] => 21
    [3] => ,
    [4] => length
    [5] => >
    [6] => 10
    [7] => ,
    [8] => height
    [9] => <>
    [10] => 10
    [11] => ,
    [12] => width
    [13] => !=
    [14] => 100
    [15] => ,
    [16] => name
    [17] => =
    [18] => Emma Einarsson
    [19] => or
    [20] => it
    [21] => can
    [22] => be
    [23] => words
    [24] => time
    [25] => >=
    [26] => 10
    [27] => ,
    [28] => clouds
    [29] => <=
    [30] => 4
)

Note that all the captured data ends up in the array $matches[0].
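
The loop in the PHP code above copies back only group 1 (single-quoted phrases); a double-quoted phrase captured by group 2 would keep its quotes. The following is a minimal sketch of one way to handle both groups, using the same pattern with PREG_SET_ORDER. The function name tokenize() is a hypothetical helper, not part of the answer.

```php
<?php
// Hypothetical helper: tokenize() is an illustrative name, not from the answer above.
function tokenize(string $text): array
{
    // Same pattern as in the answer: quoted phrases, plain words, operators.
    $pattern = '/(?:\'([^\']*[\'s]?)\'|\"([^\"]*)\")|[^\s,<>=!]+|(?:,|[<>!]?=|<>?|>)/';
    // PREG_SET_ORDER yields one array per match: [0] full match, [1]/[2] groups if set.
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $tokens = [];
    foreach ($matches as $m) {
        if (isset($m[2]) && $m[2] !== '') {
            $tokens[] = $m[2];   // content of a double-quoted phrase
        } elseif (isset($m[1]) && $m[1] !== '') {
            $tokens[] = $m[1];   // content of a single-quoted phrase
        } else {
            $tokens[] = $m[0];   // plain word or operator (an empty '' or "" stays as-is)
        }
    }
    return $tokens;
}

print_r(tokenize("what's the distance between 'New York' and \"Athene\" ?"));
```

Comparing against '' explicitly, rather than relying on array_filter(), also keeps a quoted "0" from being dropped as falsy.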

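Answer 1 (score: 0)

If I understand the requirements correctly (after reading the question and the many comments), the only tricky part is preserving the single-quoted substrings.

You want to isolate: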

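  1. Single-quote-wrapped substrings, which may contain spaces.
  2. Words that may contain an apostrophe (single quote).
  3. Numbers.
  4. Five specific operator/symbol characters: <, >, !, =, ?

Pattern: ~\B'\K(?:[^']+)|\b[a-z']+\b|\d+|[<>!=?]+~i

Code with a battery of test strings: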

    $strings = [
        "age<21,length>10,height<>10,width!=100,name='Emma Einarsson' or it can be words time>=10,clouds<=4",
        "age < 21, length > 10, height <> 10, width != 100, name = 'Emma Einarsson' or it can be words time >= 10, clouds <= 4",
        "My name is Emma and i have a dillemma, what's the distance between 'New York' and 'Athene' ?",
        "'New York' and London at the start and end  with Paris and 'Los Angeles'"
    ];
    
    foreach ($strings as $string) {
        var_export(preg_match_all("~\B'\K(?:[^']+)|\b[a-z']+\b|\d+|[<>!=?]+~i", $string, $out) ? $out[0] : 'fail');
        echo "\n";
    }
    

Pattern breakdown:

    ~                 #start of pattern delimiter
    \B'\K(?:[^']+)    #match a single-quote not preceded by [a-zA-Z0-9_], then restart the fullstring match using (\K), then match one or more non-single quote characters
    |                 #OR
    \b[a-z']+\b       #match one or more letters and apostrophes 
    |                 #OR
    \d+               #match one or more digits
    |                 #OR
    [<>!=?]+          #match one or more of your listed operators/symbols
    ~                 #end of pattern delimiter
    i                 #pattern modifier - make whole pattern case-insensitive
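
As a small illustration of the \K step in the breakdown above, the sketch below runs the first alternative on its own, once without \K and once with it. The sample string and the variable names are assumptions chosen for the example.

```php
<?php
// Illustrative input; only the first alternative of the pattern is exercised here.
$s = "between 'New York' and 'Athene'";

// Without \K the opening quote is part of the reported match.
preg_match_all("~\B'[^']+~", $s, $without);

// With \K the match start is reset after the quote, so only the content is reported.
preg_match_all("~\B'\K[^']+~", $s, $with);

var_export($without[0]); // array("'New York", "'Athene")
echo "\n";
var_export($with[0]);    // array('New York', 'Athene')
echo "\n";
```

In both runs the closing quotes are skipped because \B fails between a word character and a quote.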
    

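Based on your sample input strings, you could technically remove the two \b (word boundary) markers from my pattern to improve its efficiency, but I left them in for maximum accuracy. One caution: the trailing \b is what stops the second alternative from matching a closing quote on its own, so test against your real input before dropping it.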
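
One way to check what the \b markers contribute is to run the pattern with and without them on the same input. This is only a sketch; the shortened sample string and the names $strict and $loose are assumptions made for the comparison.

```php
<?php
// Shortened sample input, chosen for illustration.
$s = "between 'New York' and 'Athene' ?";

// The pattern as given, with both \b markers.
preg_match_all("~\B'\K(?:[^']+)|\b[a-z']+\b|\d+|[<>!=?]+~i", $s, $strict);

// The same pattern with both \b markers removed.
preg_match_all("~\B'\K(?:[^']+)|[a-z']+|\d+|[<>!=?]+~i", $s, $loose);

var_export($strict[0]); // between, New York, and, Athene, ?
echo "\n";
var_export($loose[0]);  // between, New York, ', and, Athene, ', ?
echo "\n";
```

On this input the variant without \b reports each closing quote as a separate token, because [a-z']+ can then match a lone apostrophe.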