Extract parts of a string into an array in PHP

Time: 2018-01-04 06:30:28

Tags: php arrays regex string preg-match

I have a string that I need to explode to extract information from it.

Example string:

"20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50"

First, I explode the string on `,` and get

"20' Container 1"
"40' Open Container 1"
"40-45' Closed Container 3"

Now I want to explode each element of the exploded array so that I get a result in the following format

array[
    0 => [
        0 => "20'"
        1 => "Container"
        2 => "1"
        ]
    1 => [
        0 => "40'"
        1 => "Open Container"
        2 => "1"
        ]
    2 => [
        0 => "container roll"
        1 => "10"
        ]
    3 => [
        0 => "container lift"
        1 => "50"
        ]
    ]

The strings may vary, but the format is always the same: length type number, where length is optional.

This is what I am doing:

$pattern = '/([\d-]*\')\s(.*)\s(\d+)/';
foreach (explode(', ', $equipment->chassis_types) as $value) {
    preg_match($pattern, $value, $matches); // Match length, type, number
    $result[] = array_slice($matches, 1);   // Slice with offset 1
    $equipment->tokenized   =   $result;
}

And this is what I get:

Array
(
    [0] => Array
        (
            [0] => 20'
            [1] => container
            [2] => 10
        )

    [1] => Array
        (
            [0] => 40'
            [1] => open container
            [2] => 10
        )

    [2] => Array
        (
            [0] => 40-45'
            [1] => closed container
            [2] => 20
        )

    [3] => Array
        (
        )

    [4] => Array
        (
        )

)

4 Answers:

Answer 0: (score: 2)

With the example given, you could use

<?php

$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";

$regex = "~
        (?:(?P<group1>\d+(?:-\d+)?')\h*)?
        (?P<group2>(?i:[a-z]+\h?)+)\h+
        (?P<group3>\d+(?:'')?)
        ~x";

if (preg_match_all($regex, $string, $matches, PREG_SET_ORDER)) {
    print_r($matches);
}
?>

See a demo on regex101.com

This yields:

Array
(
    [0] => Array
        (
            [0] => 20' Container 1
            [group1] => 20'
            [1] => 20'
            [group2] => Container
            [2] => Container
            [group3] => 1
            [3] => 1
        )

    [1] => Array
        (
            [0] => 40' Open Container 1
            [group1] => 40'
            [1] => 40'
            [group2] => Open Container
            [2] => Open Container
            [group3] => 1
            [3] => 1
        )

    [2] => Array
        (
            [0] => 40-45' Closed Container 3
            [group1] => 40-45'
            [1] => 40-45'
            [group2] => Closed Container
            [2] => Closed Container
            [group3] => 3
            [3] => 3
        )

    [3] => Array
        (
            [0] => container roll 10
            [group1] => 
            [1] => 
            [group2] => container roll
            [2] => container roll
            [group3] => 10
            [3] => 10
        )

    [4] => Array
        (
            [0] => container lift 50
            [group1] => 
            [1] => 
            [group2] => container lift
            [2] => container lift
            [group3] => 50
            [3] => 50
        )

)

The core regex is

(?:                               # non-capturing group
    (?P<group1>\d+(?:-\d+)?')\h*  # group1 = 1+ digits, optionally "-digits", then '
)?                                # make the whole group optional
(?P<group2>(?i:[a-z]+\h?)+)\h+    # group2 = letters and horizontal whitespace, no digits
(?P<group3>\d+(?:'')?)            # group3 = 1+ digits, optionally followed by ''
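
The named groups can then be reshaped into the nested layout the question asks for. The following is only a sketch under that assumption: it reuses the regex above unchanged, and `$result` and `$row` are illustrative names that do not appear in the answer. Entries without a length simply omit that element.

```php
<?php
$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";

$regex = "~
        (?:(?P<group1>\d+(?:-\d+)?')\h*)?
        (?P<group2>(?i:[a-z]+\h?)+)\h+
        (?P<group3>\d+(?:'')?)
        ~x";

$result = [];
if (preg_match_all($regex, $string, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // Collect length, type, number in that order.
        $row = [$match['group1'], $match['group2'], $match['group3']];
        // group1 is an empty string when the optional length is absent.
        if ($row[0] === '') {
            array_shift($row);
        }
        $result[] = $row;
    }
}
print_r($result);
```

With the sample string, the first three entries have three elements each and the last two have only type and number.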

Answer 1: (score: 0)

You can use * to make the first number and the ' optional.

$str = '20\' Container 1, 40\' Open Container 1, 40-45\' Closed Container 3, container roll 10, container lift 50';
preg_match_all('/(\d*\'*)\s([a-zA-Z ]+)(\d+)/', $str, $matches);
var_dump($matches);

This gives output like this:

array(4) {
  [0]=>
  array(5) {
    [0]=>
    string(15) "20' Container 1"
    [1]=>
    string(20) "40' Open Container 1"
    [2]=>
    string(22) "45' Closed Container 3"
    [3]=>
    string(18) " container roll 10"
    [4]=>
    string(18) " container lift 50"
  }
  [1]=>
  array(5) {
    [0]=>
    string(3) "20'"
    [1]=>
    string(3) "40'"
    [2]=>
    string(3) "45'"
    [3]=>
    string(0) ""
    [4]=>
    string(0) ""
  }
  [2]=>
  array(5) {
    [0]=>
    string(10) "Container "
    [1]=>
    string(15) "Open Container "
    [2]=>
    string(17) "Closed Container "
    [3]=>
    string(15) "container roll "
    [4]=>
    string(15) "container lift "
  }
  [3]=>
  array(5) {
    [0]=>
    string(1) "1"
    [1]=>
    string(1) "1"
    [2]=>
    string(1) "3"
    [3]=>
    string(2) "10"
    [4]=>
    string(2) "50"
  }
}

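To get an array closer to what you want, you can use array_column() to regroup the matches so that each entry holds the parts of one item.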

$str = '20\' Container 1, 40\' Open Container 1, 40-45\' Closed Container 3, container roll 10, container lift 50';
preg_match_all('/(\d*\'*)\s([a-zA-Z ]+)(\d+)/', $str, $matches);
unset($matches[0]); // remove full match as it's not needed.

$res =[];
foreach($matches[1] as $key => $val){
    $res[] = array_column($matches, $key);
}
var_dump($res);

https://3v4l.org/4rGod

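Answer 2: (score: 0)

Assuming only length can be missing, you can try the pattern below, which I modified from your existing one. Adding array_filter() removes the empty elements from each $matches.
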
$pattern = '/([\d-]*\')?\s?(\D+)\s(\d+)/';
foreach (explode(', ', $equipment->chassis_types) as $value) {
    preg_match($pattern, $value, $matches);
    $result[] = array_slice(array_filter($matches), 1);
}
$equipment->tokenized = $result;

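Modifications to your pattern:

  • ? is added after the first capture group so the group can be skipped when it is absent
  • \s? follows it so the first space is also skipped when the first group is absent
  • (.*) is changed to (\D+) to match any characters that are not digits (assuming type never contains digits)

Note: I moved the line $equipment->tokenized = $result; outside the loop so it is set once rather than on every iteration.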
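
For trying this outside the original class, here is a minimal self-contained sketch. It assumes a plain `$input` string in place of `$equipment->chassis_types` (a hypothetical stand-in), and it swaps the bare array_filter() call for one with a callback, because array_filter() without a callback also drops the string "0", which would lose a number of zero.

```php
<?php
// Assumption: $input stands in for $equipment->chassis_types.
$input = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";

$pattern = '/([\d-]*\')?\s?(\D+)\s(\d+)/';
$result = [];
foreach (explode(', ', $input) as $value) {
    if (!preg_match($pattern, $value, $matches)) {
        continue; // skip items that do not fit "length type number"
    }
    // Remove only empty strings, so a number of "0" is kept.
    $parts = array_filter($matches, function ($part) {
        return $part !== '';
    });
    // Drop the full match (first element) and reindex the rest.
    $result[] = array_slice($parts, 1);
}
print_r($result);
```

The guard around preg_match() is also an addition: it keeps malformed items from producing empty sub-arrays like the ones shown in the question.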

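Answer 3: (score: 0)

I think I agree most with Erwin's answer. Although this is not a validation task, I like that Jan's answer does a better job of defining the range-capable "length" substring; Erwin's answer will match ' 1. There is no indication that tabs or newlines occur in the input string, so literal spaces are appropriate. Wrapping the regex in double quotes means the apostrophe in the pattern does not need to be escaped. For the record, Andreas's pattern is incorrect because it fails to match the "length" substring properly and includes unwanted spaces in the "type" substring.

This is what I would use to parse the provided input: (Demo) (Pattern Demo)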

$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";

preg_match_all(
    "~(\d+(?:-\d+)?')? (\D+) (\d+)~",
    $string,
    $matches,
    PREG_SET_ORDER
);

print_r($matches);  // use var_export() to show that no spaces are captured

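Pattern explanation:

  1. Match one or more digits, then optionally a hyphen followed by one or more digits, then an apostrophe. The whole captured sequence is optional. (Length)
  2. Match, but do not capture, a space.
  3. Capture one or more non-digit characters. (Type)
  4. Match, but do not capture, a space.
  5. Capture one or more digits. (Number)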
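
As an illustration of how the three captures from the steps above could be consumed, the sketch below maps each match to a small associative array. The keys `length`, `type`, and `number`, the `$items` variable, and the choice of null for a missing length are assumptions made for this example, not part of the answer.

```php
<?php
$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";

preg_match_all(
    "~(\d+(?:-\d+)?')? (\D+) (\d+)~",
    $string,
    $matches,
    PREG_SET_ORDER
);

$items = [];
// Skip index 0 (the full match) and unpack the three capture groups.
foreach ($matches as [, $length, $type, $number]) {
    $items[] = [
        'length' => $length === '' ? null : $length, // optional group
        'type'   => $type,
        'number' => (int) $number,
    ];
}
var_export($items);
```

Casting the number to an integer is a design choice for the example; keep it as a string if the original formatting matters.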