I have a string that I need to explode and extract information from.
Sample string:
"20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50"
First, I explode the string on "," and get
"20' Container 1"
"40' Open Container 1"
"40-45' Closed Container 3"
Now I want to explode each element of that array so that I get a result in the following format
array[
0 => [
0 => "20'"
1 => "Container"
2 => "1"
]
1 => [
0 => "40'"
1 => "Open Container"
2 => "1"
]
2 => [
0=> container roll
1=> 10
]
3=> [
0=> container lift
1 => 50
]
]
The string may vary, but the format is fixed, e.g. length type number, where length is optional.
This is what I am doing:
$pattern = '/([\d-]*\')\s(.*)\s(\d+)/';
foreach (explode(', ', $equipment->chassis_types) as $value) {
preg_match($pattern, $value, $matches); // Match length, type, number
$result[] = array_slice($matches, 1); // Slice with offset 1
$equipment->tokenized = $result;
}
and this is what I get:
Array
(
[0] => Array
(
[0] => 20'
[1] => container
[2] => 10
)
[1] => Array
(
[0] => 40'
[1] => open container
[2] => 10
)
[2] => Array
(
[0] => 40-45'
[1] => closed container
[2] => 20
)
[3] => Array
(
)
[4] => Array
(
)
)
Answer 0 (score: 2)
With the example given, you could go for
<?php
$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";
$regex = "~
(?:(?P<group1>\d+(?:-\d+)?')\h*)?
(?P<group2>(?i:[a-z]+\h?)+)\h+
(?P<group3>\d+(?:'')?)
~x";
if (preg_match_all($regex, $string, $matches, PREG_SET_ORDER)) {
print_r($matches);
}
?>
This yields:
Array
(
[0] => Array
(
[0] => 20' Container 1
[group1] => 20'
[1] => 20'
[group2] => Container
[2] => Container
[group3] => 1
[3] => 1
)
[1] => Array
(
[0] => 40' Open Container 1
[group1] => 40'
[1] => 40'
[group2] => Open Container
[2] => Open Container
[group3] => 1
[3] => 1
)
[2] => Array
(
[0] => 40-45' Closed Container 3
[group1] => 40-45'
[1] => 40-45'
[group2] => Closed Container
[2] => Closed Container
[group3] => 3
[3] => 3
)
[3] => Array
(
[0] => container roll 10
[group1] =>
[1] =>
[group2] => container roll
[2] => container roll
[group3] => 10
[3] => 10
)
[4] => Array
(
[0] => container lift 50
[group1] =>
[1] =>
[group2] => container lift
[2] => container lift
[group3] => 50
[3] => 50
)
)
The core regex is
(?: # non-capturing group
(?P<group1>\d+(?:-\d+)?')\h* # group1 = 1+ digits, optionally "-" and more digits, then an apostrophe
)? # make the whole group optional
(?P<group2>(?i:[a-z]+\h?)+)\h+ # group2 = one or more words of letters separated by horizontal whitespace, no digits
(?P<group3>\d+(?:'')?) # group3 = 1+ digits, optionally followed by ''
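To get from these named matches to the nested layout the question asks for, one possible sketch is shown here. The helper name tokenizeEquipment is hypothetical; the regex is the one from this answer, unchanged. It keeps only the named groups and prepends the length when it actually matched.

```php
<?php
// Hypothetical helper name; the regex is the one from the answer above.
function tokenizeEquipment(string $string): array
{
    $regex = "~
        (?:(?P<group1>\d+(?:-\d+)?')\h*)?
        (?P<group2>(?i:[a-z]+\h?)+)\h+
        (?P<group3>\d+(?:'')?)
    ~x";

    $rows = [];
    preg_match_all($regex, $string, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // Start with type and number, then prepend the length only if it matched.
        $row = [$m['group2'], $m['group3']];
        if ($m['group1'] !== '') {
            array_unshift($row, $m['group1']);
        }
        $rows[] = $row;
    }
    return $rows;
}

print_r(tokenizeEquipment("20' Container 1, container roll 10"));
```

With PREG_SET_ORDER, an optional leading group that did not take part in the match is reported as an empty string, which is what the check on group1 relies on.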
Answer 1 (score: 0)
You can use * to make the first number and the ' optional.
$str = '20\' Container 1, 40\' Open Container 1, 40-45\' Closed Container 3, container roll 10, container lift 50';
preg_match_all('/(\d*\'*)\s([a-zA-Z ]+)(\d+)/', $str, $matches);
var_dump($matches);
This gives output like this:
array(4) {
[0]=>
array(5) {
[0]=>
string(15) "20' Container 1"
[1]=>
string(20) "40' Open Container 1"
[2]=>
string(22) "45' Closed Container 3"
[3]=>
string(18) " container roll 10"
[4]=>
string(18) " container lift 50"
}
[1]=>
array(5) {
[0]=>
string(3) "20'"
[1]=>
string(3) "40'"
[2]=>
string(3) "45'"
[3]=>
string(0) ""
[4]=>
string(0) ""
}
[2]=>
array(5) {
[0]=>
string(10) "Container "
[1]=>
string(15) "Open Container "
[2]=>
string(17) "Closed Container "
[3]=>
string(15) "container roll "
[4]=>
string(15) "container lift "
}
[3]=>
array(5) {
[0]=>
string(1) "1"
[1]=>
string(1) "1"
[2]=>
string(1) "3"
[3]=>
string(2) "10"
[4]=>
string(2) "50"
}
}
To get an array closer to what you want, you can use array_column() to regroup the matches item by item.
$str = '20\' Container 1, 40\' Open Container 1, 40-45\' Closed Container 3, container roll 10, container lift 50';
preg_match_all('/(\d*\'*)\s([a-zA-Z ]+)(\d+)/', $str, $matches);
unset($matches[0]); // remove full match as it's not needed.
$res =[];
foreach($matches[1] as $key => $val){
$res[] = array_column($matches, $key);
}
var_dump($res);
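Answer 2 (score: 0)
Assuming only length can be missing, you could try this pattern, which I modified from your existing one. In addition, array_filter() can remove the empty values from each $matches array: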
$pattern = '/([\d-]*\')?\s?(\D+)\s(\d+)/';
foreach (explode(', ', $equipment->chassis_types) as $value) {
preg_match($pattern, $value, $matches);
$result[] = array_slice(array_filter($matches), 1);
}
$equipment->tokenized = $result;
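Changes to your pattern:
Added ? after the first capture group so that it can be skipped when absent.
Added \s? after it so that the first space is skipped as well.
Changed (.*) to (\D+) to match any character that is not a digit (assuming type never contains digits).
Note: I moved the line $equipment->tokenized = $result; outside the loop so that it is set only once instead of on every iteration.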
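One caveat with array_filter() called without a callback: it drops every value PHP treats as falsy, including the string "0", so an item with a count of 0 would lose its number. A minimal sketch, assuming such counts can occur (the input string below is made up for illustration), filters out empty strings only. It uses an arrow function, so it needs PHP 7.4 or later.

```php
<?php
$pattern = '/([\d-]*\')?\s?(\D+)\s(\d+)/';
$input = "20' Container 0, container roll 10"; // hypothetical input with a zero count

$result = [];
foreach (explode(', ', $input) as $value) {
    if (!preg_match($pattern, $value, $matches)) {
        continue; // skip items that do not fit the length/type/number format
    }
    // Keep everything except empty strings, so that "0" survives.
    $kept = array_filter($matches, fn($part) => $part !== '');
    // Drop the full match; array_slice() also reindexes the numeric keys.
    $result[] = array_slice($kept, 1);
}
print_r($result);
```

The explicit preg_match() check also avoids pushing empty arrays for items that do not match at all.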
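Answer 3 (score: 0)
I think I agree most with Erwin's answer. Although this is not a validation task, I like that Jan's answer does a better job of defining the scope of the "length" substring; Erwin's answer will match ' 1.
There is no indication that tabs or newlines occur in the input string, so a literal space is appropriate. Wrapping the regex in double quotes means the apostrophe in the pattern does not need escaping. For the record, Andreas's pattern is incorrect because it fails to match the "length" substring correctly and includes unwanted spaces in the "type" substring.
This is what I would use to parse the provided input: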
$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";
preg_match_all(
"~(\d+(?:-\d+)?')? (\D+) (\d+)~",
$string,
$matches,
PREG_SET_ORDER
);
print_r($matches); // use var_export() to show that no spaces are captured
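Pattern explanation:
(\d+(?:-\d+)?')? optionally captures digits, an optional hyphen and more digits, then an apostrophe (Length)
 (\D+) matches a literal space, then captures one or more non-digit characters (Type)
 (\d+) matches a literal space, then captures one or more digits (Number)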
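Note that the pattern requires a literal space before Type. That holds for the sample because every item without a Length follows ", ". If the input could begin with such an item, the leading word of its Type would be lost. One possible variation is to split on ", " first and anchor the pattern per item, moving the space into the optional Length part. This is an assumption on my side rather than part of the pattern above, and the input below is hypothetical.

```php
<?php
// Hypothetical input whose first item has no Length.
$string = "container roll 10, 20' Container 1, 40-45' Closed Container 3";

$rows = [];
foreach (explode(', ', $string) as $item) {
    // Anchored per item; the space after Length belongs to the optional part.
    if (preg_match('~^(?:(\d+(?:-\d+)?\') )?(\D+) (\d+)$~', $item, $m)) {
        $row = array_slice($m, 1);   // drop the full match
        if ($row[0] === '') {
            array_shift($row);       // Length was absent
        }
        $rows[] = $row;
    }
}
print_r($rows);
```

Single quotes are used here so that the $ anchor cannot be mistaken for the start of a variable; the price is that the apostrophe has to be escaped again.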