Question

I have a string "CPC >= $0 (Yesterday)" and I want to get the data: CPC, >=, 0, Yesterday. However, the >= sign may vary between several symbols, but it is always a comparison sign.

$str = "CPC >= $0 (Yesterday)";
preg_match('/(?<metric1>\w+) (?<sign>\w+) $(?<digit>\d+) \(((?<time>\w+))\)/', $str, $matches);
print_r($matches);

This gives the output:

Array
(
)

Edit:

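The string can also be CPC (Link) > $0 (Today), i.e. with parentheses before the sign. When you post an answer, could you also explain the characters used in the pattern?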

(Pasted from comments...)

I am trying to get CPC (Link), >, 0, Today in the array; the last item without its parentheses.

Yes, the first part can contain parentheses, and the comparison operator can be: > or < or <= or >=.

Answer 1

There are a few problems:

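>, = etc. are not word characters (matched by \w). You need to use \S (any non-whitespace character) instead.
You need to escape the $ sign (otherwise it tries to match the end of the string).
You have more () around time than you need.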
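A quick way to check the first two points for yourself (a small sketch, not part of the original answer): each call below tests one token in isolation, and preg_match() returns 1 for a match and 0 for no match.

```php
<?php
// \w matches only letters, digits and underscore, so it cannot match ">=".
$withWord = preg_match('/^\w+$/', '>=');      // 0 (no match)
// \S matches any non-whitespace character, so ">=" is accepted.
$withNonSpace = preg_match('/^\S+$/', '>=');  // 1 (match)

// Unescaped, $ is an end-of-subject anchor, so " $0" can never be matched.
$unescaped = preg_match('/ $0/', ' $0');      // 0 (no match)
// Escaped as \$, it matches a literal dollar sign.
$escaped = preg_match('/ \$0/', ' $0');       // 1 (match)

var_dump($withWord, $withNonSpace, $unescaped, $escaped);
```

Note the single quotes around '$0': inside double quotes PHP leaves "$0" alone (a variable name cannot start with a digit), but single quotes make that explicit.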

Try this instead:

$regex = '/(?<metric1>\w+(\s\([^)]+\))?)\s+(?<sign>\S+)\s+\$(?<digit>\d+)\s+\((?<time>[^)]+)\)/';
$str = "CPC >= $0 (Yesterday)";
preg_match($regex, $str, $matches);
print_r($matches);
$str = "CPC (Link) > $0 (Today)";
preg_match($regex, $str, $matches);
print_r($matches);

Output:

Array
(
    [0] => CPC >= $0 (Yesterday)
    [metric1] => CPC
    [1] => CPC
    [2] => 
    [sign] => >=
    [3] => >=
    [digit] => 0
    [4] => 0
    [time] => Yesterday
    [5] => Yesterday
)
Array
(
    [0] => CPC (Link) > $0 (Today)
    [metric1] => CPC (Link)
    [1] => CPC (Link)
    [2] =>  (Link)
    [sign] => >
    [3] => >
    [digit] => 0
    [4] => 0
    [time] => Today
    [5] => Today
)
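As the output shows, PHP reports every named group twice (under its name and its number), plus the unnamed inner group as [2]. If only the named entries are wanted, one option is to filter on string keys. This is a sketch using array_filter() with ARRAY_FILTER_USE_KEY; it is an addition, not something the answer itself does.

```php
<?php
$regex = '/(?<metric1>\w+(\s\([^)]+\))?)\s+(?<sign>\S+)\s+\$(?<digit>\d+)\s+\((?<time>[^)]+)\)/';
$str = 'CPC (Link) > $0 (Today)';

$named = [];
if (preg_match($regex, $str, $matches)) {
    // Numeric keys are ints, so is_string() keeps only the named groups.
    $named = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}
print_r($named);
```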

Explanation of $regex:

(?<metric1>\w+(\s\([^)]+\))?) - captures a word (\w+), optionally followed by a whitespace character and a set of characters within (), into a group called metric1
(?<sign>\S+) - captures a sequence of non-whitespace characters (\S+) into a group called sign
\$(?<digit>\d+) - captures a sequence of digits (\d+) following a $ sign into a group called digit
\((?<time>[^)]+) - captures a set of characters within () into a group called time
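The optional part (\s\([^)]+\))? is an ordinary unnamed group, which is why [2] shows up as an empty string in the first output: it did not take part in the match. Assuming PHP 7.2 or later, the PREG_UNMATCHED_AS_NULL flag reports such groups as null instead, so "did not participate" can be told apart from "matched an empty string". A minimal sketch:

```php
<?php
$regex = '/(?<metric1>\w+(\s\([^)]+\))?)\s+(?<sign>\S+)\s+\$(?<digit>\d+)\s+\((?<time>[^)]+)\)/';
$str = 'CPC >= $0 (Yesterday)';

// Default behaviour: a non-participating group becomes "".
preg_match($regex, $str, $default);
// With the flag (PHP 7.2+): a non-participating group becomes null.
preg_match($regex, $str, $withNull, PREG_UNMATCHED_AS_NULL);

var_dump($default[2], $withNull[2]);
```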

Answer 2

Here is a solution that works for your example:

$str = "CPC >= $0 (Yesterday)";
preg_match_all("/[^\s$)(]+/", $str, $matches);
print_r($matches[0]);
// Array ( [0] => CPC [1] => >= [2] => 0 [3] => Yesterday )
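This approach splits the string into tokens rather than validating its structure. Running the same character class over both strings from the question shows what happens with the edited form: the parentheses around Link are stripped as well, so CPC and Link come back as separate items. A small check, added here for illustration:

```php
<?php
$strings = ['CPC >= $0 (Yesterday)', 'CPC (Link) > $0 (Today)'];
$results = [];

foreach ($strings as $str) {
    // Runs of anything except whitespace, $, ( and ).
    preg_match_all('/[^\s$)(]+/', $str, $matches);
    $results[] = $matches[0];
    echo implode(', ', $matches[0]), "\n";
}
```

The second line prints CPC, Link, >, 0, Today, i.e. five items instead of the four asked for.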

Answer 3

For metric1, you can list the characters you want to match in a character class, followed by a space, and repeat that as a group.

If the sign part can be > or < or <= or >=, you can match it with a character class and an optional =.

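For the digit part, you can capture the digits following the dollar sign in a capturing group. You have to escape the dollar sign, otherwise it means "assert the end of the string".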

For the time part, you can capture everything between the parentheses in a capturing group.

(?<metric1>(?:[\w()]+\s)+)(?<sign>[><]=?) \$(?<digit>\d+) \((?<time>[^)]+)\)

Explanation

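(?<metric1> Named capturing group metric1
- (?:[\w()]+\s)+ Non-capturing group: match one or more characters from the character class followed by a whitespace character, and repeat the group one or more times
) Close the group
(?<sign> Named capturing group sign
- [><]=? Match > or < using a character class, followed by an optional =
) \$ Close the group, then match a space and a literal dollar sign
(?<digit> Named capturing group digit
- \d+ Match one or more digits
) (followed by a space) Close the group and match the space
\((?<time> Match ( literally and start named capturing group time
- [^)]+ Match one or more characters other than ) using a negated character class
)\) Close the group and match ) literally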
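A sketch of this pattern used from PHP, assuming the time group is delimited by literal \( and \) as the explanation describes. One side effect worth knowing: because the repeated group ends with \s, metric1 keeps a trailing space ("CPC " rather than "CPC"), so the sketch trims it with rtrim().

```php
<?php
$regex = '/(?<metric1>(?:[\w()]+\s)+)(?<sign>[><]=?) \$(?<digit>\d+) \((?<time>[^)]+)\)/';
$rows = [];

foreach (['CPC >= $0 (Yesterday)', 'CPC (Link) > $0 (Today)'] as $str) {
    if (preg_match($regex, $str, $m)) {
        // metric1 ends with the whitespace consumed by \s, so strip it.
        $metric = rtrim($m['metric1']);
        $rows[] = [$metric, $m['sign'], $m['digit'], $m['time']];
        echo "$metric | {$m['sign']} | {$m['digit']} | {$m['time']}\n";
    }
}
```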


Answer 4

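I never use named capture groups, because they make the pattern harder to read and they bloat the output array. If you want to generate named variables, you can use list() or symmetric array destructuring.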

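If it were my project, I probably wouldn't name the capture groups or the variables, but if doing so makes your code more readable or understandable, that is a perfectly noble reason.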

Keep in mind that the first element of the output array is the full-string match, which you have no use for.
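Besides array_slice($out, 1), another way to drop that first element is to leave its slot empty in the destructuring target. A minimal sketch, assuming PHP 7.1+ for the short [...] destructuring syntax:

```php
<?php
$string = 'CPC (Link) > $100 (Today)';
$pattern = '~([\w ()]+) ([><]=?) \$(\d+) \(([^)]+)\)~';

$metric = $sign = $digit = $time = '';  // fallback values if the match fails
if (preg_match($pattern, $string, $out)) {
    // The leading comma skips $out[0], the full-string match.
    [, $metric, $sign, $digit, $time] = $out;
}
echo "$metric | $sign | $digit | $time\n";
```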


Code:

$strings = [
    'CPC >= $0 (Yesterday)',
    'CPC (Link) > $100 (Today)'
];

foreach ($strings as $string) {
    list($metric, $sign, $digit, $time) = preg_match('~([\w ()]+) ([><]=?) \$(\d+) \(([^)]+)\)~', $string, $out) ? array_slice($out, 1) : ['', '', '', ''];  // if fails, use empty strings

    echo "metric: $metric, sign: $sign, digit: $digit, time: $time\n";
    var_export($metric);  // notice no leading or trailing spaces / unwanted characters in the output
    echo "\n";
    var_export($sign);    // notice no leading or trailing spaces / unwanted characters in the output
    echo "\n";
    var_export($digit);   // notice no leading or trailing spaces / unwanted characters in the output
    echo "\n";
    var_export($time);    // notice no leading or trailing spaces / unwanted characters in the output
    echo "\n----------\n";
}

Output:

metric: CPC, sign: >=, digit: 0, time: Yesterday
'CPC'
'>='
'0'
'Yesterday'
----------
metric: CPC (Link), sign: >, digit: 100, time: Today
'CPC (Link)'
'>'
'100'
'Today'
----------

Pattern breakdown:

~            #starting pattern delimiter
(            #start of Capture Group #1
  [\w ()]+   #match (as much as possible) 1 or more A-Z, a-z, 0-9, _, space, or parenthesis (in any order)
)            #end of Capture Group #1
 (           #match space then start of Capture Group #2
   [><]=?    #match greater than or less than symbol followed optionally by equals symbol
 )           #end of Capture Group #2
 \$          #match space then a dollar symbol (backslash tells regex to treat the dollar sign literally)
(            #start of Capture Group #3
  \d+        #match one or more digits
)            #end of Capture Group #3
 \(          #match space then opening parenthesis (made literal by backslash)
(            #start of Capture Group #4
  [^)]+      #match one or more characters that are not a closing parenthesis
)            #end of Capture Group #4
\)           #match closing parenthesis literally
~            #end pattern delimiter
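A breakdown like this can also live inside the pattern itself. The sketch below rewrites the same pattern with PCRE's x (extended) modifier, under which unescaped whitespace outside character classes is ignored and # starts a comment. Every literal space therefore has to be written as "\ " (backslash + space); inside the character class the space is escaped too, just to be safe. The nowdoc keeps the backslashes exactly as typed.

```php
<?php
$pattern = <<<'REGEX'
~
(               # start of Capture Group #1
  [\w\ ()]+     # one or more word characters, spaces or parentheses
)               # end of Capture Group #1
\               # a literal space
(               # start of Capture Group #2
  [><]=?        # > or <, optionally followed by =
)               # end of Capture Group #2
\ \$            # a space, then a literal dollar sign
(               # start of Capture Group #3
  \d+           # one or more digits
)               # end of Capture Group #3
\ \(            # a space, then a literal opening parenthesis
(               # start of Capture Group #4
  [^)]+         # one or more characters that are not a closing parenthesis
)               # end of Capture Group #4
\)              # a literal closing parenthesis
~x
REGEX;

$parsed = [];
foreach (['CPC >= $0 (Yesterday)', 'CPC (Link) > $100 (Today)'] as $string) {
    if (preg_match($pattern, $string, $out)) {
        $parsed[] = array_slice($out, 1);  // drop the full-string match
        echo implode(' | ', array_slice($out, 1)), "\n";
    }
}
```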

preg_match: extract data from a string
