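Extract data from a string with preg_match

Time: 2018-04-22 08:23:00

Tags: php regex preg-match

I have a string "CPC >= $0 (Yesterday)" and I want to get the data: CPC, >=, 0, Yesterday. The sign >= may vary between several symbols, but it is always a comparison sign.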

$str = "CPC >= $0 (Yesterday)";
preg_match('/(?<metric1>\w+) (?<sign>\w+) $(?<digit>\d+) \(((?<time>\w+))\)/', $str, $matches);
print_r($matches);

This gives the output:

Array
(
)

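Edit:

The string can also be: CPC (Link) > $0 (Today), with parentheses before the sign. When you post an answer, could you also explain the characters used in the pattern?

(Pasting from comments...)

I am trying to get CPC (Link), >, 0, Today in an array, with the last item without its parentheses.

Yes, the first part can have parentheses, and the comparison operator can be: >, <, <=, >=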

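4 Answers:

Answer 0: (score: 0)

There are several problems:

  • >, = and so on are not word characters (which are what \w matches). You need to use \S (any non-whitespace character) instead.
  • You need to escape the $ sign (otherwise it tries to match the end of the string).
  • You have more () around time than you need.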
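The first two points can be checked directly. This is only a minimal sketch; the variable names ($original, $fixed and so on) are made up for illustration and are not from the question:

```php
<?php
// Single quotes keep the backslashes and the dollar sign literal for the regex engine.
$str = 'CPC >= $0 (Yesterday)';

// Pattern from the question: \w+ cannot match ">=", and the bare $ is an end-of-string anchor.
$original = '/(?<metric1>\w+) (?<sign>\w+) $(?<digit>\d+) \(((?<time>\w+))\)/';

// Same idea with \S+ for the sign, an escaped dollar sign, and the extra parentheses removed.
$fixed = '/(?<metric1>\w+) (?<sign>\S+) \$(?<digit>\d+) \((?<time>\w+)\)/';

$originalResult = preg_match($original, $str, $m1);
$fixedResult    = preg_match($fixed, $str, $m2);

var_dump($originalResult); // int(0)
var_dump($fixedResult);    // int(1)
echo $m2['sign'], "\n";    // >=
```

Note that this intermediate pattern still does not handle the "CPC (Link)" form from the edit.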

Try this instead:

$regex = '/(?<metric1>\w+(\s\([^)]+\))?)\s+(?<sign>\S+)\s+\$(?<digit>\d+)\s+\((?<time>[^)]+)\)/';
$str = "CPC >= $0 (Yesterday)";
preg_match($regex, $str, $matches);
print_r($matches);
$str = "CPC (Link) > $0 (Today)";
preg_match($regex, $str, $matches);
print_r($matches);

Output:

Array
(
    [0] => CPC >= $0 (Yesterday)
    [metric1] => CPC
    [1] => CPC
    [2] => 
    [sign] => >=
    [3] => >=
    [digit] => 0
    [4] => 0
    [time] => Yesterday
    [5] => Yesterday
)
Array
(
    [0] => CPC (Link) > $0 (Today)
    [metric1] => CPC (Link)
    [1] => CPC (Link)
    [2] =>  (Link)
    [sign] => >
    [3] => >
    [digit] => 0
    [4] => 0
    [time] => Today
    [5] => Today
)

Explanation of $regex:

(?<metric1>\w+(\s\([^)]+\))?) - captures a word (\w+) followed by an optional set of characters within () into a group called metric1
(?<sign>\S+) - captures a sequence of non-whitespace characters (\S+) into a group called sign
\$(?<digit>\d+) - captures a sequence of digits (\d+) following a $ sign into a group called digit
\((?<time>[^)]+)\) - captures a set of characters within () into a group called time
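If you only want the four named values in the array, without the numeric duplicates shown in the output above, one option is to filter on the keys. A rough sketch, assuming PHP 5.6 or later for ARRAY_FILTER_USE_KEY; the $named variable is just an illustrative name:

```php
<?php
$regex = '/(?<metric1>\w+(\s\([^)]+\))?)\s+(?<sign>\S+)\s+\$(?<digit>\d+)\s+\((?<time>[^)]+)\)/';
$str = 'CPC (Link) > $0 (Today)';

$named = [];
if (preg_match($regex, $str, $matches)) {
    // Keep only the string keys (the named groups) and drop the numeric copies.
    $named = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r($named);
// Array
// (
//     [metric1] => CPC (Link)
//     [sign] => >
//     [digit] => 0
//     [time] => Today
// )
```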

Answer 1: (score: 0)

Here is a solution that works for your example:

$str = "CPC >= $0 (Yesterday)";
preg_match_all("/[^\s$)(]+/", $str, $matches);
print_r($matches[0]);
// Array ( [0] => CPC [1] => >= [2] => 0 [3] => Yesterday )
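One thing to keep in mind with this approach: it splits on every parenthesis, so with the second input from the question's edit the metric comes back as two separate items. A quick check, using the same pattern in single quotes:

```php
<?php
$str = 'CPC (Link) > $0 (Today)';

// Same character class as above: anything except whitespace, $, ( and ).
preg_match_all('/[^\s$)(]+/', $str, $matches);

print_r($matches[0]);
// Array ( [0] => CPC [1] => Link [2] => > [3] => 0 [4] => Today )
```

If "CPC (Link)" has to stay together as one value, a pattern that captures the metric as a single group is probably a better fit.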

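Answer 2: (score: 0)

For metric1, you could list the characters you want to match in a character class, followed by a whitespace character, and repeat that as a group.

If the sign part can be >, <, <= or >=, you could use a character class followed by an optional = to match it.

For the digit part, you could capture the digits after the dollar sign in a capturing group. You have to escape the dollar sign, otherwise it means asserting the end of the line.

For the time part, you could capture everything between the parentheses in a capturing group.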

(?<metric1>(?:[\w()]+\s)+)(?<sign>[><]=?) \$(?<digit>\d+) \((?<time>[^)]+)\)

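Explanation

  • (?<metric1> Named capturing group metric1
    • (?:[\w()]+\s)+ A non-capturing group that matches one or more characters from the character class followed by a whitespace character; the group repeats one or more times
  • ) Close the group
  • (?<sign> Named capturing group sign
    • [><]=? Match > or < from the character class, followed by an optional =
  • ) \$ Close the group, then match a space and a literal dollar sign
  • (?<digit> Named capturing group digit
    • \d+ Match one or more digits
  • ) \( Close the group, then match a space and a literal (
  • (?<time> Named capturing group time
    • [^)]+ Match one or more characters that are not )
  • )\) Close the group, then match a literal )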

Demo
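To try this pattern from PHP it needs delimiters, which the answer does not show; the ~ delimiters and the rtrim() call below are my own additions, so treat this as a sketch. Because each repetition of the metric1 group ends with a whitespace character, the captured value carries a trailing space:

```php
<?php
// Delimiters (~) added here; the pattern itself is unchanged.
$pattern = '~(?<metric1>(?:[\w()]+\s)+)(?<sign>[><]=?) \$(?<digit>\d+) \((?<time>[^)]+)\)~';

$results = [];
foreach (['CPC >= $0 (Yesterday)', 'CPC (Link) > $0 (Today)'] as $str) {
    if (preg_match($pattern, $str, $m)) {
        $results[] = [
            // metric1 includes the space before the sign, so trim it off.
            'metric1' => rtrim($m['metric1']),
            'sign'    => $m['sign'],
            'digit'   => $m['digit'],
            'time'    => $m['time'],
        ];
    }
}

print_r($results);
```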

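Answer 3: (score: 0)

I never use named capture groups because they make the pattern harder to read and they bloat the output array. If you want to generate named variables, you can use list() or Symmetric Array Destructuring.

If this were my project, I probably would not name the capture groups or the variables, but if it makes your code more readable or easier to understand, that is a perfectly good reason to do it.

  • Remember that the first element in the output array is the full-string match, which you have no use for.

Pattern Demo

Code: (Demo)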

$strings = [
    'CPC >= $0 (Yesterday)',
    'CPC (Link) > $100 (Today)'
];

foreach ($strings as $string) {
    list($metric, $sign, $digit, $time) = preg_match('~([\w ()]+) ([><]=?) \$(\d+) \(([^)]+)\)~', $string, $out) ? array_slice($out, 1) : ['', '', '', ''];  // if fails, use empty strings

    echo "metric: $metric, sign: $sign, digit: $digit, time: $time\n";
    var_export($metric);  // notice no leading or trailing spaces / unwanted characters in the output
    echo "\n";
    var_export($sign);    // notice no leading or trailing spaces / unwanted characters in the output
    echo "\n";
    var_export($digit);   // notice no leading or trailing spaces / unwanted characters in the output
    echo "\n";
    var_export($time);    // notice no leading or trailing spaces / unwanted characters in the output
    echo "\n----------\n";
}

Output:

metric: CPC, sign: >=, digit: 0, time: Yesterday
'CPC'
'>='
'0'
'Yesterday'
----------
metric: CPC (Link), sign: >, digit: 100, time: Today
'CPC (Link)'
'>'
'100'
'Today'
----------

Pattern breakdown:

~            #starting pattern delimiter
(            #start of Capture Group #1
  [\w ()]+   #match (as much as possible) 1 or more A-Z, a-z, 0-9, _, space, or parenthesis (in any order)
)            #end of Capture Group #1
 (           #match space then start of Capture Group #2
   [><]=?    #match greater than or less than symbol followed optionally by equals symbol
 )           #end of Capture Group #2
 \$          #match space then a dollar symbol (backslash tells regex to treat the dollar sign literally)
(            #start of Capture Group #3
  \d+        #match one or more digits
)            #end of Capture Group #3
 \(          #match space then opening parenthesis (made literal by backslash)
(            #start of Capture Group #4
  [^)]+      #match one or more characters that are not a closing parenthesis
)            #end of Capture Group #4
\)           #match closing parenthesis literally
~            #end pattern delimiter
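
For completeness, here is roughly how the Symmetric Array Destructuring variant mentioned earlier could look. This is a sketch that assumes PHP 7.1 or later; the pattern is the same one broken down above:

```php
<?php
$string  = 'CPC (Link) > $100 (Today)';
$pattern = '~([\w ()]+) ([><]=?) \$(\d+) \(([^)]+)\)~';

if (preg_match($pattern, $string, $out)) {
    // The leading comma skips element 0, the full-string match.
    [, $metric, $sign, $digit, $time] = $out;
} else {
    // If the match fails, fall back to empty strings.
    [$metric, $sign, $digit, $time] = ['', '', '', ''];
}

echo "metric: $metric, sign: $sign, digit: $digit, time: $time\n";
// metric: CPC (Link), sign: >, digit: 100, time: Today
```

Skipping the first element inside the brackets avoids the array_slice() call, at the cost of requiring a newer PHP version.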