PHP / PCRE / regular expressions: splitting search terms apart

Date: 2012-06-01 07:04:09

Tags: php regex full-text-search pcre

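I am trying to split a typical Google-style search string into its parts. For example, the string might be: "how to" engine -fuel

So I would like to get "how to", engine, and -fuel separately.

I tried the following preg_match_all, but I also get "how on its own, which could make further processing unnecessarily difficult.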

preg_match_all(
     '=(["]{1}[^"]{1,}["]{1})'
    .'|([-]{1}[^ ]{1,}[ ]{1})'
    .'|([^-"]{1}[^ ]{1,}[ ]{1})=si', 
  $filter, 
  $matches,
  PREG_PATTERN_ORDER);

Does anyone know how to do this?

2 answers:

Answer 0 (score: 2)

Try:

$q = '"how to" engine -fuel';
preg_match_all('/"[^"]*"|\S+/', $q, $matches);
print_r($matches);

which will print:

Array
(
    [0] => Array
        (
            [0] => "how to"
            [1] => engine
            [2] => -fuel
        )

)

Meaning:

"[^"]*"    # match a quoted string
|          # OR
\S+        # 1 or more non-space chars
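
Once the tokens are separated, sorting them into phrases, excluded terms, and plain terms is a small extra step. The following is a minimal sketch of that step, not part of the original answer: the function name classifySearchTerms and the bucket names phrases, excluded, and terms are hypothetical, and it assumes a leading minus means "exclude this term".

```php
<?php
// Hypothetical helper: tokenizes a search string with the pattern above,
// then sorts each token into one of three buckets.
function classifySearchTerms(string $query): array
{
    $result = ['phrases' => [], 'excluded' => [], 'terms' => []];

    // Same pattern as above: a quoted string, or a run of non-space characters.
    preg_match_all('/"[^"]*"|\S+/', $query, $matches);

    foreach ($matches[0] as $token) {
        if (strlen($token) >= 2 && $token[0] === '"' && substr($token, -1) === '"') {
            // Quoted phrase: strip the surrounding quotes.
            $result['phrases'][] = substr($token, 1, -1);
        } elseif (strlen($token) > 1 && $token[0] === '-') {
            // Assumption: a leading minus marks a term to exclude.
            $result['excluded'][] = substr($token, 1);
        } else {
            // Everything else is treated as a plain search term.
            $result['terms'][] = $token;
        }
    }

    return $result;
}

print_r(classifySearchTerms('"how to" engine -fuel'));
```

For the example string this puts how to under phrases, fuel under excluded, and engine under terms. An unclosed quote such as "unclosed is not matched by the first alternative, so it falls through to \S+ and ends up as a plain term.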

Answer 1 (score: 1)

Try this

(?i)("[^"]+") +([a-z]+) +(\-[a-z]+)\b

Code

if (preg_match('/("[^"]+") +([a-z]+) +(-[a-z]+)\b/i', $subject, $regs)) {
    $howto = $regs[1];
    $engine = $regs[2];
    $fuel = $regs[3];
} else {
    $result = "";
}

Explanation

"
(?i)        # Match the remainder of the regex with the options: case insensitive (i)
(           # Match the regular expression below and capture its match into backreference number 1
   \"           # Match the character “\"” literally
   [^\"]        # Match any character that is NOT a “\"”
      +           # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \"           # Match the character “\"” literally
)
\           # Match the character “ ” literally
   +           # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(           # Match the regular expression below and capture its match into backreference number 2
   [a-z]       # Match a single character in the range between “a” and “z”
      +           # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\           # Match the character “ ” literally
   +           # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(           # Match the regular expression below and capture its match into backreference number 3
   \-          # Match the character “-” literally
   [a-z]       # Match a single character in the range between “a” and “z”
      +           # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\b          # Assert position at a word boundary
"
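
Note that this pattern expects exactly one quoted phrase, then one word, then one word prefixed with a minus, in that order. The sketch below illustrates that behaviour; the wrapper function parseFixedQuery and its array keys are hypothetical names introduced only for this example.

```php
<?php
// Hypothetical wrapper around the pattern above. Returns the three parts,
// or null when the input does not have the expected three-part shape.
function parseFixedQuery(string $subject): ?array
{
    if (preg_match('/("[^"]+") +([a-z]+) +(-[a-z]+)\b/i', $subject, $regs)) {
        return [
            'phrase'   => $regs[1], // still includes the surrounding quotes
            'term'     => $regs[2],
            'excluded' => $regs[3], // still includes the leading minus
        ];
    }
    return null;
}

var_dump(parseFixedQuery('"how to" engine -fuel')); // matches
var_dump(parseFixedQuery('engine "how to" -fuel')); // different order: null
```

If the search string can contain the parts in any order or any number, a token-based pattern is likely the more flexible choice.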

Hope this helps.