PHP regex to create an array

Posted: 2010-11-08 20:56:46

Tags: php regex

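I want to use a regular expression in PHP to separate the words and the phrases in a string. Phrases are delimited by quotes, either double or single. The regex must also allow for single quotes inside words (e.g. nation's).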

Example string:

The nation's economy 'is really' poor, but "might be getting" better.

I want PHP to use a regex to split this kind of string into an array like this:

Array
(
    [0] => "The"
    [1] => "nation's"
    [2] => "economy"
    [3] => "is really"
    [4] => "poor"
    [5] => "but"
    [6] => "might be getting"
    [7] => "better"

)

What PHP code would do this? Thanks.

2 answers:

Answer 0 (score: 2)

Use preg_match_all with this regex:

(?<![\w'"])(?:['"][^'"]+['"]|[\w']+)(?![\w'"])
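The same pattern can be written with PCRE's x (extended) modifier so that each part carries its own comment. This is a readability sketch, not part of the original answer: the nowdoc, the ~ delimiter and the variable names are choices made here, and the pattern is intended to match exactly what the one-line version matches (whitespace outside character classes and # comments are ignored in x mode).

```php
<?php
// Answer 0's regex, annotated using the x (extended) modifier.
// A nowdoc is used so the single and double quotes need no escaping.
$pattern = <<<'RE'
~
(?<![\w'"])          # not preceded by a word character or a quote
(?:
    ['"] [^'"]+ ['"] # a quoted phrase: opening quote, non-quote characters, closing quote
  | [\w']+           # or a bare word that may contain apostrophes, e.g. nation's
)
(?![\w'"])           # not followed by a word character or a quote
~x
RE;

preg_match_all(
    $pattern,
    "The nation's economy 'is really' poor, but \"might be getting\" better.",
    $matches
);

// Quoted phrases are returned together with their surrounding quotes.
print_r($matches[0]);
```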

Example: http://www.ideone.com/SiG0V

preg_match_all(
  '/(?<![\w\'"])(?:[\'"][^\'"]+[\'"]|[\w\']+)(?![\w\'"])/', 
  "The nation's economy 'is really' poor, but \"might be getting\" better.",
  $matches
);

print_r($matches[0]);

(Note that this will not recognize problematic words, since none were specified in the question.)
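The matches above keep the surrounding quotes on phrases ('is really' comes back with its quotes), while the question asks for them stripped. One way to get that is to capture the phrase body in its own group and use a backreference so the closing quote must be the same character as the opening one. This is a minimal sketch, not taken from either answer; the function name splitWordsAndPhrases is hypothetical, and it assumes phrases are not nested and contain no escaped quotes.

```php
<?php
// Hypothetical helper (name chosen for illustration): split a string into
// bare words and quoted phrases, returning phrases without their quotes.
function splitWordsAndPhrases(string $text): array
{
    // Alternative 1: an opening quote captured as group 1, the shortest run of
    //                text as group 2, then the same quote again via \1.
    // Alternative 2: a bare word of word characters and apostrophes as group 3.
    $pattern = '/(["\'])(.*?)\1|([\w\']+)/';
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $tokens = [];
    foreach ($matches as $m) {
        // Group 1 is non-empty only when a quoted phrase matched;
        // for a bare word it is reported as an empty string.
        $tokens[] = ($m[1] !== '') ? $m[2] : $m[3];
    }
    return $tokens;
}

print_r(splitWordsAndPhrases(
    "The nation's economy 'is really' poor, but \"might be getting\" better."
));
```

Because the phrase body is .*? rather than a class that excludes all quotes, a double-quoted phrase may contain an apostrophe ("it's fine" is returned as one token). The trade-off is that a token beginning with an apostrophe, such as 'tis, is treated as the start of a phrase if another single quote appears later in the string.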

Answer 1 (score: 0)

$str = <<< END
The nation's economy 'is really' poor, but "might be getting" better.
END;
$str = ' ' . $str . ' '; // add surrounding spaces to make things easier

$regex = '/(?<=\s)(".*?"|\'.*?\'|.*?)(?=\s)/';

preg_match_all($regex, $str, $matches);

// strip spaces, commas, periods and quotes from both ends of each match
$words = $matches[0];
foreach ($words as &$word)
  $word = trim($word, ' ,.\'"');
unset($word); // break the reference that foreach leaves on the last element

print_r($words);
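With this pattern, the lazy .*? alternative can match an empty string wherever two whitespace characters are adjacent, so input containing doubled spaces produces empty entries in the result. The sketch below wraps the same pattern in a function and drops those empty entries; the function name splitOnWhitespaceTokens is hypothetical, and it assumes tokens are separated by whitespace, as the answer's approach does.

```php
<?php
// Hypothetical wrapper (name chosen for illustration) around the
// whitespace-bounded pattern: pad, match, trim, then drop empty tokens.
function splitOnWhitespaceTokens(string $str): array
{
    // Pad so the first and last tokens are also bordered by whitespace.
    $str = ' ' . $str . ' ';
    preg_match_all('/(?<=\s)(".*?"|\'.*?\'|.*?)(?=\s)/', $str, $matches);

    // Strip spaces, commas, periods and quotes from both ends of each token.
    $words = array_map(function ($w) {
        return trim($w, ' ,.\'"');
    }, $matches[0]);

    // Remove empty strings left by runs of whitespace, then reindex from 0.
    return array_values(array_filter($words, function ($w) {
        return $w !== '';
    }));
}

print_r(splitOnWhitespaceTokens("The nation's  economy 'is really' poor."));
```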