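I want to use a regular expression in PHP to separate the words and phrases in a string. Phrases are delimited by quotes, either double or single. The regex must also account for single quotes inside words (e.g. nation's).
Example string: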
The nation's economy 'is really' poor, but "might be getting" better.
I would like PHP to use a regex to split this kind of string into an array like this:
Array
(
[0] => "The"
[1] => "nation's"
[2] => "economy"
[3] => "is really"
[4] => "poor"
[5] => "but"
[6] => "might be getting"
[7] => "better"
)
What PHP code would accomplish this? Thanks.
Answer 0: (score: 2)
Use preg_match_all with the regular expression:
(?<![\w'"])(?:['"][^'"]+['"]|[\w']+)(?![\w'"])
Example: http://www.ideone.com/SiG0V
preg_match_all(
'/(?<![\w\'"])(?:[\'"][^\'"]+[\'"]|[\w\']+)(?![\w\'"])/',
"The nation's economy 'is really' poor, but \"might be getting\" better.",
$matches
);
print_r($matches[0]);
(Note that this does not handle problematic edge cases, since the question does not specify how they should be treated.)
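The pattern above returns phrases with their quotes still attached (for example 'is really'), while the array in the question shows them without. As a rough sketch of one way to close that gap, the variant below gives each alternative its own capture group and keeps only the captured text. The function name split_words_and_phrases is hypothetical and not part of the answer above. It also assumes that apostrophes are allowed inside double-quoted phrases, which the question does not specify. Apostrophes inside single-quoted phrases remain ambiguous and are not handled.

```php
<?php
// Hypothetical helper (an illustration, not the answerer's code): returns
// words and quoted phrases with the surrounding quotes removed.
function split_words_and_phrases(string $text): array
{
    // Alternatives, tried in this order at each position:
    //   group 1: "..."  double-quoted phrase (assumed to allow apostrophes)
    //   group 2: '...'  single-quoted phrase
    //   group 3: a bare word, optionally with apostrophes inside it
    $pattern = '/(?<![\w\'"])(?:"([^"]+)"|\'([^\']+)\'|(\w+(?:\'\w+)*))(?![\w\'"])/';

    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        // With PREG_SET_ORDER, trailing unmatched groups are dropped, so the
        // last element of each match is the group that actually matched.
        $result[] = end($m);
    }
    return $result;
}

print_r(split_words_and_phrases(
    "The nation's economy 'is really' poor, but \"might be getting\" better."
));
```

Separate groups for the two quote styles also mean an opening double quote can only be closed by another double quote, which the character-class version ['"] does not enforce.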
Answer 1: (score: 0)
$str = <<< END
The nation's economy 'is really' poor, but "might be getting" better.
END;
$str = ' ' . $str . ' '; // add surrounding spaces to make things easier
$regex = '/(?<=\s)(".*?"|\'.*?\'|.*?)(?=\s)/';
preg_match_all($regex, $str, $matches);
// strip commas, periods and surrounding quotes from the resulting words
$words = $matches[0];
foreach ($words as &$word) {
    $word = trim($word, ' ,.\'"');
}
unset($word); // break the reference left over from the loop
print_r($words);