Get all words in a sentence as an array

Time: 2015-06-15 13:44:21

Tags: php regex

For example, I have the following sentence:

VA Trance Pro-Motion [PartI](December 2014)<4CD>{1337x} TheDanceCube.

I want to store the result in an array in the following format:

[1]->VA
[2]->Trance
[3]->Pro-Motion
[4]->[PartI]
[5]->(December 2014) 
[6]-><4CD>
.
.
and so on until the end of the sentence

Any idea how to achieve this?

I tried something like the following:

$final = explode(' ', $string);

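But it does not work for the parts inside brackets. I assume this can only be done with a regular expression, or is there some other simple function available?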

1 answer:

Answer 0 (score: 4)

This should work:

((?:\w|-)+|(?:\([^\)]+\))|(?:\{[^\}]+\})|(?:\[[^\]]+\])|(?:<[^>]+>))

with the global flag g (PHP has no g flag; use preg_match_all(), which returns every match)

See an example here: https://regex101.com/r/oN3vS2/1
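As a minimal sketch of how the pattern above could be applied to the sentence from the question (the variable names $phrase, $pattern, $matches and $tokens are chosen here for illustration):

```php
<?php
// Sentence taken from the question.
$phrase = 'VA Trance Pro-Motion [PartI](December 2014)<4CD>{1337x} TheDanceCube.';

// Words (with hyphens) or one complete bracketed block of any of the four kinds.
$pattern = '/((?:\w|-)+|(?:\([^\)]+\))|(?:\{[^\}]+\})|(?:\[[^\]]+\])|(?:<[^>]+>))/';

// preg_match_all() collects every match; $matches[0] holds the full matches.
preg_match_all($pattern, $phrase, $matches);
$tokens = $matches[0];

print_r($tokens);
```

Note that the resulting array is indexed from 0, not from 1 as in the question's sketch, and that the trailing period is not part of any token.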

How it works:

This captures all words, including hyphenated ones (but not the brackets):

((?:\w|-)+)

The rest are blocks for each type of bracket, like this one for ():

(?:\([^\)]+\)) 

For UTF-8 characters, add the u modifier:

preg_match_all('/((?:\w|-)+|(?:\([^\)]+\))|(?:\{[^\}]+\})|(?:\[[^\]]+\])|(?:<[^>]+>))/u', $phrase, $results);
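A short sketch of what the u modifier changes, assuming the script file itself is saved as UTF-8 (the sample phrase is made up for illustration):

```php
<?php
// Hypothetical sample text containing non-ASCII letters.
$phrase = 'Café Müller (Zürich) <Süd>';

// Same pattern as above, with the u modifier so the subject is treated as UTF-8.
$pattern = '/((?:\w|-)+|(?:\([^\)]+\))|(?:\{[^\}]+\})|(?:\[[^\]]+\])|(?:<[^>]+>))/u';

preg_match_all($pattern, $phrase, $matches);
$tokens = $matches[0];

print_r($tokens);
```

With the u modifier, accented letters count as word characters, so "Café" and "Müller" stay whole; without it, such words are typically cut at the multibyte characters.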

Note:

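This regex won't stop at the end of the sentence; it keeps matching through any text that follows. There are probably better ways to handle that than extending the regex, such as cutting the input at the first period beforehand with explode('.', $phrase). Avoid split() for this: it treats its first argument as a regular expression, so '.' would match any character, and the function was removed in PHP 7.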
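One possible way to combine the two steps is sketched below. The helper name tokenizeFirstSentence is hypothetical, and the sketch assumes that the first period in the input really marks the end of the sentence:

```php
<?php
// Hypothetical helper: keep only the text before the first period, then tokenize it.
function tokenizeFirstSentence(string $phrase): array
{
    // With a limit of 2, explode() splits at the first '.' only; element 0 is the sentence.
    $sentence = explode('.', $phrase, 2)[0];

    $pattern = '/((?:\w|-)+|(?:\([^\)]+\))|(?:\{[^\}]+\})|(?:\[[^\]]+\])|(?:<[^>]+>))/u';
    preg_match_all($pattern, $sentence, $matches);

    return $matches[0];
}

print_r(tokenizeFirstSentence('VA Trance [PartI]. Extra words here.'));
```

A period inside a bracketed block, such as (v1.2), would also end the sentence in this sketch, so inputs like that would need a more careful split.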