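I'm trying to match a series of text strings with PCRE in PHP, and I can't get all of the matches between the first and the second one.
If anyone is wondering why on Earth I would want to do this, it's because of Doc Comments. Oh, how I wish Zend would provide native/plugin functions for reading Doc Comments from a PHP file...
The following example (plain) text will be used for this problem. It will always be pure PHP code, with a single opening tag at the start of the file and no closing tag. You can assume the syntax is always correct.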
<?php
class someClass extends someExample
{
function doSomething($someArg = 'someValue')
{
// Nested code blocks...
if($boolTest){}
}
private function killFurbies(){}
protected function runSomething(){}
}
abstract
class anotherClass
{
public function __construct(){}
abstract function saveTheWhales();
}
function globalFunc(){}
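I'm attempting to match all of the methods in a class; my RegEx never finds the method killFurbies() at all. Making it greedy means it only matches the last method in a class, and making it lazy means it only matches the first method.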
$part = '.*'; // Greedy
$part = '.*?'; // Lazy
$regex = '%class(?:\\n|\\r|\\s)+([a-zA-Z_\\x7f-\\xff][a-zA-Z0-9_\\x7f-\\xff]*)'
. '.*?\{' . $part .'(?:(public|protected|private)(?:\\n|\\r|\\s)+)?'
. 'function(?:\\n|\\r|\\s)+([a-zA-Z_\\x7f-\\xff][a-zA-Z0-9_\\x7f-\\xff'
. ']*)(?:\\n|\\r|\\s)*\\(%ms';
preg_match_all($regex, file_get_contents(__EXAMPLE__), $matches, PREG_SET_ORDER);
var_dump($matches);
Results:
// Lazy:
array(2) {
[0]=>
array(4) {
[0]=>
// Omitted.
[1]=>
string(9) "someClass"
[2]=>
string(0) ""
[3]=>
string(11) "doSomething"
}
[1]=>
array(4) {
[0]=>
// Omitted.
[1]=>
string(12) "anotherClass"
[2]=>
string(6) "public"
[3]=>
string(11) "__construct"
}
}
// Greedy:
array(2) {
[0]=>
array(4) {
[0]=>
// Omitted.
[1]=>
string(9) "someClass"
[2]=>
string(0) ""
[3]=>
string(13) "saveTheWhales"
}
[1]=>
array(4) {
[0]=>
// Omitted.
[1]=>
string(12) "anotherClass"
[2]=>
string(0) ""
[3]=>
string(13) "saveTheWhales"
}
}
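The greedy and lazy results above can be reproduced on a much smaller input. The sketch below is only an illustration: the subject string and the simplified patterns are hypothetical and are not the regex from the question. It shows why one pattern anchored on `class` yields at most one method per class: matches cannot overlap, and every match has to begin at a `class` keyword.

```php
<?php
// Hypothetical miniature input: one "class" prefix followed by three methods.
$subject = 'class A { function one() function two() function three() }';

// Every match must start at "class", so each pattern can match only once here.
$greedy = '%class\s+(\w+).*function\s+(\w+)%s';  // .* runs to the end, then backtracks
$lazy   = '%class\s+(\w+).*?function\s+(\w+)%s'; // .*? stops at the first "function"

preg_match_all($greedy, $subject, $greedyMatches, PREG_SET_ORDER);
preg_match_all($lazy, $subject, $lazyMatches, PREG_SET_ORDER);

echo count($greedyMatches), ' match, method: ', $greedyMatches[0][2], "\n"; // last method
echo count($lazyMatches), ' match, method: ', $lazyMatches[0][2], "\n";     // first method
```

Neither quantifier can return `two`, which is the position killFurbies() is in.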
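How do I match them all? :S
Any help would be greatly appreciated, as this question already feels ridiculous to me while I'm typing it. Anyone who attempts to answer a question like this is braver than I am!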
Answer 0: (score: 0)
It would be better to use token_get_all to get the tokens of the PHP code and iterate over them. PHPDoc style comments can be identified by the T_DOC_COMMENT token.
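A minimal sketch of that idea, assuming the goal is to pair each class or function name with the doc comment directly in front of its declaration. The helper name `collect_doc_comments` and the sample source are hypothetical and not part of the answer. Results are keyed by name only, so same-named methods in different classes would overwrite each other.

```php
<?php
// Hypothetical helper: map each class/function name to the doc comment that
// directly precedes its declaration, or false when there is none.
function collect_doc_comments($source)
{
    // Tokens that may sit between a doc comment and the class/function keyword.
    $skippable = array(T_WHITESPACE, T_ABSTRACT, T_FINAL, T_PUBLIC,
                       T_PROTECTED, T_PRIVATE, T_STATIC);
    $docs = array();
    $pendingDoc = false; // last doc comment seen and not yet attached
    $expectName = false; // true right after T_CLASS or T_FUNCTION

    foreach (token_get_all($source) as $token) {
        // Single-character tokens such as "{" or ";" are plain strings.
        $id = is_array($token) ? $token[0] : null;

        if ($id === T_DOC_COMMENT) {
            $pendingDoc = $token[1];
        } elseif ($id === T_CLASS || $id === T_FUNCTION) {
            $expectName = true;
        } elseif ($expectName && $id === T_STRING) {
            $docs[$token[1]] = $pendingDoc;
            $pendingDoc = false;
            $expectName = false;
        } elseif (!in_array($id, $skippable, true)) {
            // Any other token separates the comment from a declaration.
            $pendingDoc = false;
            $expectName = false;
        }
    }
    return $docs;
}

$source = <<<'PHP'
<?php
/** A documented class */
class someClass
{
    /** Does something */
    public function doSomething() {}
    private function killFurbies() {}
}
PHP;

$docs = collect_doc_comments($source);
var_dump($docs);
```

Because the tokenizer has already separated comments, strings, and code, nested braces and formatting quirks do not need any special handling here.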
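Answer 1: (score: 0)
Err, can't you just use token_get_all to parse the source code and look for tokens of type T_DOC_COMMENT (changed from T_COMMENT to T_DOC_COMMENT, see Gumnbo's post)?
An example of how to use the token_get_all function can be found here.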
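Answer 2: (score: 0)
I've come up with a class that extracts the Doc Comments of the classes and methods in a file. Thanks to everyone who answered this question, and to the other one on matching code blocks.
Benchmarks for the following example average between 0.00495 and 0.00505.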
<?php
$file = 'path/to/libraries/tokenizer.php';
include $file;
$tokenizer = new Tokenizer;
// Start Benchmarking here.
$tokenizer->load($file);
// End Benchmarking here.
// The following will output 'bool(false)'.
var_dump($tokenizer->get_doc('Tokenizer', 'get_tokens'));
// The following will output 'string(18) "/** load method */"'.
var_dump($tokenizer->get_doc('Tokenizer', 'load'));
The Tokenizer class (yes, I haven't thought of a better name yet...):
<?php
class Tokenizer
{
private $compiled = false, $path = false, $tokens = false, $classes = array();
/** load method */
public function load($path)
{
$path = realpath($path);
if(!file_exists($path) || !function_exists('token_get_all'))
{
return false;
}
$this->compiled = false;
$this->classes = array();
$this->path = $path;
$this->tokens = false;
$this->get_tokens();
$this->get_classes();
$this->class_blocks();
$this->class_functions();
return true;
}
protected function get_tokens()
{
$tokens = token_get_all(file_get_contents($this->path));
$compiled = '';
foreach($tokens as $k => $t)
{
if(is_array($t) && $t[0] != T_WHITESPACE)
{
$compiled .= $k . ':' . $t[0] . ',';
}
else
{
if($t == '{' || $t == '}')
{
$compiled .= $t . ',';
}
}
}
$this->tokens = $tokens;
$this->compiled = trim($compiled, ',');
}
protected function get_classes()
{
if(!$this->compiled)
{
return false;
}
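// NOTE: 366, 352, 307, ... are numeric token IDs. They vary between PHP versions; token_name() maps an ID back to its T_* name.
$regex = '%(?:(\\d+)\\:366,)?(?:\\d+\\:(?:345|344|353),)?\\d+\\:352,(\\d+)\\:307,(?:\\d+\\:(?:354|355),\\d+\\:307,)*{%';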
preg_match_all($regex, $this->compiled, $classes, PREG_SET_ORDER);
if(is_array($classes))
{
foreach($classes as $class)
{
$this->classes[$this->tokens[$class[2]][1]] = array('token' => $class[2]);
$this->classes[$this->tokens[$class[2]][1]]['doc'] = isset($this->tokens[$class[1]][1]) ? $this->tokens[$class[1]][1] : false;
}
}
}
private function class_blocks()
{
if(!$this->compiled)
{
return false;
}
foreach($this->classes as $class_name => $class)
{
$this->classes[$class_name]['block'] = $this->get_block($class['token']);
}
}
protected function get_block($name_token)
{
if(!$this->compiled || ($pos = strpos($this->compiled, $name_token . ':')) === false)
{
return false;
}
$section= substr($this->compiled, $pos);
$len = strlen($section);
$block = '';
$opening = 0; // Start at zero so the scan stops at the brace that closes this block, not a later one.
$closing = 0;
for($i = 0; $i < $len; $i++)
{
if($section[$i] == '{')
{
$opening++;
}
elseif($section[$i] == '}')
{
$closing++;
if($closing == $opening)
{
break;
}
}
if($opening > 0)
{
$block .= $section[$i];
}
}
return trim($block, ',');
}
protected function class_functions()
{
if(!$this->compiled)
{
return false;
}
foreach($this->classes as $class_name => $class)
{
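// NOTE: 366, 333, 307, ... are numeric token IDs and vary between PHP versions.
// Parameter tokens (e.g. T_VARIABLE) sit between the method name and "{", so skip any tokens there except another T_FUNCTION (333).
$regex = '%(?:(\d+)\:366,)?(?:\d+\:(?:344|345),)?(?:\d+\:(?:341|342|343),)?\d+\:333,(\d+)\:307,(?:\d+\:(?!333,)\d+,)*\{%';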
preg_match_all($regex, $class['block'], $functions, PREG_SET_ORDER);
foreach($functions as $function)
{
$function_name = $this->tokens[$function[2]][1];
$this->classes[$class_name]['functions'][$function_name] = array('token' => $function[2]);
$this->classes[$class_name]['functions'][$function_name]['doc'] = isset($this->tokens[$function[1]][1]) ? $this->tokens[$function[1]][1] : false;
$this->classes[$class_name]['functions'][$function_name]['block'] = $this->get_block($function[2]);
}
}
}
public function get_doc($class, $function = false)
{
if(!is_string($class) || !isset($this->classes[$class]))
{
return false;
}
if(!is_string($function))
{
return $this->classes[$class]['doc'];
}
else
{
if(!isset($this->classes[$class]['functions'][$function]))
{
return false;
}
return $this->classes[$class]['functions'][$function]['doc'];
}
}
}
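The numeric IDs in the two patterns are whatever the tokenizer assigned on the PHP build the class was written against, and they are not stable across versions. Here is a minimal sketch of building the method pattern from the T_* constants instead. It assumes the intent is "optional doc comment, optional modifiers, function keyword, name". The helper `compile_tokens`, the `Demo` source, and the exact pattern are hypothetical, and it deliberately stops at the method name rather than requiring the opening brace.

```php
<?php
// Hypothetical helper: the same "index:id," encoding that get_tokens() builds.
function compile_tokens(array $tokens)
{
    $compiled = '';
    foreach ($tokens as $k => $t) {
        if (is_array($t)) {
            if ($t[0] != T_WHITESPACE) {
                $compiled .= $k . ':' . $t[0] . ',';
            }
        } elseif ($t == '{' || $t == '}') {
            $compiled .= $t . ',';
        }
    }
    return $compiled;
}

// Hypothetical input, just large enough to contain one documented method.
$source = "<?php\nclass Demo\n{\n    /** load method */\n    public function load() {}\n}\n";
$tokens = token_get_all($source);

// Assumption: modifiers may appear in any order, so they share one group.
$modifiers = implode('|', array(T_ABSTRACT, T_FINAL, T_STATIC,
                                T_PUBLIC, T_PROTECTED, T_PRIVATE));
$regex = '%(?:(\d+):' . T_DOC_COMMENT . ',)?(?:\d+:(?:' . $modifiers . '),)*'
       . '\d+:' . T_FUNCTION . ',(\d+):' . T_STRING . ',%';

preg_match_all($regex, compile_tokens($tokens), $matches, PREG_SET_ORDER);

$found = array();
foreach ($matches as $m) {
    // Group 2 is the index of the name token, group 1 that of the doc comment.
    $name = $tokens[(int) $m[2]][1];
    $found[$name] = $m[1] !== '' ? $tokens[(int) $m[1]][1] : false;
}
var_dump($found);
```

Since the constants resolve at runtime, the same pattern keeps working when the underlying numbers change between PHP releases.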
Any thoughts or comments on this? All criticism is welcome!
Thanks, mniz.