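I am trying to use a regular expression (in Java) to collect all include directives from a PHP file.
The expression should pick up only those includes whose filename is given as a single, non-concatenated string literal. Includes that use constants or variables are not needed.
Detection should work for single and double quotes, for includes and requires, plus the additional _once variants and, last but not least, both keyword-style and function-style invocations.
A rough input sample: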
<?php
require('a.php');
require 'b.php';
require("c.php");
require "d.php";
include('e.php');
include 'f.php';
include("g.php");
include "h.php";
require_once('i.php');
require_once 'j.php';
require_once("k.php");
require_once "l.php";
include_once('m.php');
include_once 'n.php';
include_once("o.php");
include_once "p.php";
?>
Expected output:
["a.php","b.php","c.php","d.php","e.php","f.php","g.php","h.php","i.php","j.php","k.php","l.php","m.php","n.php","o.php","p.php"]
Any ideas?
Answer 0 (score: 7)
Use token_get_all for this. It is safe and won't give you headaches.
There is also PEAR's PHP_Parser if you need userland code.
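Answer 1 (score: 5)
To do this accurately, you really need to fully parse the PHP source code. This is because the text sequence require('a.php'); can appear in places where it is not really an include at all, such as comments, strings and HTML markup. For example, the following are NOT real PHP includes, but will be matched by the regex: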
<?php // Examples where a regex solution gets false positives:
/* PHP multi-line comment with: require('a.php'); */
// PHP single-line comment with: require('a.php');
$str = "double quoted string with: require('a.php');";
$str = 'single quoted string with: require("a.php");';
?>
<p>HTML paragraph with: require('a.php');</p>
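Since the question asks for Java, the false positives can be reproduced there with a quick sketch. The class name FalsePositiveDemo and the simplified pattern are made up for illustration; this is not the full pattern from this answer, just enough to show the effect:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; name and simplified pattern are illustrative only.
class FalsePositiveDemo {
    // Keyword, optional "(", then a single- or double-quoted filename.
    private static final Pattern INCLUDE = Pattern.compile(
        "\\b(?:include|require)(?:_once)?\\s*\\(?\\s*(?:'([^']+)'|\"([^\"]+)\")");

    static List<String> naiveMatches(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = INCLUDE.matcher(text);
        while (m.find()) {
            // Exactly one of the two capture groups is set per match.
            found.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String php = "<?php\n"
            + "// comment with: require('a.php');\n"
            + "$str = \"string with: require('b.php');\";\n"
            + "include 'real.php';\n"
            + "?>\n";
        // Prints [a.php, b.php, real.php]; only real.php is an actual include.
        System.out.println(naiveMatches(php));
    }
}
```

A regex sees only text, so it cannot tell that the first two hits sit inside a comment and a string literal. A tokenizer can.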
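That said, if you are happy with getting a few false positives, the following single-regex solution does a pretty good job of scraping the filenames from all the PHP include variations: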
// Get all filenames from PHP include variations and return in array.
function getIncludes($text) {
    $count = preg_match_all('/
        # Match PHP include variations with single string literal filename.
        \b              # Anchor to word boundary.
        (?:             # Group for include variation alternatives.
          include       # Either "include"
        | require       # or "require"
        )               # End group of include variation alternatives.
        (?:_once)?      # Either one may be the "once" variation.
        \s*             # Optional whitespace.
        (               # $1: Optional opening parentheses.
          \(            # Literal open parentheses,
          \s*           # followed by optional whitespace.
        )?              # End $1: Optional opening parentheses.
        (?|             # "Branch reset" group of filename alts.
          \'([^\']+)\'  # Either $2 (alt 1): Single quoted filename,
        | "([^"]+)"     # or $2 (alt 2): Double quoted filename.
        )               # End branch reset group of filename alts.
        (?(1)           # If there were opening parentheses,
          \s*           # then allow optional whitespace
          \)            # followed by the closing parentheses.
        )               # End group $1 if conditional.
        \s*             # End statement with optional whitespace
        ;               # followed by semi-colon.
        /ix', $text, $matches);
    if ($count > 0) {
        $filenames = $matches[2];
    } else {
        $filenames = array();
    }
    return $filenames;
}
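Addendum 2011-07-24: It turns out the OP wants a solution in Java, not PHP. Here is a tested Java program which is nearly identical. Note that I am not a Java expert and don't know how to dynamically size an array. Thus, the solution below (crudely) sets up a fixed-size array (100) to hold the filenames.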
import java.util.regex.*;

public class TEST {
    // Set maximum size of array of filenames.
    public static final int MAX_NAMES = 100;

    // Get all filenames from PHP include variations and return in array.
    public static String[] getIncludes(String text)
    {
        int count = 0; // Count of filenames.
        String filenames[] = new String[MAX_NAMES];
        String filename;
        Pattern p = Pattern.compile(
            "# Match include variations with single string filename. \n" +
            "\\b # Anchor to word boundary. \n" +
            "(?: # Group include variation alternatives. \n" +
            " include # Either 'include', \n" +
            "| require # or 'require'. \n" +
            ") # End group of include variation alts. \n" +
            "(?:_once)? # Either one may have '_once' suffix. \n" +
            "\\s* # Optional whitespace. \n" +
            "(?: # Group for optional opening paren. \n" +
            " \\( # Literal open parentheses, \n" +
            " \\s* # followed by optional whitespace. \n" +
            ")? # Opening parentheses are optional. \n" +
            "(?: # Group for filename alternatives. \n" +
            " '([^']+)' # $1: Either a single quoted filename, \n" +
            "| \"([^\"]+)\" # or $2: a double quoted filename. \n" +
            ") # End group of filename alternatives. \n" +
            "(?: # Group for optional closing paren. \n" +
            " \\s* # Optional whitespace, \n" +
            " \\) # followed by the closing parentheses. \n" +
            ")? # Closing parenthesis is optional. \n" +
            "\\s* # End statement with optional ws, \n" +
            "; # followed by a semi-colon. ",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.COMMENTS);
        Matcher m = p.matcher(text);
        while (m.find() && count < MAX_NAMES) {
            // The filename is in either $1 or $2
            if (m.group(1) != null) filename = m.group(1);
            else filename = m.group(2);
            // Add this filename to array of filenames.
            filenames[count++] = filename;
        }
        return filenames;
    }

    public static void main(String[] args)
    {
        // Test string full of various PHP include statements.
        String text = "<?php\n"+
            "\n"+
            "require('a.php');\n"+
            "require 'b.php';\n"+
            "require(\"c.php\");\n"+
            "require \"d.php\";\n"+
            "\n"+
            "include('e.php');\n"+
            "include 'f.php';\n"+
            "include(\"g.php\");\n"+
            "include \"h.php\";\n"+
            "\n"+
            "require_once('i.php');\n"+
            "require_once 'j.php';\n"+
            "require_once(\"k.php\");\n"+
            "require_once \"l.php\";\n"+
            "\n"+
            "include_once('m.php');\n"+
            "include_once 'n.php';\n"+
            "include_once(\"o.php\");\n"+
            "include_once \"p.php\";\n"+
            "\n"+
            "?>\n";
        String filenames[] = getIncludes(text);
        for (int i = 0; i < MAX_NAMES && filenames[i] != null; i++) {
            System.out.print(filenames[i] +"\n");
        }
    }
}
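The fixed-size array is not strictly needed: java.util.ArrayList grows on demand. Below is a minimal sketch of the same idea using a List. The class name IncludeCollector is hypothetical, and the pattern is a compact rewrite of the commented one above rather than a character-for-character copy. Like the Java version above, it has no conditional group, so it does not check that an opening parenthesis is matched by a closing one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of the program above that returns a List instead of an array.
class IncludeCollector {
    private static final Pattern INCLUDE = Pattern.compile(
        "\\b(?:include|require)(?:_once)?\\s*(?:\\(\\s*)?"   // keyword, optional "("
        + "(?:'([^']+)'|\"([^\"]+)\")"                        // $1 or $2: quoted filename
        + "(?:\\s*\\))?\\s*;",                                // optional ")", then ";"
        Pattern.CASE_INSENSITIVE);

    static List<String> getIncludes(String text) {
        List<String> filenames = new ArrayList<>();
        Matcher m = INCLUDE.matcher(text);
        while (m.find()) {
            // The filename is in either group 1 or group 2.
            filenames.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return filenames;
    }

    public static void main(String[] args) {
        String text = "<?php\nrequire('a.php');\ninclude \"b.php\";\nrequire_once 'c.php';\n?>\n";
        for (String name : getIncludes(text)) {
            System.out.println(name);
        }
    }
}
```

With a List there is no upper limit to enforce and no null-terminated scan when printing the results.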
Answer 2 (score: 4)
/(?:require|include)(?:_once)?[( ]['"](.*)\.php['"]\)?;/
should work for all the cases you specified, and captures only the filename without the extension.
Test script:
<?php
$text = <<<EOT
require('a.php');
require 'b.php';
require("c.php");
require "d.php";
include('e.php');
include 'f.php';
include("g.php");
include "h.php";
require_once('i.php');
require_once 'j.php';
require_once("k.php");
require_once "l.php";
include_once('m.php');
include_once 'n.php';
include_once("o.php");
include_once "p.php";
EOT;
$re = '/(?:require|include)(?:_once)?[( ][\'"](.*)\.php[\'"]\)?;/';
$result = array();
preg_match_all($re, $text, $result);
var_dump($result);
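To get the filenames you want, read $result[1].
I should point out that I am also partial to cweiske's answer: unless you really just want an exercise in regular expressions (or want to do this with, say, grep), you should use the tokenizer.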
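For the Java side of the question, the same pattern can be carried over almost verbatim. This is only a sketch: the class name BaseNameIncludes is invented for the example, and the only changes to the regex are the escapes that a Java string literal requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java port of the PHP pattern from this answer.
class BaseNameIncludes {
    private static final Pattern INCLUDE = Pattern.compile(
        "(?:require|include)(?:_once)?[( ]['\"](.*)\\.php['\"]\\)?;");

    static List<String> baseNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = INCLUDE.matcher(text);
        while (m.find()) {
            names.add(m.group(1)); // filename without the ".php" extension
        }
        return names;
    }

    public static void main(String[] args) {
        String text = "require('a.php');\ninclude_once \"lib/b.php\";\n";
        System.out.println(baseNames(text)); // prints [a, lib/b]
    }
}
```

Note that (.*) is greedy, so two includes on the same line would be merged into a single match. A lazy (.*?) would be one way around that.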
Answer 3 (score: 1)
The following should work pretty well:
/^(require|include)(_once)?(\(|\s+)("|')(.*?)("|')(\)|\s*);$/
The filename is in the fifth captured group (the fourth holds only the opening quote).
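An anchored pattern of this kind only makes sense line by line, which in Java means Pattern.MULTILINE. Here is a rough sketch under that assumption. The class name AnchoredIncludes is made up, and the pattern is a variant in which the opening delimiter is either a parenthesis or whitespace, and the closing parenthesis is optional.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: line-anchored include matching in Java.
class AnchoredIncludes {
    // MULTILINE makes ^ and $ match at the start and end of every line.
    private static final Pattern LINE_INCLUDE = Pattern.compile(
        "^(require|include)(_once)?(\\(|\\s+)(\"|')(.*?)(\"|')(\\)|\\s*);$",
        Pattern.MULTILINE);

    static List<String> find(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = LINE_INCLUDE.matcher(text);
        while (m.find()) {
            names.add(m.group(5)); // group 5 is the filename between the quotes
        }
        return names;
    }

    public static void main(String[] args) {
        String text = "require('a.php');\n"
            + "require 'b.php';\n"
            + "    include 'indented.php';\n";
        System.out.println(find(text)); // prints [a.php, b.php]
    }
}
```

Anchoring to the start of the line filters out text such as a trailing comment, but as the example shows it also skips indented includes, so it trades one kind of error for another.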
Answer 4 (score: 0)
This works for me:
preg_match_all('/\b(require|include|require_once|include_once)\b(\(| )(\'|")(.+)\.php(\'|")\)?;/i', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[4];