I'm scraping an HTML page with preg_match_all in PHP. This is what I'm trying to capture:
<script>
function fsb38(x) {
var b=new
Array(98,100,97,98,98,98,99,50,51,55,53,50,48,100,57,98,50,100,53,100,97,48,100,52,100,57,97,56,97,51,54,99,56,38,104,52,61,53,98,99,54,102,57,55,49,99,55,101,55,61,101,48,98,55,99,57,102,110,56,57,102,98,111,78,54,102,102,109,114,53,111,54,101,102,48,48,38,54,98,61,116,50,97,99,38,56,101,51,57,49,102,61,100,101,105,106,101,63,101,101,57,48,52,112,104,112,46,115,110,111,105,115,115,105,109);
var p=new Array(0,0,0,0,1,1,1,0,0,1,0,0,1,1,0,0,1,1,1,0,1,0,1,0,1,0,1,0,1,1,0,0,0,0,0,1,0,0,1,1,0,0,1,1,1,0,0,0,1,1,1,0,0,0,1,0,0,1,0,0,0,0,1,1,0,0,0,1,1,0,1,0,0,1,0,0,1,1,0,1,1,0,1,1,1,1,0,1,0,0,0,1,1,0,1,1,0,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1);
window.location = c(b,p) + x;
return false;
}
</script>
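Normally, preg_match_all('/var b=new(.*)var p=new/is', $output, $ar);
works perfectly. However, because this block appears several times on the page, I get only one match: it runs from where I tell it to start to the last occurrence of var p=new.
I also tried this: preg_match_all('/var b=new(.*)(\n)(\s)var p=new/is', $output, $ar);
but with that pattern I get nothing back. What am I doing wrong?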
Answer 0 (score: 2)
If you want to capture every Array(), use this: preg_match_all('/var.*?=new(.*?)\)\;/is', $output, $ar);
If you only want b = new Array(), use this: preg_match_all('/var b=new(.*?)\)\;/is', $output, $ar);
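As a rough sketch of how the second pattern behaves, the snippet below runs it against a made-up page containing two script blocks. The sample $output, the shortened arrays, the function name fsb39, and the $arrays variable are assumptions for illustration only; they are not taken from the real page.

```php
<?php
// Hypothetical sample input: two script blocks shaped like the one in the
// question, with the arrays shortened for readability.
$output = <<<'HTML'
<script>
function fsb38(x) {
var b=new Array(98,100,97);
var p=new Array(0,0,1);
window.location = c(b,p) + x;
}
</script>
<script>
function fsb39(x) {
var b=new Array(104,52,61);
var p=new Array(1,0,1);
window.location = c(b,p) + x;
}
</script>
HTML;

// The lazy .*? stops at the first ");" that follows each "var b=new".
preg_match_all('/var b=new(.*?)\)\;/is', $output, $ar);

// $ar[1] holds the captured text of each match, for example " Array(98,100,97"
// (the closing parenthesis is consumed by the pattern, not captured).
$arrays = [];
foreach ($ar[1] as $capture) {
    // Drop everything up to and including "(", then split on commas.
    $numbers = substr($capture, strpos($capture, '(') + 1);
    $arrays[] = array_map('intval', explode(',', $numbers));
}

print_r($arrays);
```

Each b array ends up as its own entry in $arrays, so the blocks can be decoded one at a time instead of arriving as one long match.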
Answer 1 (score: 1)
Regular expressions are "greedy" by default: the .* part matches the longest possible string.
You need "ungreedy" behavior, so use the U modifier.
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
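To make the difference concrete, here is a minimal comparison of the greedy pattern from the question, the same pattern with the U modifier, and a version that uses a lazy quantifier instead. The shortened $output string and the variable names are assumptions for illustration.

```php
<?php
// Hypothetical shortened input with two occurrences of the b/p pair.
$output = "var b=new Array(1,2);\nvar p=new Array(0,1);\n"
        . "var b=new Array(3,4);\nvar p=new Array(1,0);\n";

// Greedy: a single match running from the first "var b=new"
// to the last "var p=new".
$greedy = preg_match_all('/var b=new(.*)var p=new/is', $output, $a);

// U modifier: quantifiers are ungreedy by default, so each pair
// is matched separately.
$ungreedy = preg_match_all('/var b=new(.*)var p=new/isU', $output, $b);

// Same effect without U: make only this quantifier lazy with "?".
$lazy = preg_match_all('/var b=new(.*?)var p=new/is', $output, $c);

echo "$greedy $ungreedy $lazy\n"; // prints "1 2 2"
```

Note that U inverts greediness for the whole pattern, so a quantifier followed by ? becomes greedy again under U. Picking either the U modifier or the per-quantifier .*? form, and not mixing the two, keeps the pattern easier to read.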