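I found a PHP function for scraping an item title from a web page. It uses the regular expression /<div class=\"detail\">(.*?)<p>/si, as shown in the code below. I understand that /<div class=\"detail\"> matches a specific div, and that (.*?)<p> non-greedily matches any characters after that div and before the next <p>, but what does /si mean? Thanks!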
<?php
// Get the title
function match_title( $content ) {
    preg_match( '/<div class=\"detail\">(.*?)<p>/si', $content, $result );
    // preg_match() always assigns $result, so test the capture group itself
    $title = isset( $result[1] ) ? trim( addslashes( $result[1] ) ) : '';
    return $title;
}
$url = "http://a.m.taobao.com/i21708516412.htm";
$item = file_get_contents($url);
$title=match_title( $item );
?>
Answer 0 (score: 3)
See here for all the modifiers: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
i (PCRE_CASELESS)
If this modifier is set, letters in the pattern match both upper and lower case letters.
s (PCRE_DOTALL)
If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
To sum up: the dot also matches newlines, and the expression is case-insensitive.
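A minimal sketch of how the two modifiers interact, assuming a made-up HTML fragment (the `$html` string and the `first_capture()` helper are hypothetical, not part of the original function). The tag is written in upper case and the title sits on its own line, so the pattern should only match when both `i` and `s` are set:

```php
<?php
// Hypothetical sample input: upper-case tag name, newlines inside the div.
$html = "<DIV class=\"detail\">\n  Sample title\n<p>more</p>";

// Same pattern as in the question, without delimiters or modifiers.
$pattern = '<div class="detail">(.*?)<p>';

// Hypothetical helper: returns the trimmed first capture group,
// or null when the pattern does not match.
function first_capture(string $regex, string $subject): ?string {
    return preg_match($regex, $subject, $m) === 1 ? trim($m[1]) : null;
}

$none = first_capture("/$pattern/",   $html); // case differs, dot stops at newline
$i    = first_capture("/$pattern/i",  $html); // case ok, but dot cannot cross the newline
$s    = first_capture("/$pattern/s",  $html); // dot crosses newlines, but case differs
$si   = first_capture("/$pattern/si", $html); // both conditions satisfied

var_dump($none, $i, $s, $si);
```

Only the `/si` variant should return the title here; dropping either modifier makes the match fail for this particular input.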
Answer 1 (score: 0)