Question

I am trying to copy a sentence from a webpage.

My code is:

$request_url ='https://stackoverflow.com/questions/391005/convert-html-css-to-pdf-with-php';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $request_url);    // The url to get links from
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // We want to get the response
$result = curl_exec($ch);
$regex='/<h1 itemprop="name">(.*)<\/h1>/i';
preg_match_all($regex,$result,$parts);
$links=$parts[1];
foreach($links as $link){
    echo $link."<br>";
}
curl_close($ch);

This works, but when I change line 6 (the $regex line) to the following, it no longer works:

$regex='/itemprop="name">(.*)<\/h1>/i';

The markup on the site that I want to copy from is:

<h1 itemprop="name">
<a class="question-hyperlink" href="/questions/391005/convert-html-css-to-pdf-with-php">Convert HTML + CSS to PDF with PHP?</a></h1>

I want to print "Convert HTML + CSS to PDF with PHP?". Please tell me how to copy and print that sentence from this anchor tag.

Answer 1

Alternatively, you can use DOMDocument together with DOMXpath. Consider this example:

$request_url ='http://stackoverflow.com/questions/391005/convert-html-css-to-pdf-with-php';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $request_url); // The url to get links from
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // We want to get the response
libxml_use_internal_errors(true);
$result = curl_exec($ch);
$dom = new DOMDocument();
$dom->loadHTML($result);
libxml_clear_errors();
$xpath = new DOMXpath($dom);
// target the title
$title = $xpath->query('//div[@id="question-header"]/h1[@itemprop="name"]/a[@class="question-hyperlink"]')->item(0)->nodeValue;
echo $title; // Convert HTML + CSS to PDF with PHP?
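
The same XPath approach can be tried without a network request. The following is a minimal sketch that runs the query against the markup quoted in the question. The surrounding div with id "question-header" is an assumption taken from the XPath expression above, and the null check on item(0) is an addition for the case where nothing matches.

```php
<?php
// Sample markup copied from the question; the wrapping <div> is an assumption.
$html = <<<'HTML'
<div id="question-header">
<h1 itemprop="name">
<a class="question-hyperlink" href="/questions/391005/convert-html-css-to-pdf-with-php">Convert HTML + CSS to PDF with PHP?</a></h1>
</div>
HTML;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div[@id="question-header"]/h1[@itemprop="name"]/a[@class="question-hyperlink"]');

// item(0) returns null when the query matched nothing, so check before reading nodeValue.
$node  = $nodes->item(0);
$title = $node !== null ? trim($node->nodeValue) : null;

echo $title === null ? "Title not found\n" : $title . "\n";
```

If the page layout changes and the query no longer matches, this version prints "Title not found" instead of failing on a null node.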

Side note: this is the strangest kind of scraping question, one that scrapes Stack Overflow itself.

Answer 2

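You need to modify your regex so that it treats the input as a single line. More precisely, you need to tell the regex to match newlines as well, because by default a newline is not matched by the dot (.).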

This can be done by adding an s after the i at the end of the pattern:

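s (PCRE_DOTALL) If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.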
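
To see the effect of the modifier in isolation, here is a small sketch on a made-up two-line string (the string and variable names are illustrative only, not taken from the page being scraped). It compares the same pattern with and without s, and also shows that a negated class crosses the newline either way.

```php
<?php
// A two-line string: the newline sits between "<h1>" and the text.
$text = "<h1>\nHello</h1>";

// Without s, "." stops at the newline, so the pattern cannot span both lines.
$without = preg_match('/<h1>(.*)<\/h1>/i', $text, $m1);

// With s (PCRE_DOTALL), "." also matches the newline.
$with = preg_match('/<h1>(.*)<\/h1>/is', $text, $m2);

// A negated class such as [^<] matches the newline regardless of s.
$negated = preg_match('/<h1>([^<]*)<\/h1>/i', $text, $m3);

var_dump($without);       // int(0)
var_dump($with);          // int(1)
var_dump(trim($m2[1]));   // string(5) "Hello"
var_dump($negated);       // int(1)
```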

Your regex then looks like this:

/itemprop="name">(.*)<\/h1>/is

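Now all you need to do is capture only the text inside the inner tags, so that you get rid of the tags themselves. At the moment you capture the whole inner content of the h1 tag. Take care to handle the newline before the a tag: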

/itemprop="name">.*<a.*>(.*)<\/a><\/h1>/is

will do the trick!
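
As a rough check, the pattern can be run against the markup quoted in the question. The sketch below also tries a lazy variant, which is not part of this answer but an assumption added for illustration: with s set, greedy .* can run across the whole document, so [^>]* and .*? keep the match closer to the first title on a larger page.

```php
<?php
// Markup copied from the question.
$html = <<<'HTML'
<h1 itemprop="name">
<a class="question-hyperlink" href="/questions/391005/convert-html-css-to-pdf-with-php">Convert HTML + CSS to PDF with PHP?</a></h1>
HTML;

// Pattern from the answer (greedy quantifiers).
$greedy = '/itemprop="name">.*<a.*>(.*)<\/a><\/h1>/is';

// Hypothetical lazy variant: [^>]* stays inside the opening <a> tag,
// and .*? stops at the first closing </a> followed by </h1>.
$lazy = '/itemprop="name">\s*<a[^>]*>(.*?)<\/a>\s*<\/h1>/is';

$titles = [];
foreach ([$greedy, $lazy] as $regex) {
    // $m[1] holds the text captured between <a ...> and </a>.
    $titles[] = preg_match($regex, $html, $m) ? $m[1] : null;
}

foreach ($titles as $title) {
    echo $title === null ? "No match\n" : $title . "\n";
}
```

On this small sample both patterns capture the same text; they may differ on a full page where "</a></h1>" appears more than once.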

cURL not handling tags
