Question

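This is my code for scraping all the PDF links, but it doesn't work. How can I download the files behind those links and save them to a folder on my computer?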

<?php
set_time_limit(0);
include 'simple_html_dom.php';

$url = 'http://example.com';
$html = file_get_html($url) or die ('invalid url');

// extract pdf links
foreach($html->find('a[href=[^"]*\.pdf]') as $element)
echo $element->href.'<br>';
?>

Answer 1

foreach($htnl->find('a[href=[^"]*\.pdf]') as element)
           ^---typo. should be an 'm'        ^---typo. need a $ here

Apart from the typos above, *how* exactly is your code not working?

Answer 2

Have you looked into phpQuery? http://code.google.com/p/phpquery/

Answer 3

A simpler solution here is:

foreach ($html->find('a[href$=pdf]') as $element)

https://simplehtmldom.sourceforge.io/manual.htm

From the manual: [attribute$=value] matches elements that have the specified attribute and whose value ends with the given value.
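The question also asks how to save the files, which the selector alone does not cover. Here is a minimal sketch of that step, not a definitive implementation. The helper names is_pdf_link, absolute_url and download_pdfs are hypothetical (they are not part of simple_html_dom), the target folder is an assumption, and copy() needs allow_url_fopen enabled to read from a URL. The URL resolution is deliberately simple: it handles absolute, protocol-relative, root-relative and plain relative hrefs, but not "../" segments.

```php
<?php
// Hypothetical helpers for illustration; none of these come from simple_html_dom.

// True when the URL path ends in ".pdf" (case-insensitive), ignoring any query string.
function is_pdf_link(string $href): bool
{
    $path = parse_url($href, PHP_URL_PATH);
    return is_string($path) && strtolower(substr($path, -4)) === '.pdf';
}

// Turn an href into an absolute URL. $base must be an absolute http(s) URL.
function absolute_url(string $base, string $href): string
{
    if (preg_match('#^https?://#i', $href)) {
        return $href;                               // already absolute
    }
    $parts = parse_url($base);
    if (strncmp($href, '//', 2) === 0) {
        return $parts['scheme'] . ':' . $href;      // protocol-relative
    }
    $root = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $root .= ':' . $parts['port'];
    }
    if ($href !== '' && $href[0] === '/') {
        return $root . $href;                       // root-relative
    }
    // Relative to the directory of the base page.
    $dir = isset($parts['path']) ? preg_replace('#/[^/]*$#', '/', $parts['path']) : '/';
    return $root . $dir . $href;
}

// Download every PDF href into $dir and return the list of saved file paths.
function download_pdfs(array $hrefs, string $baseUrl, string $dir): array
{
    if (!is_dir($dir) && !mkdir($dir, 0777, true)) {
        throw new RuntimeException("Cannot create folder: $dir");
    }
    $saved = [];
    foreach ($hrefs as $href) {
        if (!is_pdf_link($href)) {
            continue;
        }
        $url = absolute_url($baseUrl, $href);
        $target = $dir . '/' . basename(parse_url($url, PHP_URL_PATH));
        if (@copy($url, $target)) {                 // requires allow_url_fopen
            $saved[] = $target;
        }
    }
    return $saved;
}

// Usage with the question's setup (assumes simple_html_dom.php is available):
// $hrefs = [];
// foreach ($html->find('a[href$=pdf]') as $element) {
//     $hrefs[] = $element->href;
// }
// download_pdfs($hrefs, $url, __DIR__ . '/pdfs');
```

Files with the same base name will overwrite each other in this sketch, so you may want to add a counter or keep part of the URL path if that matters for your site.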

How to crawl and download all PDF files from HTML links?
