preg_match_all matches unexpectedly

Time: 2013-06-21 07:36:07

Tags: php scrape

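I've just started with PHP. I'm trying to scrape a page but can't get it to work. I tried `preg_match_all`, but it just doesn't return the result I want. Basically I want to scrape only the YouTube video links from this feed: https://gdata.youtube.com/feeds/api/standardfeeds/most_shared, grab all of them, and use them later.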

I tried the following code, which doesn't work:

<?php
    $data = file_get_contents('https://gdata.youtube.com/feeds/api/standardfeeds/most_shared');
    preg_match_all("/src='(.+?)'>/", $data, $links);
    $link_out = $links[0][0];
    echo $link_out;
?>

I'm new to PHP, please help.

Thanks

3 Answers:

Answer 0 (score: 2)

Since the feed is XML, you can use PHP's SimpleXMLElement to read the data instead of a regex.

<?php
$xml = new SimpleXMLElement(
    'https://gdata.youtube.com/feeds/api/standardfeeds/most_shared',
    null,
    true
);

foreach($xml->entry as $entry) {
    echo $entry->content['src'], PHP_EOL;
}

/*
    https://www.youtube.com/v/IjWc43FCYlg?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/Xw1C5T-fH2Y?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/Kq0_dGKx4Os?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/gbcBYs0ljI0?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/78juOpTM3tE?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/OOiZ-5DqwYI?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/zjz614QVyfQ?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/h15m87WsCHQ?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/SXKOTdyOUBg?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/BRAM8MpqIeA?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/5yB3n9fu-rM?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/NAOo9SnzRH8?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/0KtILkzC-1g?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/kWSIFh8ICaA?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/Mi6AhogZCeg?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/kWuIGAZ1x2I?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/lKY5fmDGVLs?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/C94PaCtqOk4?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/V-fL8zopddI?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/UWlzMIl7E48?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/mcw6j-QWGMo?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/-RSDaRttpzk?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/8_RDx4skTp4?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/7YDWdv9kR0M?version=3&f=standard&app=youtube_gdata
    https://www.youtube.com/v/m96tYpEk1Ao?version=3&f=standard&app=youtube_gdata
*/

Anthony.
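For comparison, here is a minimal sketch of the same SimpleXML approach run against an inline, abbreviated copy of the feed instead of the live URL, so it works offline (the gdata v2 feed may no longer respond). The sample XML and the helper name `extractContentSrc` are hypothetical; the structure (Atom `<entry>` elements whose `<content>` element carries a `src` attribute) is assumed from the answer's code and output. Note that `&` has to be written as `&amp;` inside the XML; SimpleXML decodes it back when you read the attribute.

```php
<?php
// Hypothetical helper: return the src attribute of every <entry><content>
// element in an Atom feed string. Structure assumed from the answer above.
function extractContentSrc(string $xmlString): array
{
    $xml = new SimpleXMLElement($xmlString);
    $urls = [];
    foreach ($xml->entry as $entry) {
        // Attribute access returns a SimpleXMLElement (or null); cast to string.
        $src = (string) $entry->content['src'];
        if ($src !== '') {
            $urls[] = $src;
        }
    }
    return $urls;
}

// Abbreviated, hand-written sample shaped like the feed (not real feed data).
$sample = <<<'XML'
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
  <entry>
    <title type='text'>first</title>
    <content type='application/x-shockwave-flash' src='https://www.youtube.com/v/IjWc43FCYlg?version=3&amp;f=standard&amp;app=youtube_gdata'/>
  </entry>
  <entry>
    <title type='text'>second</title>
    <content type='application/x-shockwave-flash' src='https://www.youtube.com/v/Xw1C5T-fH2Y?version=3&amp;f=standard&amp;app=youtube_gdata'/>
  </entry>
</feed>
XML;

foreach (extractContentSrc($sample) as $url) {
    echo $url, PHP_EOL;
}
```

Because the parser handles quoting, escaping and self-closing tags, nothing here depends on whether a tag ends in `>` or `/>`, which is exactly what tripped up the regex in the question.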

Answer 1 (score: 1)

Try this `preg_match_all` pattern:

preg_match_all("/src='([^']+)'/si", $data, $links);

and display the result:

echo "<pre>";
print_r($links);
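A small sketch of why the negated class helps, assuming the feed arrives as one long line with self-closing `<content .../>` tags and `<title type='text'>` elements (the `$data` string below is a hypothetical excerpt, not real feed output). The question's lazy `.+?` keeps extending past the closing quote until it finds the first `'>` anywhere later, so it swallows part of the next entry; `[^']+` cannot cross a quote, so each capture stops at the end of the `src` value whatever follows it. The `s` flag has no effect on this pattern, since it only changes how `.` behaves and the pattern contains no `.`.

```php
<?php
// Hypothetical one-line excerpt shaped like the feed.
$data = "<entry><title type='text'>A</title>"
      . "<content type='application/x-shockwave-flash' src='https://www.youtube.com/v/IjWc43FCYlg?version=3'/></entry>"
      . "<entry><title type='text'>B</title>"
      . "<content type='application/x-shockwave-flash' src='https://www.youtube.com/v/Xw1C5T-fH2Y?version=3'/></entry>";

// Question's pattern: .+? runs past the src quote up to the next "'>",
// which here is inside the following <title type='text'> tag.
preg_match_all("/src='(.+?)'>/", $data, $bad);

// Negated-class pattern: the capture can never contain a quote.
preg_match_all("/src='([^']+)'/", $data, $good);

echo "question pattern:\n";
print_r($bad[1]);
echo "negated-class pattern:\n";
print_r($good[1]);
```

On this excerpt the question's pattern yields a single capture that contains `'/></entry><entry><title type='text`, while the negated-class pattern yields the two URLs.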

Answer 2 (score: 1)

<?php
$data = file_get_contents('https://gdata.youtube.com/feeds/api/standardfeeds/most_shared');
preg_match_all("/src='(.+?)'\/>/", $data, $links);
print_r($links[1]);

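You forgot to match the closing `/` of the tag: in the feed the `src` attribute is followed by `'/>`, not `'>`. Also print `$links[1]` (the captured URLs) rather than `$links[0][0]`, which is only the first full match.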
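To make the indexing concrete, here is a sketch using this answer's pattern on a hypothetical two-tag excerpt shaped like the feed. With `preg_match_all`'s default `PREG_PATTERN_ORDER`, `$links[0]` holds the full matches (including `src='` and `'/>`) and `$links[1]` holds capture group 1, i.e. just the URLs; the function's return value is the number of matches.

```php
<?php
// Hypothetical excerpt with the feed's self-closing <content .../> tags.
$data = "<content type='application/x-shockwave-flash' src='https://www.youtube.com/v/IjWc43FCYlg?version=3'/>"
      . "<content type='application/x-shockwave-flash' src='https://www.youtube.com/v/Xw1C5T-fH2Y?version=3'/>";

// Returns the number of full matches found.
$count = preg_match_all("/src='(.+?)'\/>/", $data, $links);

echo "matches: $count\n";
echo $links[0][0], "\n"; // full match: src='...'/>
echo $links[1][0], "\n"; // capture group 1: just the URL
```

Since `.` does not match newlines without the `s` flag, this pattern still assumes each `src='...'/>` sits on a single line.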