Question

I am trying to save the song titles from the following URL: https://onlineradiobox.com/us/977todayshits/playlist

I am using the code below to fetch the data:

$html = file_get_contents("https://onlineradiobox.com/us/977todayshits/playlist");
$matches = array();
// Capture everything between the opening and closing tags of the playlist table.
$output = preg_match_all('/<table class="tablelist-schedule" role="log">(.*?)<\/table>/s', $html, $matches, PREG_SET_ORDER);
echo "<pre>";
print_r($matches);
echo "</pre>";

Output of the code above:

Live    Mark Ronson - Nothing Breaks Like a Heart (feat. Miley Cyrus)
10:41   Camila Cabello - Consequences
10:38   Imagine Dragons - It's Time
10:34   Panic! at the Disco - High Hopes
10:31   Selena Gomez - Hands to Myself

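This code fetches the data, but I don't know how to save the value of the second td tag in each table row. The second td does not always contain a link, and when it has no link there is no class defined on the td tag.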

Answer 1

As @Denis V said, don't use RegEx to parse HTML/XML content; use a proper library for that, such as LibXML...

Example:

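$dom = new DOMDocument();
// Collect libxml warnings internally instead of printing them for imperfect HTML.
libxml_use_internal_errors(true);
$dom->loadHTML(file_get_contents('https://onlineradiobox.com/us/977todayshits/playlist'));

$xPath = new DOMXPath($dom);

// Select the second <td> of every row in the playlist table.
$nodes = $xPath->query('//table[@class="tablelist-schedule"]/tbody/tr/td[2]');

foreach ($nodes as $node) {
    // textContent returns the cell text whether or not it is wrapped in a link.
    echo $node->textContent . "\n";
}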

Prints...

Pitbull - Time of Our Lives (feat. NeYo)

Ellie Goulding - Close to Me (x Diplo feat. Swae Lee)

Post Malone & Swae Lee - Sunflower

Why Don't We - 8 Letters

NF - Lie

...the list is too long...
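If you want to keep the titles rather than echo them, the same query can fill an array. Below is a minimal sketch of that idea. It runs against a small hard-coded HTML sample that I made up to resemble the table in the question (the real page's markup may differ), and `extractTitles()` is just an illustrative helper name, not part of any library. The sample includes one row without a link to show that `textContent` handles both cases the same way.

```php
<?php
// Hypothetical sample markup modelled on the table in the question.
// The live page's real structure may differ.
$html = <<<'HTML'
<table class="tablelist-schedule" role="log">
  <tbody>
    <tr><td>Live</td><td><a href="/track/1">Mark Ronson - Nothing Breaks Like a Heart (feat. Miley Cyrus)</a></td></tr>
    <tr><td>10:41</td><td>Camila Cabello - Consequences</td></tr>
    <tr><td>10:38</td><td><a href="/track/2">Imagine Dragons - It's Time</a></td></tr>
  </tbody>
</table>
HTML;

/**
 * Collect the text of the second <td> in every row of the schedule table.
 * Illustrative helper (assumed name), not part of DOM or libxml.
 */
function extractTitles(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // keep warnings about imperfect HTML quiet
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xPath = new DOMXPath($dom);
    // "//tr" matches rows whether or not the markup contains a <tbody>.
    $nodes = $xPath->query('//table[@class="tablelist-schedule"]//tr/td[2]');

    $titles = [];
    foreach ($nodes as $node) {
        // textContent flattens a nested <a>, so linked and plain cells look alike.
        $titles[] = trim($node->textContent);
    }
    return $titles;
}

$titles = extractTitles($html);
print_r($titles);
```

From there `$titles` is an ordinary array, so it can be written to a file or a database as needed. To run it against the live page, pass the result of `file_get_contents()` for the playlist URL instead of the sample string.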

Save HTML tag data from a website using PHP
