Question

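I am trying to write a script that extracts the text from a table on a website and displays it with PHP. When I run it against this address:

http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=50/01/1150001435/1&judet=50

the result is empty. Is there something wrong with the code? How can I fix or improve it?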

<?php
include_once('simple_html_dom.php');
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
// Start a cURL resource
$ch = curl_init();
// Set options for the cURL
curl_setopt($ch, CURLOPT_URL, 'http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=50/01/1150001435/1&judet=50'); // target
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); // provide a user-agent
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow any redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the result
// Execute the cURL fetch
$result = curl_exec($ch);
// Close the resource
curl_close($ch);
// Output the results
echo $result;
function scraping() {
    // create HTML DOM
    $html = file_get_html('http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=50/01/1150001435/1&judet=50');

    // get article block
    if($html && is_object($html) && isset($html->nodes)){
    foreach($html->find('/html/body/table') as $article) {
        // get title
        $item['titlu'] = trim($article->find('/html/body/table/tbody/tr[1]/td/div', 0)->plaintext);
        // get body
        $item['tr2'] = trim($article->find('/html/body/table/tbody/tr[2]', 0)->plaintext);
        $item['tr3'] = trim($article->find('/html/body/table/tbody/tr[3]', 0)->plaintext);
        $item['tr4'] = trim($article->find('/html/body/table/tbody/tr[4]', 0)->plaintext);
        $item['tr5'] = trim($article->find('/html/body/table/tbody/tr[5]', 0)->plaintext);
        $item['tr6'] = trim($article->find('/html/body/table/tbody/tr[6]', 0)->plaintext);
        $item['tr7'] = trim($article->find('/html/body/table/tbody/tr[7]', 0)->plaintext);
        $item['tr8'] = trim($article->find('/html/body/table/tbody/tr[8]', 0)->plaintext);
        $item['tr9'] = trim($article->find('/html/body/table/tbody/tr[9]', 0)->plaintext);
        $item['tr10'] = trim($article->find('/html/body/table/tbody/tr[10]', 0)->plaintext);
        $item['tr11'] = trim($article->find('/html/body/table/tbody/tr[11]', 0)->plaintext);
        $item['tr12'] = trim($article->find('/html/body/table/tbody/tr[12]', 0)->plaintext);
        $ret[] = $item;
    }

    // clean up memory
    $html->clear();
    unset($html);

    return $ret;}
}

// -----------------------------------------------------------------------------
// test it!
$ret = scraping();

foreach($ret as $v) {
    echo $v['titlu'].'<br>';
    echo '<ul>';
    echo '<li>'.$v['tr2'].'</li>';
    echo '<li>'.$v['tr3'].'</li>';
    echo '<li>'.$v['tr4'].'</li>';
    echo '<li>'.$v['tr5'].'</li>';
    echo '<li>'.$v['tr6'].'</li>';
    echo '<li>'.$v['tr7'].'</li>';
    echo '<li>'.$v['tr8'].'</li>';
    echo '<li>'.$v['tr9'].'</li>';
    echo '<li>'.$v['tr10'].'</li>';
    echo '<li>'.$v['tr11'].'</li>';
    echo '<li>'.$v['tr12'].'</li>';
    echo '</ul>';
}
?>

Answer 1

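Inside the foreach you are already working on the result of find('/html/body/table'), so you should not repeat the full path. Use a path relative to the table instead:

$item['titlu'] = trim($article->find('/tbody/tr[1]/td/div', 0)->plaintext);
$item['tr2'] = trim($article->find('/tbody/tr[2]', 0)->plaintext);

and so on for the remaining rows.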
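The relative-path idea above can also be sketched without simple_html_dom. This is only an illustration under stated assumptions, not the asker's code: it swaps in PHP's bundled DOMDocument and DOMXPath classes, and the $sample markup and the extractRows() helper are hypothetical stand-ins for the real page. It queries './/tr' instead of going through tbody, because browser developer tools often display a tbody element that the raw HTML may not contain.

```php
<?php
// Hypothetical sample markup standing in for the real page.
$sample = '<html><body><table>'
        . '<tr><td><div>Title</div></td></tr>'
        . '<tr><td>Row two</td></tr>'
        . '<tr><td>Row three</td></tr>'
        . '</table></body></html>';

// Hypothetical helper: returns one array per table, keyed like the question's $item.
function extractRows($markup)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep warnings about sloppy HTML quiet
    $doc->loadHTML($markup);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $ret = [];

    // Absolute path: select every table directly under body.
    foreach ($xpath->query('/html/body/table') as $table) {
        $item = [];
        $n = 1;
        // Relative path: the leading "." starts the search at the current table.
        foreach ($xpath->query('.//tr', $table) as $row) {
            $key = ($n === 1) ? 'titlu' : 'tr' . $n;
            $item[$key] = trim($row->textContent);
            $n++;
        }
        $ret[] = $item;
    }

    return $ret;
}

$rows = extractRows($sample);
print_r($rows);
```

Passing the table node as the second argument to query() is what makes the inner path relative; without it, the expression would be evaluated against the whole document again.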

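For your cURL request to work, you need to move

$ch = curl_init();

before the first curl_setopt call. In the question's code, the first curl_setopt runs before $ch has been created.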
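The corrected ordering could look like the sketch below. It assumes the cURL extension is loaded; the fetchPage() wrapper and the error_log() reporting are illustrative additions, not part of the original script.

```php
<?php
// Hypothetical wrapper around the question's cURL calls, in a working order.
// Returns the response body as a string, or false on failure.
function fetchPage($url, $userAgent)
{
    // Create the handle first; every curl_setopt call needs it.
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, $url);             // target
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); // provide a user-agent
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow any redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it

    $result = curl_exec($ch);
    if ($result === false) {
        // Log the reason instead of silently producing an empty page.
        error_log('cURL error: ' . curl_error($ch));
    }

    // The handle is released when $ch goes out of scope.
    return $result;
}
```

It would be called as fetchPage($url, $_SERVER['HTTP_USER_AGENT'] ?? 'Mozilla/5.0'), where $url holds the address from the question and the fallback string is an assumption for contexts, such as the command line, where no browser user-agent is available.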

The HTML DOM code is not running.
