How to scrape data from dt tags

Time: 2017-11-15 13:45:14

Tags: php

I am scraping data from a URL. Until now I have mostly dealt with ul, li, and similar tags.

This time I ran into dl tags, and when I use my scrape_between function it returns nothing. This is the HTML I am working with:

<div id='gallery-1' class='gallery galleryid-273 gallery-columns-2 gallery-size-full'><dl class='gallery-item'>
        <dt class='gallery-icon portrait'>
            <a href='https://example.com/wp-content/uploads/2013/11/gf-1.jpg?fit=650%2C976' data-rel="lightbox-gallery-1"><img  width="650" height="976"  src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" data-src="https://example.com/wp-content/uploads/2013/11/gf-1.jpg?fit=650%2C976"  class="attachment-full size-full" alt="" aria-describedby="gallery-1-16311" data-srcset="https://example.com/wp-content/uploads/2013/11/gf-1.jpg?w=650 650w, https://example.com/wp-content/uploads/2013/11/gf-1.jpg?resize=200%2C300 200w" data-sizes="(max-width: 650px) 100vw, 650px" /></a>
        </dt>
            <dd class='wp-caption-text gallery-caption' id='gallery-1-16311'>
            Ground Floor Plan
            </dd></dl><dl class='gallery-item'>
        <dt class='gallery-icon portrait'>
            <a href='https://example.com/wp-content/uploads/2013/11/ff.jpg?fit=649%2C1024' data-rel="lightbox-gallery-1"><img  width="649" height="1024"  src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" data-src="https://example.com/wp-content/uploads/2013/11/ff.jpg?fit=649%2C1024"  class="attachment-full size-full" alt="" aria-describedby="gallery-1-16312" data-srcset="https://example.com/wp-content/uploads/2013/11/ff.jpg?w=649 649w, https://example.com/wp-content/uploads/2013/11/ff.jpg?resize=190%2C300 190w" data-sizes="(max-width: 649px) 100vw, 649px" /></a>
        </dt>
            <dd class='wp-caption-text gallery-caption' id='gallery-1-16312'>
            First Floor pLan
            </dd></dl><br style="clear: both" />
    </div>

Can anyone help me?

The scrape_between function:

function scrape_between($data, $start, $end){
    $data = stristr($data, $start); 
    $data = substr($data, strlen($start));  
    $stop = stripos($data, $end);   
    $data = substr($data, 0, $stop);    
    return $data;   
}

I need to scrape the images inside the dt tags.

This is the code I am trying:

$project_images = scrape_between($data, '<dl class="gallery-item', '<br style="clear: both">');

Any suggestions would be appreciated.

2 answers:

Answer 0: (score: 0)

In the end I found the solution myself. I could not find an answer to this question anywhere here, so I decided to post one to help others.

I solved the problem of getting the images from the dl tags with this code:

from atlassian import Confluence

confluence = Confluence(
    url='http://localhost:8090',
    username='admin',
    password='admin')

status = confluence.create_page(
    space='DEMO',
    title='This is the title',
    body='This is the body. You can use <strong>HTML tags</strong>!')

print(status)
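
For readers who want a concrete PHP way to pull the image URLs out of the dt elements, here is a minimal sketch that uses the built-in DOMDocument and DOMXPath classes instead of string matching. It is not the answerer's code. The function name gallery_image_urls is hypothetical, and the sketch assumes the DOM extension is enabled and that the real image URL lives in data-src, as in the question's markup (where src only holds a placeholder GIF). String-based helpers such as scrape_between depend on the delimiters matching the markup exactly: the call in the question searches for a double-quoted class attribute while the page uses single quotes, so nothing is found.

```php
<?php
// Hypothetical helper (not from the original post): collect the image URLs
// found in every <dt> of a <dl class="gallery-item"> block.
function gallery_image_urls(string $html): array
{
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // <img> elements nested in a <dt> directly under a gallery-item <dl>.
    $query = "//dl[contains(concat(' ', normalize-space(@class), ' '), ' gallery-item ')]/dt//img";

    $urls = [];
    foreach ($xpath->query($query) as $img) {
        // Prefer data-src; fall back to src when data-src is absent.
        $url = $img->getAttribute('data-src') ?: $img->getAttribute('src');
        if ($url !== '') {
            $urls[] = $url;
        }
    }
    return $urls;
}
```

Calling gallery_image_urls($data) on the fetched page should return one URL per gallery item, regardless of which quote style the attributes use.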

Answer 1: (score: -1)

You can use a loop inside the function and have it return an array:

//returns array with found elements
function scrape_between($data, $start, $end) {

    $html_array = explode($start, $data);

    $clean_html_arr = [];

    foreach ($html_array as $position => $html_array_element) {
      if ($position > 0) {
        $html_exploded = explode($end, $html_array_element);
        $clean_html_arr[] = $start . $html_exploded[0] . $end;
      }
    }

    return $clean_html_arr;
}

And use it like this:

//Test function
foreach(scrape_between($str, "<dl class='gallery-item'>", "</dl>") as $key => $htmlBlock) {
    echo htmlspecialchars($htmlBlock) . '<br><br>';
}
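
Each block returned this way still contains the full dl markup, so one more step is needed to get at the image itself. The following is only a sketch of that step: image_url_from_block is a hypothetical helper, and it assumes the real URL is stored in the data-src attribute, as in the question's HTML.

```php
<?php
// Hypothetical helper (not from the original answer): pull the image URL
// out of one HTML block such as those returned by scrape_between().
function image_url_from_block(string $block): ?string
{
    // Match data-src with either quote style; "data-srcset" is skipped
    // because the pattern requires "=" directly after "data-src".
    if (preg_match('/<img\b[^>]*\sdata-src=(["\'])(.*?)\1/i', $block, $m)) {
        return html_entity_decode($m[2]);
    }
    return null; // no <img> with a data-src attribute in this block
}

// Shortened sample block based on the markup in the question.
$block = "<dl class='gallery-item'><dt class='gallery-icon portrait'>"
       . "<a href='#'><img width=\"650\" src=\"data:image/gif;base64,R0lGOD\" "
       . "data-src=\"https://example.com/wp-content/uploads/2013/11/gf-1.jpg?fit=650%2C976\" /></a>"
       . "</dt></dl>";

echo image_url_from_block($block), PHP_EOL;
```

A regular expression is adequate for this narrow, known markup; for anything less predictable, a DOM parser is usually the safer choice.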