Fetching multiple DIVs from an external source

Time: 2015-02-27 11:16:25

Tags: php html curl

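I am trying to display several divs from an external page on my own page. I have the following code to pull the divs. It works in a general way, but I would like it to be more dynamic.

This code pulls the content from a given div ID and displays it on my own page.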

<?php
header("Content-Type: text/html; charset=utf-8");

function file_get_contents_curl($url) {
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

    $data = curl_exec($ch);
    curl_close($ch);

    return $data;
}

//The URL for the external content we want to pull
$html = file_get_contents_curl("https://www.page.com/subdir/");

//parsing all content:
$doc = new DOMDocument();
@$doc->loadHTML($html);

$content = $html;

//The div that includes the content: '<div id="ide">'
$first_step = explode( '<div id="ide">' , $content );
$second_step = explode("</div>" , $first_step[1] );

//Do some magic with the URL
$url2 = $second_step[0];
$url3 = $second_step[8];
$url4 = $second_step[16];

$patterns = array(
    '#\./opening;jsessionid=.*\?#',
    '#<a href=#',
    '#span(.*?)>#'
);

$replaces = array(
    'https://www.page.com/subdir/opening?',
    '<a target="_blank" href=',
    'h1>'
);

//Print the final output
///Merge the result into one variable
$final_output = 
        preg_replace($patterns, $replaces, $url2) . 
        $second_step[1] . /* Description -- NOTE: By commenting out this you need to change the H1 margin in the style declaration */
        $second_step[2] . /* From date */
        $second_step[3] . /* To date */
        $second_step[4] . /* Company */
        $second_step[5] . /* Employment condition (full-time/part-time) */
        $second_step[6] . /* Department */
        //$second_step[7] . 
        '<hr>' . /* Horizontal rule */
        preg_replace($patterns, $replaces, $url3) . 
        $second_step[9] . /* Description -- NOTE: By commenting out this you need to change the H1 margin in the style declaration */
        $second_step[10] . /* From date */
        $second_step[11] . /* To date */
        $second_step[12] . /* Company */
        $second_step[13] . /* Employment condition (full-time/part-time) */
        $second_step[14] . /* Department */
        //$second_step[15] . 
        '<hr>' . /* Horizontal rule */
        preg_replace($patterns, $replaces, $url4) . 
        $second_step[17] . /* Description -- NOTE: By commenting out this you need to change the H1 margin in the style declaration */
        $second_step[18] . /* From date */
        $second_step[19] . /* To date */
        $second_step[20] . /* Company */
        $second_step[21] . /* Employment condition (full-time/part-time) */
        $second_step[22] . /* Department */
        //$second_step[23] . 
        '<hr>'; /* Horizontal rule */

///Convert special chars
$converted = iconv("UTF-8", "UTF-8//TRANSLIT", $final_output);

///Display the final result
echo $converted;
?>

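In this code, $url2, $url3 and $url4 hold the parts of the extracted content that are later modified with preg_replace. I also list the $second_step[xx] entries to define which content is displayed.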
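To make the effect of the three patterns concrete, here is a minimal sketch that applies them to a single title fragment. The fragment is a shortened, hypothetical stand-in for the sample data (the session id `ABC123` and the query string `0-1.details` are made up); the patterns and replacements are copied from the code above.

```php
<?php
// Patterns and replacements copied from the question's code.
$patterns = [
    '#\./opening;jsessionid=.*\?#', // relative link including the session id
    '#<a href=#',                   // plain link opening
    '#span(.*?)>#',                 // opening and closing span tags
];
$replaces = [
    'https://www.page.com/subdir/opening?',
    '<a target="_blank" href=',
    'h1>',
];

// Hypothetical title fragment, as left behind by explode("</div>", ...).
$title = '<div class="openingTitle"><a href="./opening;jsessionid=ABC123?0-1.details">'
       . '<span style="font-weight:bold;">fagarbeider</span></a>';

// preg_replace() applies the pattern/replacement pairs one after another.
$result = preg_replace($patterns, $replaces, $title);
echo $result;
// Prints:
// <div class="openingTitle"><a target="_blank" href="https://www.page.com/subdir/opening?0-1.details"><h1>fagarbeider</h1></a>
```

Note that `.*` in the first pattern is greedy, so it runs up to the last `?` in the fragment. If a fragment could contain a second question mark, `[^?]*` would probably be the safer choice.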

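As it stands, I have to list several "blocks" of $second_step[xx] and several $urlxx variables to be able to display all child divs of the parent div with that id. The child divs do not have any IDs or classes.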

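I do not know how many divs will be available at any given time, so I have to list a lot of these statements in my code. Since I separate each div with an <hr>, I get a lot of horizontal rules at the bottom of the page when there is only one div to display, or none at all.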

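I would also like to be able to display the divs in columns, for example two divs side by side.

How can I solve this?

Edit: here is a sample of the raw data I am working with.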

<div id="ide">
    <div>
        <div class="openingTitle"><a href="./opening;jsessionid=E2A19018E967B4771224A9FA515AFBC0?0-1.ILinkListener-content-contentPanel-openings~view~container-openings~view-0-details"><span style="font-weight:bold;">fagarbeider</span></a></div>
        <div class="openingIngress"><p>Avdeling teknisk drift har ledig stilling som fagarbeider vann/avløp.<br/>Fast, 100 %, ledig snarest.</p></div>
        <div class="openingDetail"><i>Utlyst:&nbsp;<span>28.01.2015</span></i></div>
        <div class="openingDetail"><i>Søknadsfrist:&nbsp;<span style="color:red">01.03.2015</span></i></div>
        <div class="openingDetail"><i>Selskap:&nbsp;<span>Randaberg kommune</span></i></div>
        <div class="openingDetail"><i>Stillingstype:&nbsp;<span>Fast ansatt</span></i></div>
        <div class="openingDetail"><i>Lokasjon:&nbsp;<span>Avd. teknisk drift</span></i></div>
        <div class="openingDetail">

        </div>
        <div>
        <div class="openingTitle"><a href="./opening;jsessionid=E2A19018E967B4771224A9FA515AFBC0?0-1.ILinkListener-content-contentPanel-openings~view~container-openings~view-0-details"><span style="font-weight:bold;">fagarbeider2</span></a></div>
        <div class="openingIngress"><p>Avdeling teknisk drift har ledig stilling som fagarbeider vann/avløp.<br/>Fast, 100 %, ledig snarest.</p></div>
        <div class="openingDetail"><i>Utlyst:&nbsp;<span>28.01.2015</span></i></div>
        <div class="openingDetail"><i>Søknadsfrist:&nbsp;<span style="color:red">01.03.2015</span></i></div>
        <div class="openingDetail"><i>Selskap:&nbsp;<span>Randaberg kommune</span></i></div>
        <div class="openingDetail"><i>Stillingstype:&nbsp;<span>Fast ansatt</span></i></div>
        <div class="openingDetail"><i>Lokasjon:&nbsp;<span>Avd. teknisk drift</span></i></div>
        <div class="openingDetail">

        </div>
    </div>

Thanks!

1 answer:

Answer 0 (score: 0)

You could try extracting it with a regular expression:

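// The "s" modifier lets "." match newlines, so multi-line divs are matched too.
preg_match_all('{<div[^<>]*>(?<content>.*?)</div>}s', $content, $matches);
// The named group holds the inner content; $matches[0] would hold the full matches including the tags.
$array_of_contents = $matches['content'];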

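Now $array_of_contents is an array containing the content of each of these divs. Of course, this only works for the innermost divs, and they all have to be on the same nesting level.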
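As an alternative that avoids the nesting limitation, PHP's DOM extension can walk the parsed document instead. The following is a minimal sketch, not a tested drop-in. It assumes the class names `openingTitle`, `openingIngress` and `openingDetail` from the sample data, and the function names `extract_openings`, `render_openings` and `render_columns` are made up for this example. It collects one HTML string per opening, however many there are, places `<hr>` only between items, and optionally lays the items out two per row.

```php
<?php
// Hypothetical helper: returns one HTML string per opening found below
// the div with the given id. Class names are taken from the sample data.
function extract_openings(string $html, string $parentId = 'ide'): array
{
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    $doc = new DOMDocument();
    // The XML prolog is a common workaround so loadHTML() reads the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath  = new DOMXPath($doc);
    $titles = $xpath->query(
        sprintf('//div[@id="%s"]//div[@class="openingTitle"]', $parentId)
    );

    $openings = [];
    foreach ($titles as $title) {
        $block = $doc->saveHTML($title);
        // Collect the ingress/detail divs that directly follow this title.
        for ($node = $title->nextSibling; $node !== null; $node = $node->nextSibling) {
            if (!$node instanceof DOMElement) {
                continue; // skip whitespace text nodes
            }
            $class = $node->getAttribute('class');
            if ($class !== 'openingIngress' && $class !== 'openingDetail') {
                break; // reached the next opening or unrelated markup
            }
            $block .= $doc->saveHTML($node);
        }
        $openings[] = $block;
    }
    return $openings;
}

// implode() puts the rule only between items, so zero or one opening
// produces no <hr> at all.
function render_openings(array $openings): string
{
    return implode('<hr>', $openings);
}

// Lay the openings out side by side, $perRow items per flex row.
function render_columns(array $openings, int $perRow = 2): string
{
    $rows = [];
    foreach (array_chunk($openings, $perRow) as $row) {
        $cells = '';
        foreach ($row as $opening) {
            $cells .= '<div style="flex:1">' . $opening . '</div>';
        }
        $rows[] = '<div style="display:flex">' . $cells . '</div>';
    }
    return implode('<hr>', $rows);
}

// Hypothetical two-opening sample in the shape of the question's data.
$sample = '<html><body><div id="ide">'
    . '<div><div class="openingTitle"><a href="./x">one</a></div>'
    . '<div class="openingDetail">d1</div></div>'
    . '<div><div class="openingTitle"><a href="./y">two</a></div>'
    . '<div class="openingDetail">d2</div></div>'
    . '</div></body></html>';

$openings = extract_openings($sample);
echo render_columns($openings);
```

In the question's script, `$sample` would be replaced by the `$html` returned from `file_get_contents_curl()`, and the existing `preg_replace($patterns, $replaces, ...)` call could be applied to each entry of `$openings` before rendering.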