Character combination for unknown text

Date: 2015-09-15 12:57:27

Tags: php html

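I have a function that scrapes a table. The <td> tags in the table carry many attribute values, and of course those values differ from cell to cell. So I need some character combination (a kind of wildcard) that lets me get everything, no matter what the value is.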

function scrape_between($data, $start, $end){
    $data = stristr($data, $start); // Stripping all data from before $start
    $data = substr($data, strlen($start));  // Stripping $start
    $stop = stripos($data, $end);   // Getting the position of the $end of the data to scrape
    $data = substr($data, 0, $stop);    // Stripping all data from after and including the $end of the data to scrape
    return $data;   // Returning the scraped data from the function
}

$match = $this -> scrape_between($array, '<td (__MAYBE SOME CHARACTER TO GET EVERYTHING NO MATTER WHAT__) class="V1_c01">', "</td>");
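Plain string functions such as stristr have no wildcard. If a regular expression is acceptable, the character class [^>]* ("any run of characters except >") can play the role of the placeholder in the call above. The sketch below is a swapped-in regex technique, not the original function; it assumes the attribute is written exactly as class="V1_c01" and that cells are not nested, and scrape_td_by_class is a hypothetical helper name.

```php
<?php
// Hypothetical helper (not from the original post): returns the content of the
// first <td> whose opening tag contains class="$class", whatever other
// attributes (id, style, ...) it has, or null if there is no such cell.
function scrape_td_by_class(string $html, string $class): ?string
{
    // [^>]* matches any attributes before and after the class attribute
    // without leaving the opening tag; (.*?) lazily captures the cell
    // content up to the first </td>. Assumes no nested tables.
    $pattern = '/<td\b[^>]*\bclass="' . preg_quote($class, '/') . '"[^>]*>(.*?)<\/td>/is';
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

$html = '<tr><td id="x_1" style="" class="V1_c01">first</td><td class="V1_c04">other</td></tr>';
echo scrape_td_by_class($html, 'V1_c01'); // prints "first"
```

Like the original scrape_between, this returns the raw inner HTML of the cell rather than its plain text.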

Edit: I want to use a foreach, because the cells have different IDs and I want to search for them inside the loop.

    foreach ($separate_results as $key => $separate_result) {
        if ($separate_result != "") {
            $table[$key][0]= $this -> scrape_between($separate_result, '<td id="indhold_0_indholdbredvenstre_0_integrationwrapper_1_ctl01_Program_ProgramNormal_Program1_c04_0" class="V1_c04">', "</td>");
        }
    }
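Since the id changes from cell to cell while the class stays the same, one option is to select the cells by class alone and ignore the id entirely. A sketch using PHP's DOM extension (DOMDocument and DOMXPath), which parses the HTML instead of matching strings; it assumes the dom extension is available and that the class attribute is exactly V1_c04 (a multi-class attribute would need an XPath contains() test). scrape_cells_by_class is a hypothetical name, and unlike scrape_between it returns the text content of each cell, with tags stripped.

```php
<?php
// Hypothetical helper (not from the original post): collects the text of
// every <td> whose class attribute is exactly $class, regardless of its id.
function scrape_cells_by_class(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect real-world HTML instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $cells = [];
    // Assumes $class contains no double quote, since it is embedded in the query.
    foreach ($xpath->query('//td[@class="' . $class . '"]') as $td) {
        $cells[] = trim($td->textContent);
    }
    return $cells;
}

$html = '<table><tr><td id="a_0" class="V1_c04">one</td></tr>'
      . '<tr><td id="a_1" class="V1_c04">two</td></tr></table>';
print_r(scrape_cells_by_class($html, 'V1_c04'));
```

Using a parser avoids the loop over hand-split result strings: every matching cell is returned in document order, whatever its id.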

1 answer:

Answer 0 (score: 1)

If your concern is the __SOMETHING HERE__ part, and the class name V1_c01 is fixed (known in advance), here is my POC (proof of concept):

<?php
function scrape_between($data, $classname, $tagname){

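    // Get everything from `<$tagname` up to `$classname`, where `<$tagname` must be the
    // nearest occurrence to the left of `$classname`. Searching the reversed string turns
    // "nearest to the left" into "first match to the right".
    $openstart = stristr(strrev($data), strrev($classname));
    $openstart = substr($openstart, strlen($classname));
    // Reversed, the opening tag `<td` reads `dt<`: search for the reversed tag and keep it
    // (it is strlen($tagname) + 1 characters long).
    $openstart = substr($openstart, 0, stripos($openstart, strrev('<'.$tagname)) + strlen($tagname) + 1);
    $openstart = strrev($openstart);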

    // get anything between `classname` and `>`, where `>` must be the first occurrence to the right of `classname`
    $openend = stristr($data, $classname);
    $openend = substr($openend, strlen($classname));
    $openend = substr($openend, 0, stripos($openend, '>')+1);

    $start = $openstart.$classname.$openend; // '<td __SOMETHING HERE__ class="' . 'V1_c01' . '">'
    $end = "</".$tagname.">";

    $data = stristr($data, $start); // Stripping all data from before $start
    $data = substr($data, strlen($start));  // Stripping $start
    $stop = stripos($data, $end);   // Getting the position of the $end of the data to scrape
    $data = substr($data, 0, $stop);    // Stripping all data from after and including the $end of the data to scrape
    return $data;   // Returning the scraped data from the function
}

$array = '<table><tr><td>&nbsp;</td></tr><tr><td style="" id="td1"><table><tr><td style="" class="V1_c01" id="mytd">my td content</td></tr></table></td></tr><tr><td>&nbsp;</td></tr></table>';
$match = scrape_between($array, 'V1_c01', "td");

echo $match;
echo '<br />';

$array = '<table><tr><td>&nbsp;</td></tr><tr><td style="" id="td1"><table><tr><td><span style="" class="V1_c01" id="myspan">my span content</span></td></tr></table></td></tr><tr><td>&nbsp;</td></tr></table>';
$match = scrape_between($array, 'V1_c01', "span");

echo $match;

?>

Output one:

my td content

Output two:

my span content
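The strrev calls in the answer exist only to find the opening tag closest to the left of the class name. strripos searches backwards directly, so the same opening tag can be located without reversing the document. A sketch of that alternative; opening_tag_for and scrape_by_class are hypothetical names, and like the answer it assumes the class name occurs once and sits inside the wanted tag.

```php
<?php
// Hypothetical helper: returns the complete opening tag that contains
// $classname (e.g. '<td style="" class="V1_c01" id="mytd">'), or null.
function opening_tag_for(string $data, string $classname, string $tagname): ?string
{
    $classPos = stripos($data, $classname);
    if ($classPos === false) {
        return null;
    }
    // Nearest "<tagname" before the class name: search backwards in the prefix.
    $tagPos = strripos(substr($data, 0, $classPos), '<' . $tagname);
    // The opening tag ends at the first ">" after the class name.
    $endPos = strpos($data, '>', $classPos);
    if ($tagPos === false || $endPos === false) {
        return null;
    }
    return substr($data, $tagPos, $endPos - $tagPos + 1);
}

// Hypothetical wrapper: content between that opening tag and the next </tagname>.
function scrape_by_class(string $data, string $classname, string $tagname): ?string
{
    $start = opening_tag_for($data, $classname, $tagname);
    if ($start === null) {
        return null;
    }
    $rest = substr($data, strpos($data, $start) + strlen($start));
    $stop = stripos($rest, '</' . $tagname . '>');
    return $stop === false ? null : substr($rest, 0, $stop);
}
```

With the two sample strings from the answer, scrape_by_class($array, 'V1_c01', 'td') and scrape_by_class($array, 'V1_c01', 'span') should give the same two outputs shown above, and returning null makes a missing tag explicit instead of yielding an empty string.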