Scraping HTML with DOMDocument + PHP

Date: 2015-07-08 11:21:25

Tags: php domdocument

I want to scrape the following HTML:

 <div class="venue-event-list " rel="GB">
                            <div class="tracks-list">
<div class="single-track">
            <a href="//livevideo.betfair.com/Default.do?mi=119408124" target="_blank" class="live-video-link"><div class="bf-icon-live-video tag-i13n i13n-ltxt-LVid i13n-sec-GB i13n-tab-today" title="Watch now on Betfair Live Video"></div></a>
    <div class="info-container">
        <span class="track-name">
            <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a>
        </span>
        <div class="races-list">


<div class="single-race" id="m-1_119408124">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408124"
            title="5f Nursery | 7 Runners">14:10</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408128">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408128"
            title="6f Mdn Stks | 11 Runners">14:40</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408132">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408132"
            title="7f Mdn Stks | 6 Runners">15:10</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408136">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408136"
            title="2m Hcap | 12 Runners">15:40</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408140">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408140"
            title="1m2f Sell Stks | 6 Runners">16:10</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408144">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408144"
            title="1m3f Hcap | 8 Runners">16:40</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408148">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408148"
            title="1m1f Hcap | 14 Runners">17:10</a>
    </span>
</div>
        </div>
    </div>
</div>
                    </div>
                            <div class="tracks-list">
<div class="single-track">
            <a href="//livevideo.betfair.com/Default.do?mi=119408153" target="_blank" class="live-video-link"><div class="bf-icon-live-video tag-i13n i13n-ltxt-LVid i13n-sec-GB i13n-tab-today" title="Watch now on Betfair Live Video"></div></a>
    <div class="info-container">
        <span class="track-name">
            <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408153">Wolverhampton</a>
        </span>
        <div class="races-list">


<div class="single-race" id="m-1_119408153">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408153"
            title="5f Mdn Stks | 7 Runners">14:20</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408157">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408157"
            title="1m6f Hcap | 7 Runners">14:50</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408161">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408161"
            title="1m4f Sell Stks | 5 Runners">15:20</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408165">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408165"
            title="1m1f Hcap | 13 Runners">15:50</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408169">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408169"
            title="1m1f Hcap | 11 Runners">16:20</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408173">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408173"
            title="1m Mdn Stks | 11 Runners">16:50</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408177">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408177"
            title="1m Hcap | 13 Runners">17:20</a>
    </span>
</div>
        </div>
    </div>
</div>
                    </div>

I am using the following code to pull the meeting names and race times:

$url  = "";
$html = file_get_contents($url);

$dom = new DOMDocument();
// Set parser options before loading the markup.
$dom->preserveWhiteSpace = false;
// @ suppresses the warnings libxml raises on real-world HTML.
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select every "tracks-list" container (one per meeting).
$trackLists = $xpath->query('//div[contains(@class, "tracks-list")]');

// Loop through each container and print all of its text.
foreach ($trackLists as $trackList) {
    echo $trackList->textContent . "<br />";
}

What I want is to pull the meeting name, but only if the link (shown below) contains "GB" and "today" (both appear in its class attribute):

>  <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today"
> href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a>

So the result would be Lingfield. If that condition holds, I then want to pull the race time and market ID from links like the one below:

<a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
        href="/exchange/plus/#/horse-racing/market/1.119408124"
        title="5f Nursery | 7 Runners">14:10</a>

So the result would be:

Lingfield 14:10 1.119408124 
Lingfield 14:40 1.119408128
 ............................. 
Wolverhampton 14:20 1.119408153

1 Answer:

Answer 0 (score: 0)

$xpath->query("//a[contains(@class,'GB') and contains(@class,'today')]");

should help.
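
To get from that class filter to the "meeting, time, market ID" rows the question asks for, one possible approach is to walk each `single-track` div, check its meeting link, and then read the race links inside the same div. The following is only a minimal sketch under a few assumptions: `extractRaces` is a hypothetical helper name, the sample markup is a trimmed copy of the HTML in the question, the market ID is taken to be the last path segment of each `href`, and PHP 7.1 or later is assumed.

```php
<?php
// Hypothetical helper: returns one [meeting, time, marketId] row per race.
function extractRaces(string $html): array
{
    // Collect libxml warnings internally instead of using the @ operator.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $rows  = [];

    // One "single-track" div per meeting.
    foreach ($xpath->query('//div[contains(@class, "single-track")]') as $track) {
        // The meeting link must carry both the GB and the today class tokens.
        $meeting = $xpath->query(
            './/span[contains(@class, "track-name")]'
            . '/a[contains(@class, "i13n-sec-GB") and contains(@class, "i13n-tab-today")]',
            $track
        )->item(0);
        if ($meeting === null) {
            continue; // not a GB meeting for today
        }

        // Race links are looked up relative to the current track only.
        foreach ($xpath->query('.//a[contains(@class, "race-link")]', $track) as $race) {
            // Assumption: the market ID is the last path segment of the href.
            $href     = $race->getAttribute('href');
            $marketId = substr($href, strrpos($href, '/') + 1);
            $rows[]   = [trim($meeting->textContent), trim($race->textContent), $marketId];
        }
    }

    return $rows;
}

// Trimmed sample based on the markup in the question.
$html = <<<'HTML'
<div class="tracks-list">
<div class="single-track">
  <span class="track-name">
    <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a>
  </span>
  <div class="single-race" id="m-1_119408124">
    <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124" title="5f Nursery | 7 Runners">14:10</a>
  </div>
  <div class="single-race" id="m-1_119408128">
    <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408128" title="6f Mdn Stks | 11 Runners">14:40</a>
  </div>
</div>
</div>
<div class="tracks-list">
<div class="single-track">
  <span class="track-name">
    <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408153">Wolverhampton</a>
  </span>
  <div class="single-race" id="m-1_119408153">
    <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408153" title="5f Mdn Stks | 7 Runners">14:20</a>
  </div>
</div>
</div>
HTML;

foreach (extractRaces($html) as [$meeting, $time, $marketId]) {
    echo "$meeting $time $marketId\n";
}
// Prints:
// Lingfield 14:10 1.119408124
// Lingfield 14:40 1.119408128
// Wolverhampton 14:20 1.119408153
```

Passing `$track` as the second argument to `query()` and starting the expression with `.//` keeps each lookup inside the current meeting, which is what ties every race time to the right meeting name. Note that `contains(@class, ...)` is a substring match, so matching on the full tokens `i13n-sec-GB` and `i13n-tab-today` is probably safer than matching on `GB` and `today` alone.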