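Using the DomCrawler / Goutte Symfony component, I want to extract only the paragraphs inside each .pertanyaan class that come before the .listjawaban class.
Is there a way to do this?
I came up with $crawler->filter('.pertanyaan p')->eq($i)->html(), but it only gives me the first paragraph, because $i is the nth position of the .pertanyaan class.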
<div class="pertanyaan"><p></p>
<p>Karena mengalami mutasi, kromosom mengalami perubahan seperti pada gambar di bawah.</p>
<p><img src="http://indocademy.com/images/ipa_2013_133/53_1.png" alt=""><br>Jenis mutasi tersebut adalah ....</p>
<p></p>
<div class="listjawaban">
<div class="radiojawaban">
<input type="radio" name="answer_dup_758" id="answer_dup_758_A" value="A" style="display:none" disabled=""><input type="radio" name="answer_758" id="answer_758_A" value="A" onclick="showbutton(758);">A.
</div>
<div class="pilihanjawaban">
adisi
</div>
</div>
<div class="listjawaban">
<div class="radiojawaban">
<input type="radio" name="answer_dup_758" id="answer_dup_758_B" value="B" style="display:none" disabled=""><input type="radio" name="answer_758" id="answer_758_B" value="B" onclick="showbutton(758);">B.
</div>
<div class="pilihanjawaban">
delesi
</div>
</div>
<div class="listjawaban">
<div class="radiojawaban">
<input type="radio" name="answer_dup_758" id="answer_dup_758_C" value="C" style="display:none" disabled=""><input type="radio" name="answer_758" id="answer_758_C" value="C" onclick="showbutton(758);">C.
</div>
<div class="pilihanjawaban">
inversi
</div>
</div>
<div class="listjawaban">
<div class="radiojawaban">
<input type="radio" name="answer_dup_758" id="answer_dup_758_D" value="D" style="display:none" disabled=""><input type="radio" name="answer_758" id="answer_758_D" value="D" onclick="showbutton(758);">D.
</div>
<div class="pilihanjawaban">
duplikasi
</div>
</div>
<div class="listjawaban">
<div class="radiojawaban">
<input type="radio" name="answer_dup_758" id="answer_dup_758_E" value="E" style="display:none" disabled=""><input type="radio" name="answer_758" id="answer_758_E" value="E" onclick="showbutton(758);">E.
</div>
<div class="pilihanjawaban">
translokasi
</div>
</div>
<div class="buttons">
<input type="button" class="tombol_jawab" id="tombol_jawab_758" value="Jawab" style="display:none" onclick="executejawaban(758,"http://indocademy.com")"><input type="button" class="tombol_clear" id="tombol_clear_758" value="Hapus" style="display:none" onclick="clearjawaban(758)">
</div>
<div class="kunci" id="kunci_758" style="display: none">
<div class="tulisanjawab abu">
<input type="button" id="tombol_kunci" value="+" class="jawaban_758" onclick="showkunci(this)">
Jawaban : <img id="loading_758" src="http://indocademy.com/images/loading.gif" style="height:12px;vertical-align:middle">
<span id="hasil_758"> </span>
</div>
<div class="konten_kunci">
<div class="konten_jawaban_758" id="isi_jawaban"></div>
</div>
</div>
</div>
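This is the URL I am trying to scrape: http://indocademy.com/soal/sbmptn/biologi/2013
The crawl goes fine everywhere except at question #53, because there are three paragraph tags to extract there. I had assumed that only the first paragraph tag of each number is the question, and I don't know how to extract all the paragraphs before the .listjawaban class.
Please help.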
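Answer 0 (score: 1)
Because the page at that URL is unstructured and the class .pertanyaan does not exist there, I copied the HTML snippet into a script and used DomCrawler to get the four <p> elements.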
#!/usr/bin/php
<?php
require ('vendor/autoload.php');
use Symfony\Component\DomCrawler\Crawler;
$html = <<<'HTML'
<div class="pertanyaan">
<p></p>
<p>Karena mengalami mutasi, kromosom mengalami perubahan seperti pada gambar di bawah.</p>
<p><img src="http://indocademy.com/images/ipa_2013_133/53_1.png" alt=""><br>Jenis mutasi tersebut adalah ....</p>
<p></p>
<div class="listjawaban">
<div class="radiojawaban">
<input type="radio" name="answer_dup_758" id="answer_dup_758_A" value="A" style="display:none" disabled="">
<input type="radio" name="answer_758" id="answer_758_A" value="A" onclick="showbutton(758);">A.
</div>
<div class="pilihanjawaban">
adisi
</div>
</div>
<div class="listjawaban">
<div class="radiojawaban">
<input type="radio" name="answer_dup_758" id="answer_dup_758_B" value="B" style="display:none" disabled="">
<input type="radio" name="answer_758" id="answer_758_B" value="B" onclick="showbutton(758);">B.
</div>
<div class="pilihanjawaban">
delesi
</div>
</div>
<div class="listjawaban">
<div class="radiojawaban">
<input type="radio" name="answer_dup_758" id="answer_dup_758_C" value="C" style="display:none" disabled="">
<input type="radio" name="answer_758" id="answer_758_C" value="C" onclick="showbutton(758);">C.
</div>
<div class="pilihanjawaban">
inversi
</div>
</div>
<div class="listjawaban">
<div class="radiojawaban">
<input type="radio" name="answer_dup_758" id="answer_dup_758_D" value="D" style="display:none" disabled="">
<input type="radio" name="answer_758" id="answer_758_D" value="D" onclick="showbutton(758);">D.
</div>
<div class="pilihanjawaban">
duplikasi
</div>
</div>
<div class="listjawaban">
<div class="radiojawaban">
<input type="radio" name="answer_dup_758" id="answer_dup_758_E" value="E" style="display:none" disabled="">
<input type="radio" name="answer_758" id="answer_758_E" value="E" onclick="showbutton(758);">E.
</div>
<div class="pilihanjawaban">
translokasi
</div>
</div>
<div class="buttons">
<input type="button" class="tombol_jawab" id="tombol_jawab_758" value="Jawab" style="display:none" onclick="executejawaban(758,"http://indocademy.com")"><input type="button" class="tombol_clear" id="tombol_clear_758" value="Hapus" style="display:none"
onclick="clearjawaban(758)">
</div>
<div class="kunci" id="kunci_758" style="display: none">
<div class="tulisanjawab abu">
<input type="button" id="tombol_kunci" value="+" class="jawaban_758" onclick="showkunci(this)"> Jawaban : <img id="loading_758" src="http://indocademy.com/images/loading.gif" style="height:12px;vertical-align:middle">
<span id="hasil_758"> </span>
</div>
<div class="konten_kunci">
<div class="konten_jawaban_758" id="isi_jawaban"></div>
</div>
</div>
</div>
HTML;
// Parse the snippet and collect the inner HTML of every <p> inside .pertanyaan.
$crawler = new Crawler($html);
$output = $crawler->filter('.pertanyaan p')->each(function ($node) {
    return $node->html();
});
print_r($output);
The each() function returns an array of the four paragraphs. Here is the resulting array:
Array
(
[0] =>
[1] => Karena mengalami mutasi, kromosom mengalami perubahan seperti pada gambar di bawah.
[2] => <img src="http://indocademy.com/images/ipa_2013_133/53_1.png" alt=""><br>Jenis mutasi tersebut adalah ....
[3] =>
)
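The each() call above returns every paragraph on the page, including the empty ones, whereas the question asks for the paragraphs of the $i-th .pertanyaan block that sit before its .listjawaban blocks. The following is a minimal sketch of one way to do that, not a definitive solution. It assumes the <p> tags and the .listjawaban divs are siblings, as in the snippet; the helper name questionParagraphs() and the shortened two-question HTML are made up for illustration.

```php
<?php
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical, shortened stand-in for the real page: two question blocks.
$html = <<<'HTML'
<div class="pertanyaan">
<p></p>
<p>First line of question one.</p>
<p>Second line of question one.</p>
<p></p>
<div class="listjawaban"><div class="pilihanjawaban">adisi</div></div>
<div class="listjawaban"><div class="pilihanjawaban">delesi</div></div>
</div>
<div class="pertanyaan">
<p>Only line of question two.</p>
<div class="listjawaban"><div class="pilihanjawaban">inversi</div></div>
</div>
HTML;

// Hypothetical helper: non-empty paragraphs of the $i-th question block
// that are followed by at least one .listjawaban sibling.
function questionParagraphs(Crawler $crawler, int $i): array
{
    return $crawler
        ->filter('.pertanyaan')->eq($i)   // pick the question block first
        ->filter('p')                     // then look only inside that block
        ->reduce(function (Crawler $p) {
            // Drop empty <p></p> tags and anything not followed by an answer block.
            return trim($p->html()) !== ''
                && $p->nextAll()->filter('.listjawaban')->count() > 0;
        })
        ->each(function (Crawler $p) {
            return trim($p->html());
        });
}

$crawler = new Crawler($html);
print_r(questionParagraphs($crawler, 0));
print_r(questionParagraphs($crawler, 1));
```

The key difference from filter('.pertanyaan p')->eq($i) is the order of operations: eq($i) is applied to the question blocks rather than to the flat list of all paragraphs, so $i keeps meaning "the nth question" no matter how many paragraphs each question has.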