I'm stuck on how to extract the individual strings from a table.
Here is the HTML for one row of td cells:
<tr>
<td class="">1</td><td><a
href="http://www.canalturf.com/courses_fiche_cheval.php?
idcheval=227579&idcourse=173937" target="_blank" title="La fiche du
cheval EQUEMAUVILLE "><strong>EQUEMAUVILLE (H4)</strong><br>
<small>4s3s1s6s5h(17)2h2h3h7h</small><div class="pedigree hidden">
<small>SAINT DES SAINTS - MISS ACADEMY</small></div></a>
</td>
<td><div style="width:30px; height:35px; overflow:hidden"><img
src="http://www.canalturf.com/interface/casaques/173937.png"
style="width:100%; position:relative; top:0px"></div>
</td>
<td><a href="http://www.canalturf.com/courses_fiche_jockey.php?
idjockey=3080&date=2018-08-04" target="_blank" title="La fiche du
jockey/driver D. GALLON"><strong>D. GALLON</strong></a><br><a
href="http://www.canalturf.com/courses_fiche_entraineur.php?
identraineur=171&date=2018-08-04" target="_blank" title="La fiche de
l'entraineur F.NICOLLE"><small>F.NICOLLE</small></a></td><td>71.0 kg
</td>
<td
class="text-center bord-lft">9</td><td class="text-center bord-lft text-
success">-40%</td><td class="text-center bord-lft"><a
href="https://eule1.pmu.fr/dynclick/pmu/?eaf-
publisher=ACQHIPPIQUECANALTURF_CANALTURF&eaf-
name=ACQHIPPIQUECANALTURF_CANALTURF_2010_WEB_AFF_FILROUGE&eaf-
creative=ACQ_H_DESKTOP_ETIRELIRE_BANNIERE&eaf-
creativetype=BANNIERE&eseg-name=ia-affilie&eseg-
item=a_Canalturfb_TEXTEc_aid&
mediaplan=2010_WEB_AFF_FILROUGE&eurl=https%3A%2F%2Fwww.Fturf%2Fouver
ture-compte%2Fstandard%2F%3F2%26hippique-
tirelire%26ns_mchannel%3DAFF%26ns_source%3DACQHIPPIQUECANALTURF_CANALTURF"
target="_blank" onclick="handleOutboundLinkClicks('PMUClic', 'pmuCote',
'1');">5.4</a></td><td class="text-center bord-lft"><a
href="https://www.zeturf.fr/fr/inscription?
pid=88&affutm_source=Affiliation&utm_medium=Canalturf&u
tm_campaign=ZT_FR_Affiliation_Filrouge_Logo_2018" target="_blank"
onclick="handleOutboundLinkClicks('ZTClic', 'ztCote', '1');">5</a></td><td
class="text-center bord-lft"><a
href="http://wlbetclicfr.adsrv.eacdn.com/C.ashx?
btag=a_920b_260c_&affid=590&siteid=920&adid=260&c=turf"
target="_blank" onclick="handleOutboundLinkClicks('BTClic', 'btCote',
'1');">--
</a>
</td>
<td class="text-center bord-lft"><a
href="http://media.unibet.fr/redirect.aspx?
pid=32884&bid=2223" target="_blank"
onclick="handleOutboundLinkClicks('UNClic', 'unCote', '1');">4.8</a>
</td>
This is the output I currently get:
1,EQUEMAUVILLE (H4)4s3s1s6s5h(17)2h2h3h7hSAINT DES SAINTS - MISS ACADEMY,AP,F. OUVRIEJ. FOIN,2700m,47,+28%,60,67.6,67.78,--
This is what I want instead:
1,EQUEMAUVILLE,H4,4s3s1s6s5h(17)2h2h3h7h,SAINT DES SAINTS,MISS ACADEMY,AP,F. OUVRIEJ. FOIN,2700m,47,+28%,60,67.6,67.78,--
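To make the target concrete, here is a minimal sketch of the split I am after for the horse cell, using plain string methods rather than a regex. The helper names `split_name` and `split_pedigree` are made up for illustration, and the sketch assumes the name is always followed by " (" and the pedigree always uses " - " as its separator.

```python
def split_name(text):
    """Split 'NAME (CODE)' into (name, code); code is '' if there is no ' ('."""
    name, _, rest = text.partition(" (")
    return name.strip(), rest.rstrip(")").strip()


def split_pedigree(text):
    """Split 'LEFT - RIGHT' into (left, right); right is '' if there is no ' - '."""
    left, _, right = text.partition(" - ")
    return left.strip(), right.strip()


# The three text pieces of the horse cell, as they appear in the HTML above.
name_text = "EQUEMAUVILLE (H4)"
form_text = "4s3s1s6s5h(17)2h2h3h7h"
pedigree_text = "SAINT DES SAINTS - MISS ACADEMY"

# Five separate CSV fields instead of one run-together string.
fields = [*split_name(name_text), form_text, *split_pedigree(pedigree_text)]
print(",".join(fields))
```

This only covers the string handling; it still needs the three pieces to arrive separately, which `td.text` does not give me.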
This is how I get the values:
import csv

# soup2 is the BeautifulSoup object for the page (built earlier, not shown)
table = soup2.find("table", attrs={"id": "TablePartants"})
headers = [th.text for th in table.select("tr th")]
with open("out.csv", "w", newline="") as f:
    wr = csv.writer(f, lineterminator="\n")
    # wr.writerow(headers)
    # one CSV row per <tr>, one field per <td>
    wr.writerows([[td.text for td in row.find_all("td")] for row in table.select("tr")])
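Do I have to use regular expressions, or can I split the cell beforehand using its specific HTML tags? I'm a bit confused.
If anyone can help, I would be very grateful. Many thanks.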
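One direction that might avoid a regex, as a rough sketch only: Beautiful Soup exposes `stripped_strings` on a tag, which yields each text node separately with surrounding whitespace removed, so the `strong`, `small` and pedigree texts would come out as distinct fields. The HTML below is a shortened copy of the row above (the `href` is a placeholder), and `cell_strings` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Shortened copy of the row above; the href is a placeholder for the example.
html = """
<table id="TablePartants"><tr>
<td class="">1</td>
<td><a href="#"><strong>EQUEMAUVILLE (H4)</strong><br>
<small>4s3s1s6s5h(17)2h2h3h7h</small><div class="pedigree hidden">
<small>SAINT DES SAINTS - MISS ACADEMY</small></div></a></td>
<td><div><img src="#"></div></td>
<td>71.0 kg
</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", attrs={"id": "TablePartants"})


def cell_strings(td):
    """Return the cell's text nodes as separate stripped strings.

    A cell with no text (for example one holding only an image) yields a
    single empty string so the columns stay aligned.
    """
    return list(td.stripped_strings) or [""]


# One flat list of fields per <tr>, instead of one joined string per <td>.
rows = [
    [s for td in row.find_all("td") for s in cell_strings(td)]
    for row in table.select("tr")
]
print(rows[0])
```

The name and pedigree fields would still need a further split on " (" and " - ", and cells that hold several links (such as jockey and trainer) would also expand into several fields, so the column count per row depends on the markup.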