BeautifulSoup: replace colspan="2" with a single column

Time: 2014-07-31 08:04:08

Tags: beautifulsoup

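I am trying to parse data from table rows that occasionally contain a cell with colspan="2", which breaks my ability to extract the target data. What I would like to do is remove the colspan="2" from the table cell every time it occurs: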

#replace
<td colspan="2" class="time">10:00 AM</td>
#with
<td>635</td>

Is this possible? If so, can I use it in a conditional?

Here is a more detailed example:

<table>
<tr class="playerRow even">
<td class="pos">1</td>
<td><span class="rank"></span> -</td>
<td class="player"><p class="playerName">John doe</p></td>
<td class="background">X</td>
<td>345</td> #THIS ELEMENT FREQUENT
<td></td>
<td></td>
<td></td>
<td></td>
<td style=""></td>
</tr>

<tr class="playerRow odd">
<td class="pos">1</td>
<td><span class="rank"></span> -</td>
<td class="player"><p class="playerName">John doe</p></td>
<td class="background">X</td>
<td colspan="2" class="myClass" style="">3:15 PM</td> #THIS ELEMENT OCCASIONAL
<td></td>
<td></td>
<td></td>
<td></td>
<td style=""></td>
</tr>

<tr class="playerRow odd">
<td class="pos">1</td>
<td><span class="rank"></span> -</td>
<td class="player"><p class="playerName">John doe</p></td>
<td class="background">X</td>
<td>22</td> #THIS ELEMENT FREQUENT
<td></td>
<td></td>
<td></td>
<td></td>
<td style=""></td>
</tr>
</table>

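So whenever I come across a colspan, I want to replace that cell with a plain td, so that it doesn't shift the row's elements and throw off my count.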

1 answer:

Answer 0 (score: 4)

This will convert:

<td colspan="2" class="myClass" style="">3:15 PM</td>

into:

<td>3:15 PM</td>

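from bs4 import BeautifulSoup

# html holds the table markup from the question
bs = BeautifulSoup(html, "html.parser")

# Strip every attribute from any cell that carries a colspan
for x in bs.find_all("td"):
    if "colspan" in x.attrs:
        x.attrs = {}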
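
If only the colspan attribute should go while class and style stay in place, a narrower variant might look like the sketch below. It assumes the built-in "html.parser" and a made-up sample string named html modelled on the question's markup; has_attr and del tag["..."] are ordinary bs4 Tag operations.

```python
from bs4 import BeautifulSoup

# Hypothetical sample row, modelled on the markup in the question
html = (
    '<table><tr>'
    '<td class="pos">1</td>'
    '<td colspan="2" class="myClass" style="">3:15 PM</td>'
    '</tr></table>'
)

bs = BeautifulSoup(html, "html.parser")

for td in bs.find_all("td"):
    # The conditional: act only on cells that actually have a colspan
    if td.has_attr("colspan"):
        del td["colspan"]  # drop colspan only; class and style are kept

cells = bs.find_all("td")
```

The has_attr check is the conditional the question asks about; any other per-cell handling can go in the same branch.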

Do you want it to remove the value as well?
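
If the value should be dropped too, one possible approach is to empty the cell in the same pass. This is only a sketch under assumptions: the two sample rows are invented for illustration, "html.parser" is assumed as the parser, and attrs={"colspan": True} is used to select only cells that carry a colspan attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row sample: one ordinary cell, one colspan cell
html = """
<table>
<tr class="playerRow even"><td class="pos">1</td><td>345</td></tr>
<tr class="playerRow odd"><td class="pos">2</td><td colspan="2" class="myClass">3:15 PM</td></tr>
</table>
"""

bs = BeautifulSoup(html, "html.parser")

# Select only the cells that have a colspan attribute
for td in bs.find_all("td", attrs={"colspan": True}):
    td.attrs = {}  # remove colspan, class, and any other attribute
    td.clear()     # remove the cell's contents, i.e. the "3:15 PM" text

# Collect the text of each row; every row keeps the same number of cells
values = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in bs.find_all("tr")
]
```

Because the cell itself stays in the row, positional indexing into each row still lines up; only the unwanted text is gone.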