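I am trying to get data from a table with a specific ID, which I know. For some reason, the code keeps giving me None as the result.
This is the HTML I am trying to parse: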
<table cellspacing="0" cellpadding="3" border="0" id="ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1" style="width:100%;border-collapse:collapse;">
<tr class="gridHeader" valign="top">
<td class="titleGridRegNoB" align="center" valign="top"><span dir=RTL>שווי שוק (אלפי ש"ח)</span></td>
<td class="titleGridReg" align="center" valign="top">הון רשום למסחר</td>
<td class="titleGridReg" align="center" valign="top">שער נמוך</td><td class="titleGridReg" align="center" valign="top">שער גבוה</td>
<td class="titleGridReg" align="center" valign="top">שער בסיס</td>
<td class="titleGridReg" align="center" valign="top">שער פתיחה</td><td class="titleGridReg" align="center" valign="top"><span dir="rtl">שער נעילה (באגורות)</span></td>
<td class="titleGridReg" align="center" valign="top">שער נעילה מתואם</td><td class="titleGridReg" align="center" valign="top">תאריך</td>
</tr>
<tr onmouseover="this.style.backgroundColor='#FDF1D7'" onmouseout="this.style.backgroundColor='#ffffff'">
...... and so on
My code:
html = br.response().read()
soup = BeautifulSoup(html)
table = soup.find(lambda tag: tag.name=='table' and tag.has_key('id') and tag['id']=="ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1")
rows = table.findAll(lambda tag: tag.name=='tr')
In [100]: print table
None
Answer 0 (score: 9)
table = soup.find('table', id="ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1")
For the rows:
rows = table.findAll('tr')
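A minimal sketch of the lookup above, assuming Python 3 with the bs4 package and the built-in html.parser backend. The shortened id `DataGrid1`, the sample markup, and the helper name `table_rows` are placeholders for illustration, not the real page. Since `find` returns None when nothing matches, the sketch checks for that before asking for rows.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the id is shortened for readability.
SAMPLE_HTML = """
<table id="DataGrid1">
  <tr class="gridHeader"><td>Date</td><td>Close</td></tr>
  <tr><td>01/01/2013</td><td>100</td></tr>
</table>
"""

def table_rows(html, table_id):
    """Return the cell texts of every row in the table with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        # find() returns None on no match; fail with a clear message instead
        raise ValueError("no table with id %r" % table_id)
    return [
        [cell.get_text(strip=True) for cell in row.find_all("td")]
        for row in table.find_all("tr")
    ]

rows = table_rows(SAMPLE_HTML, "DataGrid1")
print(rows)
```

If the helper raises on the real page, the table is probably not in the HTML that was actually downloaded, which is worth checking before suspecting the selector.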
For the encoding problem, try decoding the response from UTF-8 and then re-encoding it:
html = br.response().read().decode('utf-8')
soup = BeautifulSoup(html.encode('utf-8'))
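To make the round trip concrete, here is a small sketch in Python 3 terms, where `bytes` plays the role of Python 2's `str` and `str` plays the role of `unicode`. The `raw` value is a made-up stand-in for what `br.response().read()` would return.

```python
# Hypothetical stand-in for the raw bytes returned by br.response().read()
raw = '<td>תאריך</td>'.encode("utf-8")

text = raw.decode("utf-8")      # bytes -> text (Python 2: str -> unicode)
again = text.encode("utf-8")    # text -> bytes (Python 2: unicode -> str)

print(type(raw).__name__, type(text).__name__, type(again).__name__)
print(again == raw)             # the round trip is lossless for valid UTF-8
print(len(text), len(raw))      # character count vs. byte count
```

The two lengths differ because each Hebrew letter takes more than one byte in UTF-8, which is a quick way to tell which kind of string you are holding.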
Answer 1 (score: 1)
Improving on aiKid's answer:
# coding=utf-8
from bs4 import BeautifulSoup
html = u"""
<table cellspacing="0" cellpadding="3" border="0" id="ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1" style="width:100%;border-collapse:collapse;">
<tr class="gridHeader" valign="top">
<td class="titleGridRegNoB" align="center" valign="top"><span dir=RTL>שווי שוק (אלפי ש"ח)</span></td><td class="titleGridReg" align="center" valign="top">הון רשום למסחר</td><td class="titleGridReg" align="center" valign="top">שער נמוך</td><td class="titleGridReg" align="center" valign="top">שער גבוה</td><td class="titleGridReg" align="center" valign="top">שער בסיס</td><td class="titleGridReg" align="center" valign="top">שער פתיחה</td><td class="titleGridReg" align="center" valign="top"><span dir="rtl">שער נעילה (באגורות)</span>
</td><td class="titleGridReg" align="center" valign="top">שער נעילה מתואם</td><td class="titleGridReg" align="center" valign="top">תאריך</td>
</tr><tr onmouseover="this.style.backgroundColor='#FDF1D7'" onmouseout="this.style.backgroundColor='#ffffff'">
"""
soup = BeautifulSoup(html)
print soup.find_all("table",
id="ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1")
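Because you are working with UTF-8 data, the string literal needs to be a unicode string, written as u"""(...)""". To get unicode from the response, all you need to do is: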
br.response().read().decode('utf-8')
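The call above gives you a unicode string (not an ASCII string), which you can later encode back to bytes. Say the string is stored in html: html.encode("utf-8") turns it back into a UTF-8 encoded byte string. If you do that, you do not need to put u in front of anything, and you can treat everything as regular strings again.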
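As a sketch of how this plays out with bs4 on Python 3 (an assumption; the answer above is written for Python 2), BeautifulSoup accepts either decoded text or raw bytes, and its `from_encoding` argument can name the byte encoding explicitly. The markup and the short id `grid` are illustrative only.

```python
from bs4 import BeautifulSoup

# Illustrative markup; the real id from the question is much longer.
markup = '<table id="grid"><tr><td>תאריך</td></tr></table>'
raw = markup.encode("utf-8")  # what an HTTP response body would look like

# Option 1: decode first, then parse text.
soup_text = BeautifulSoup(raw.decode("utf-8"), "html.parser")

# Option 2: pass bytes and tell BeautifulSoup which encoding to assume.
soup_bytes = BeautifulSoup(raw, "html.parser", from_encoding="utf-8")

cell_text = soup_text.find("table", id="grid").td.get_text()
cell_bytes = soup_bytes.find("table", id="grid").td.get_text()
print(cell_text == cell_bytes)
```

Either route should find the same table; which one to use is mostly a matter of whether you already know the page's encoding.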