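I would like to know whether it is possible to extract the class labels, the weights (the 'wl' attribute), and the cell values from this table. Sample rows are shown below.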
<TABLE id='tbl5' class='display'>
<thead>
<TR><TH>Name </TH><TH> </TH><TH>Close</TH><TH>Tr </TH><TH>Mode </TH><TH>RevL (D)</TH><TH>MoM (D) </TH><TH>Days </TH><TH>P/L % </TH><TH>Action </TH></TR>
</thead>
<tbody>
<TR><TD>Aaron's, Inc.</TD><TD>AAN</TD><TD>40.53</TD><TD class='c6' wl='44.92'>2</TD><TD class='c7' data-sort='3'></TD><TD>42.35</TD><TD class='c1' data-sort='-4.71'>-4.71 ▲</TD><TD od='6687' op='45.40'>17</TD><TD class='c10'>10.73%</TD><TD></TD></TR>
<TR><TD>Abiomed Inc.</TD><TD>ABMD</TD><TD>380.35</TD><TD class='c4' wl='242.10'>63</TD><TD class='c4' data-sort='1'></TD><TD>323.03</TD><TD class='c1' data-sort='10.00'>10.00 ▲</TD><TD od='6670' op='290.16'>28</TD><TD class='c10'>31.08%</TD><TD></TD></TR>
<TR><TD>American Campus Communities</TD><TD>ACC</TD><TD>38.18</TD><TD class='c7' wl='40.03'>39</TD><TD class='c6' data-sort='4'></TD><TD>39.52</TD><TD class='c2' data-sort='2.16'>2.16 ▼</TD><TD od='0' op='0.00'>0</TD><TD class='c13'>0.00%</TD><TD>Sell</TD></TR>
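I am trying to iterate over the table rows and their cells, but I cannot iterate over the cells, which appear to be the <TD> elements.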
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_file.encode('utf-8'))
table = soup.find('table', id='tbl5')
rows = table.find_all(lambda tag: tag.name=='tr')
for row in rows:
    cells = row.find_all("td")
    rn = cells[0].get_text()
    print(cells)
The traceback says: list index out of range. The cells list is empty.
Answer 0 (score: 1)
You can search for the td elements directly. Take care when reading the wl attribute, because not every td has it:
for td in soup.find_all('td'):
    wl = td.attrs.get('wl')  # not all td's have 'wl' attribute
    print(td.text, wl)
# Aaron's, Inc. None
# AAN None
# 40.53 None
# 2 44.92
# None
# 42.35 None
# -4.71 ▲ None
# 17 None
# 10.73% None
# None
# Abiomed Inc. None
# ABMD None
# 380.35 None
# 63 242.10
# None
# 323.03 None
# 10.00 ▲ None
# 28 None
# 31.08% None
# None
# American Campus Communities None
# ACC None
# 38.18 None
# 39 40.03
# None
# 39.52 None
# 2.16 ▼ None
# 0 None
# 0.00% None
# Sell None
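If you want to keep the row structure rather than a flat list of cells, here is a minimal sketch of one way to do it. The likely cause of the IndexError in the question is that the header row contains only TH elements, so row.find_all("td") returns an empty list for it and cells[0] fails. The sketch skips such rows. The function name extract_rows, the dictionary keys, and the shortened sample markup are my own assumptions for illustration, not part of the original page.

```python
from bs4 import BeautifulSoup

# Shortened sample markup based on the table in the question (assumption:
# only the first four columns are kept to make the example compact).
html = """
<TABLE id='tbl5' class='display'>
<thead>
<TR><TH>Name </TH><TH> </TH><TH>Close</TH><TH>Tr </TH></TR>
</thead>
<tbody>
<TR><TD>Aaron's, Inc.</TD><TD>AAN</TD><TD>40.53</TD><TD class='c6' wl='44.92'>2</TD></TR>
<TR><TD>Abiomed Inc.</TD><TD>ABMD</TD><TD>380.35</TD><TD class='c4' wl='242.10'>63</TD></TR>
</tbody>
</TABLE>
"""


def extract_rows(markup):
    """Return one list per data row; each cell is a dict of text, classes, wl."""
    soup = BeautifulSoup(markup, "html.parser")
    table = soup.find("table", id="tbl5")
    records = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if not cells:
            # The header row holds only <th> cells, so there is nothing to index.
            continue
        records.append([
            {
                "text": td.get_text(strip=True),
                "classes": td.get("class", []),  # list of class names, or []
                "wl": td.get("wl"),              # None when the attribute is absent
            }
            for td in cells
        ])
    return records


records = extract_rows(html)
for record in records:
    # Name, then the classes and wl value of the fourth column ("Tr").
    print(record[0]["text"], record[3]["classes"], record[3]["wl"])
```

Beautiful Soup returns class as a list because it treats it as a multi-valued attribute, while wl comes back as a plain string that you can pass to float() if you need a number.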