I have the following HTML:
<tbody>
<tr>
<td><a href="/block_explorer/address/1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa">1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa</a></td>
<td><a href="/block_explorer/address/hash/62e907b15cbf27d5425399ebf6f0fb50ebb88f18/">62e907b15cbf27d5425399ebf6f0fb50ebb88f18</a></td>
<td class="num">66.6771<small class="b-blockExplorer__small">1246</small> BTC</td>
<td class="num">66.6771<small class="b-blockExplorer__small">1246</small> BTC</td>
<td class="num">1089</td>
</tr>
<tr>
<td><a href="/block_explorer/address/12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX">12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX</a></td>
<td><a href="/block_explorer/address/hash/119b098e2e980a229e139a9ed01a469e518e6f26/">119b098e2e980a229e139a9ed01a469e518e6f26</a></td>
<td class="num">50.0572<small class="b-blockExplorer__small">3154</small> BTC</td>
<td class="num">50.0572<small class="b-blockExplorer__small">3154</small> BTC</td>
<td class="num">55</td>
</tr>
<!--- SNIP --->
</tbody>
I want to parse it to get something like this:
1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa,62e907b15cbf27d5425399ebf6f0fb50ebb88f18,66.6771,66.6771
12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX,119b098e2e980a229e139a9ed01a469e518e6f26,50.0572,50.0572
I tried using BeautifulSoup:
soup.select('tbody > tr > td')[rowcount].get_text(strip=True)
but I only get the first <td>*</td>.
What am I doing wrong?
Answer 0: (score: 0)
I was able to get what you want by doing the following:
from bs4 import BeautifulSoup
html = """<tbody>
<tr>
<td><a href="/block_explorer/address/1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa">1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa</a></td>
<td><a href="/block_explorer/address/hash/62e907b15cbf27d5425399ebf6f0fb50ebb88f18/">62e907b15cbf27d5425399ebf6f0fb50ebb88f18</a></td>
<td class="num">66.6771<small class="b-blockExplorer__small">1246</small> BTC</td>
<td class="num">66.6771<small class="b-blockExplorer__small">1246</small> BTC</td>
<td class="num">1089</td>
</tr>
<tr>
<td><a href="/block_explorer/address/12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX">12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX</a></td>
<td><a href="/block_explorer/address/hash/119b098e2e980a229e139a9ed01a469e518e6f26/">119b098e2e980a229e139a9ed01a469e518e6f26</a></td>
<td class="num">50.0572<small class="b-blockExplorer__small">3154</small> BTC</td>
<td class="num">50.0572<small class="b-blockExplorer__small">3154</small> BTC</td>
<td class="num">55</td>
</tr>
<!--- SNIP --->
</tbody>"""
b = BeautifulSoup(html, 'lxml')
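for tr in b.find_all('tr'):
    # All cells of the current row, in column order
    data = tr.find_all('td')
    # The first two columns hold their value inside an <a> tag
    val1 = data[0].find('a').text
    val2 = data[1].find('a').text
    # .text joins the <small> digits onto the number; split() drops " BTC"
    num1 = data[2].text.split()[0]
    num2 = data[3].text.split()[0]
    print(val1, val2, num1, num2)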
This results in:
1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa 62e907b15cbf27d5425399ebf6f0fb50ebb88f18 66.67711246 66.67711246
12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX 119b098e2e980a229e139a9ed01a469e518e6f26 50.05723154 50.05723154
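Note that the numbers above include the digits from the <small> tag (66.67711246), while the question asks for 66.6771. Here is a minimal sketch of one way to keep only the leading part, by taking the first text string of each cell. The `parse_rows` helper name, the `<table>` wrapper, and the use of the built-in `html.parser` instead of `lxml` are my own choices for illustration, not something from the question:

```python
from bs4 import BeautifulSoup

# One row copied from the question; the <table> wrapper is added for illustration.
html = """<table><tbody>
<tr>
<td><a href="/block_explorer/address/1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa">1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa</a></td>
<td><a href="/block_explorer/address/hash/62e907b15cbf27d5425399ebf6f0fb50ebb88f18/">62e907b15cbf27d5425399ebf6f0fb50ebb88f18</a></td>
<td class="num">66.6771<small class="b-blockExplorer__small">1246</small> BTC</td>
<td class="num">66.6771<small class="b-blockExplorer__small">1246</small> BTC</td>
<td class="num">1089</td>
</tr>
</tbody></table>"""


def parse_rows(markup):
    """Hypothetical helper: first text string of the first four cells per row."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("td")
        # stripped_strings yields the cell's text pieces in document order,
        # so the first one is the text before <small> (or the link text).
        rows.append([next(td.stripped_strings, "") for td in cells[:4]])
    return rows


for row in parse_rows(html):
    print(",".join(row))
```

For the row above this should print the address, the hash, and 66.6771 twice, separated by commas. If a cell is empty, the sketch falls back to an empty string rather than raising.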
Answer 1: (score: 0)
Try this:
for row in soup.select('tbody tr'):
    row_text = [x.text for x in row.find_all('td')]
    print(', '.join(row_text))  # You can save or print this string however you want.
Output:
1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa, 62e907b15cbf27d5425399ebf6f0fb50ebb88f18, 66.67711246 BTC, 66.67711246 BTC, 1089
12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX, 119b098e2e980a229e139a9ed01a469e518e6f26, 50.05723154 BTC, 50.05723154 BTC, 55
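As for why the original attempt returned a single cell: `select('tbody > tr > td')` returns one flat list of every matching <td> in the table, so indexing it picks one cell, not one row. I am assuming `rowcount` in the question was meant as a row index. A small sketch with placeholder cell values (a1, b1, and so on are made up for illustration) shows the difference between indexing the flat list and iterating per row:

```python
from bs4 import BeautifulSoup

# Placeholder table; the cell values are made up for illustration.
html = """<table><tbody>
<tr><td>a1</td><td>b1</td><td>c1</td></tr>
<tr><td>a2</td><td>b2</td><td>c2</td></tr>
</tbody></table>"""

soup = BeautifulSoup(html, "html.parser")

# Flat list: every <td> from every row, in document order.
cells = soup.select("tbody > tr > td")
print(len(cells))                      # 6 cells, not 2 rows
print(cells[1].get_text(strip=True))   # b1: second cell of row 1, not row 2

# Per-row grouping: select the rows first, then the cells inside each row.
rows = [
    [td.get_text(strip=True) for td in tr.select("td")]
    for tr in soup.select("tbody > tr")
]
for row in rows:
    print(",".join(row))
```

Selecting the <tr> elements first and then looking up the cells inside each one is what both loops above do.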