Parsing <td> cells with Python's BeautifulSoup

Asked: 2018-01-21 22:48:40

Tags: python python-3.x beautifulsoup

I have the following HTML:

<tbody>
  <tr>
    <td><a href="/block_explorer/address/1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa">1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa</a></td>
    <td><a href="/block_explorer/address/hash/62e907b15cbf27d5425399ebf6f0fb50ebb88f18/">62e907b15cbf27d5425399ebf6f0fb50ebb88f18</a></td>
    <td class="num">66.6771<small class="b-blockExplorer__small">1246</small>&nbsp;BTC</td>
    <td class="num">66.6771<small class="b-blockExplorer__small">1246</small>&nbsp;BTC</td>
    <td class="num">1089</td>
  </tr>
  <tr>
    <td><a href="/block_explorer/address/12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX">12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX</a></td>
    <td><a href="/block_explorer/address/hash/119b098e2e980a229e139a9ed01a469e518e6f26/">119b098e2e980a229e139a9ed01a469e518e6f26</a></td>
    <td class="num">50.0572<small class="b-blockExplorer__small">3154</small>&nbsp;BTC</td>
    <td class="num">50.0572<small class="b-blockExplorer__small">3154</small>&nbsp;BTC</td>
    <td class="num">55</td>
  </tr>
  <!--- SNIP --->
</tbody>

I would like to parse it to get something like this:

1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa,62e907b15cbf27d5425399ebf6f0fb50ebb88f18,66.6771,66.6771
12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX,119b098e2e980a229e139a9ed01a469e518e6f26,50.0572,50.0572

This is what I tried with BeautifulSoup:

soup.select('tbody > tr > td')[rowcount].get_text(strip=True)

I only get the first <td>*</td>. What am I doing wrong?

2 Answers:

Answer 0 (score: 0)

I was able to get what you want by doing the following:

from bs4 import BeautifulSoup

html = """<tbody>
  <tr>
    <td><a href="/block_explorer/address/1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa">1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa</a></td>
    <td><a href="/block_explorer/address/hash/62e907b15cbf27d5425399ebf6f0fb50ebb88f18/">62e907b15cbf27d5425399ebf6f0fb50ebb88f18</a></td>
    <td class="num">66.6771<small class="b-blockExplorer__small">1246</small>&nbsp;BTC</td>
    <td class="num">66.6771<small class="b-blockExplorer__small">1246</small>&nbsp;BTC</td>
    <td class="num">1089</td>
  </tr>
  <tr>
    <td><a href="/block_explorer/address/12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX">12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX</a></td>
    <td><a href="/block_explorer/address/hash/119b098e2e980a229e139a9ed01a469e518e6f26/">119b098e2e980a229e139a9ed01a469e518e6f26</a></td>
    <td class="num">50.0572<small class="b-blockExplorer__small">3154</small>&nbsp;BTC</td>
    <td class="num">50.0572<small class="b-blockExplorer__small">3154</small>&nbsp;BTC</td>
    <td class="num">55</td>
  </tr>
  <!--- SNIP --->
</tbody>"""

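# The 'lxml' parser requires the lxml package to be installed.
b = BeautifulSoup(html, 'lxml')
for tr in b.find_all('tr'):
    data = tr.find_all('td')          # all cells of this row
    val1 = data[0].find('a').text     # address (link text of the first cell)
    val2 = data[1].find('a').text     # hash (link text of the second cell)
    # .text also includes the digits inside <small>; split() drops the trailing "BTC"
    num1 = data[2].text.split()[0]
    num2 = data[3].text.split()[0]
    print(val1, val2, num1, num2)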

This results in:

1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa 62e907b15cbf27d5425399ebf6f0fb50ebb88f18 66.67711246 66.67711246
12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX 119b098e2e980a229e139a9ed01a469e518e6f26 50.05723154 50.05723154
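
Note that this output still contains the digits from the <small> element (66.67711246), whereas the question asked for 66.6771. It also helps to see why the original attempt returned a single cell: soup.select('tbody > tr > td') returns one flat list of every <td> in the table, so indexing it picks one cell, not one row. The sketch below is one possible way to get the requested output. It assumes every cell contains some text and that the wanted value is always the first text node of the cell. The function name rows_as_csv and the <table> wrapper around the sample are illustrative and not part of the question.

```python
from bs4 import BeautifulSoup

# Copy of the two sample rows from the question, wrapped in <table>.
html = """<table><tbody>
  <tr>
    <td><a href="/block_explorer/address/1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa">1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa</a></td>
    <td><a href="/block_explorer/address/hash/62e907b15cbf27d5425399ebf6f0fb50ebb88f18/">62e907b15cbf27d5425399ebf6f0fb50ebb88f18</a></td>
    <td class="num">66.6771<small class="b-blockExplorer__small">1246</small>&nbsp;BTC</td>
    <td class="num">66.6771<small class="b-blockExplorer__small">1246</small>&nbsp;BTC</td>
    <td class="num">1089</td>
  </tr>
  <tr>
    <td><a href="/block_explorer/address/12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX">12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX</a></td>
    <td><a href="/block_explorer/address/hash/119b098e2e980a229e139a9ed01a469e518e6f26/">119b098e2e980a229e139a9ed01a469e518e6f26</a></td>
    <td class="num">50.0572<small class="b-blockExplorer__small">3154</small>&nbsp;BTC</td>
    <td class="num">50.0572<small class="b-blockExplorer__small">3154</small>&nbsp;BTC</td>
    <td class="num">55</td>
  </tr>
</tbody></table>"""


def rows_as_csv(markup):
    """Return one comma-separated line per <tr>, built from its first four cells."""
    soup = BeautifulSoup(markup, "html.parser")
    lines = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("td")[:4]  # skip the fifth column
        # stripped_strings yields the text nodes of a cell in document order.
        # The first one is the text before <small>, so the extra digits and
        # the trailing "BTC" are left out. Empty cells fall back to "".
        values = [next(td.stripped_strings, "") for td in cells]
        lines.append(",".join(values))
    return lines


for line in rows_as_csv(html):
    print(line)
```

Iterating row by row and then over the cells of each row keeps the columns grouped, which indexing into the flat list from select() does not.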

Answer 1 (score: 0)

Try this:

for row in soup.select('tbody tr'):
    row_text = [x.text for x in row.find_all('td')]
    print(', '.join(row_text))  # You can save or print this string however you want.

Output:

1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa, 62e907b15cbf27d5425399ebf6f0fb50ebb88f18, 66.67711246 BTC, 66.67711246 BTC, 1089
12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX, 119b098e2e980a229e139a9ed01a469e518e6f26, 50.05723154 BTC, 50.05723154 BTC, 55
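
The output above keeps the <small> digits, the "BTC" suffix and the fifth column, none of which appear in the format the question asked for. As a rough sketch of one way to save the rows instead of printing them, the <small> elements can be removed from the tree before reading the cell text, and the rows can be written with the standard csv module. The function name table_to_csv, its columns parameter and the <table> wrapper around the sample are illustrative assumptions. The sketch also assumes no cell is empty.

```python
import csv
import io

from bs4 import BeautifulSoup

# Copy of the two sample rows from the question, wrapped in <table>.
html = """<table><tbody>
  <tr>
    <td><a href="/block_explorer/address/1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa">1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa</a></td>
    <td><a href="/block_explorer/address/hash/62e907b15cbf27d5425399ebf6f0fb50ebb88f18/">62e907b15cbf27d5425399ebf6f0fb50ebb88f18</a></td>
    <td class="num">66.6771<small class="b-blockExplorer__small">1246</small>&nbsp;BTC</td>
    <td class="num">66.6771<small class="b-blockExplorer__small">1246</small>&nbsp;BTC</td>
    <td class="num">1089</td>
  </tr>
  <tr>
    <td><a href="/block_explorer/address/12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX">12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX</a></td>
    <td><a href="/block_explorer/address/hash/119b098e2e980a229e139a9ed01a469e518e6f26/">119b098e2e980a229e139a9ed01a469e518e6f26</a></td>
    <td class="num">50.0572<small class="b-blockExplorer__small">3154</small>&nbsp;BTC</td>
    <td class="num">50.0572<small class="b-blockExplorer__small">3154</small>&nbsp;BTC</td>
    <td class="num">55</td>
  </tr>
</tbody></table>"""


def table_to_csv(markup, columns=4):
    """Return the first `columns` cells of every row as CSV text."""
    soup = BeautifulSoup(markup, "html.parser")
    # Remove every <small> element so its digits no longer appear in the cell text.
    for small in soup.find_all("small"):
        small.decompose()
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in soup.select("tbody tr"):
        cells = row.find_all("td")[:columns]
        # split() treats the non-breaking space as whitespace, so taking the
        # first piece keeps the number and drops the trailing "BTC".
        writer.writerow([td.get_text().split()[0] for td in cells])
    return buffer.getvalue()


print(table_to_csv(html), end="")
```

Writing to an io.StringIO buffer keeps the example self-contained. Passing a file opened with newline="" to csv.writer would write the same rows to disk.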