in Python

Time: 2018-05-10 13:57:26

Tags: python web-scraping

I am a Python beginner and I am having trouble scraping a table. My goal is to have no blank lines in the output. My code:

import requests
from bs4 import BeautifulSoup

# I am only interested in some particular blocks from the bitcoin blockchain
url = "https://blockchain.info/block-height/521578"

# Getting table from the url
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
stat_table = soup.find_all('table', class_ = 'table table-striped')
stat_table = stat_table[0]

for row in stat_table.find_all("tr"):
    for cell in row.find_all("td")[1:]:  # skip the first (label) column; I only want the 2nd column
        print(cell.text)

With the code above, I get the following output:

521578 (Main chain)
0000000000000000002809c9ae7546964580751b506e070f15002b1c1fdd66b3

0000000000000000002d4f2f654945fda08931355e9af871a8c2135a25da9cb6
00000000000000000023583cc0df49783e50c93d807c405524168016c84c0c2a
2018-05-07 06:56:50

2018-05-07 06:56:50
BTC.com

4,022,059,196,164.95

390462291
452
5,440.06056147 BTC
233.87351861 BTC
342.123 KB
0x20000000
ee7c7e2cde5e0f3567c9f635549ec62365e2ac45da517f41cf6c32787c3d8b4d
3688672863
12.5 BTC
0.11100607 BTC

But when I later save this to a .csv file, those blank lines cause trouble. Do you know how to get rid of them?

If I make a small change to the code, I get output without blank lines, but from the column I don't need (I only need the second column):

for row in stat_table.find_all("tr"):
    for cell in row.find_all("td")[:1]: # Here is the change
        print(cell.text)

Output with the change (note that there are no blank lines, but this is not the column I need):

Height
Hash
Previous Block
Next Blocks
Time
Received Time
Relayed By
Difficulty
Bits
Number Of Transactions
Output Total
Estimated Transaction Volume
Size
Version
Merkle Root
Nonce
Block Reward
Transaction Fees

Thanks in advance.

1 Answer:

Answer 0: (score: 0)

Before printing, test whether the value is one you want.

for row in stat_table.find_all("tr"):
    for cell in row.find_all("td")[1:]:
        text = cell.text.strip()  # drop surrounding whitespace and newlines
        if text != "":
            print(text)
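
Since the end goal is a .csv file, here is a minimal sketch of cleaning the cell texts and writing them out with the standard csv module. It assumes the blank lines come from stray whitespace or newlines inside the cell text, which the question does not confirm. The helper names (clean_cell, table_to_records, write_csv), the "field"/"value" header and the sample rows are all made up for illustration.

```python
import csv
import io


def clean_cell(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


def table_to_records(rows):
    """Turn raw [label, value] cell texts into cleaned pairs, skipping rows with no value."""
    records = []
    for cells in rows:
        cleaned = [clean_cell(c) for c in cells]
        # Keep only rows that have a non-empty value in the 2nd column.
        if len(cleaned) >= 2 and cleaned[1]:
            records.append((cleaned[0], cleaned[1]))
    return records


def write_csv(records, handle):
    """Write the (label, value) pairs as CSV to an open text file handle."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["field", "value"])  # hypothetical header names
    writer.writerows(records)


# Hypothetical raw cell texts, mimicking stray newlines in the scraped page.
sample_rows = [
    ["Height", "521578 (Main chain)"],
    ["Hash", "0000000000000000002809c9ae7546964580751b506e070f15002b1c1fdd66b3\n"],
    ["Relayed By", "BTC.com\n"],
    ["Empty row", "\n  "],
]

records = table_to_records(sample_rows)
buffer = io.StringIO()
write_csv(records, buffer)
print(buffer.getvalue())
```

With the scraped table, the rows could probably be built as [[td.get_text() for td in tr.find_all("td")] for tr in stat_table.find_all("tr")] and passed to table_to_records. To write a real file, open it with open("block.csv", "w", newline="") (the file name is only an example); newline="" is what the csv module documentation recommends, so that no extra blank lines appear on Windows.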