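I now have all 150 GB of Bitcoin blocks. How do I open them and read them with Python? I need to extract every hash160 that has been used so far.
I tried opening them with Berkeley DB, without success; the files do not seem to be Berkeley DB files. Also, what is the difference between the blkxxxxx.dat and revxxxxx.dat files? The revxxxxx.dat files seem to be smaller.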
Answer 0 (score: 3)
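A very simple Python script is enough to extract all addresses through the RPC interface of bitcoin-core. The advantage is that bitcoin-core handles all the parsing and related issues. For this to work, bitcoin-core must be run with
txindex=1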
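For comparison, here is a rough sketch of what reading a blkNNNNN.dat file directly involves. It assumes the commonly documented mainnet layout: each record is a 4-byte network magic (f9 be b4 d9), a 4-byte little-endian length, and then the raw block, whose first 80 bytes are the header. The function names are made up for this illustration. As far as I know, the revNNNNN.dat files hold undo data that bitcoin-core uses to roll blocks back during a reorganization, not the blocks themselves, which would explain why they are smaller. Blocks are not necessarily stored in height order, and recent bitcoin-core releases may XOR-obfuscate the files, which this sketch does not handle.

```python
import hashlib
import struct

# Assumption: mainnet network magic that prefixes every record.
MAINNET_MAGIC = bytes.fromhex("f9beb4d9")


def iter_raw_blocks(stream, magic=MAINNET_MAGIC):
    """Yield the raw bytes of each block stored in a blk*.dat-style stream."""
    while True:
        prefix = stream.read(8)
        # Stop at end of file or at the zero padding at the end of a file.
        if len(prefix) < 8 or prefix[:4] != magic:
            return
        (size,) = struct.unpack("<I", prefix[4:])
        block = stream.read(size)
        if len(block) < size:
            return  # truncated record
        yield block


def block_hash(raw_block):
    """Double SHA-256 of the 80-byte header, shown in the usual reversed hex."""
    digest = hashlib.sha256(hashlib.sha256(raw_block[:80]).digest()).digest()
    return digest[::-1].hex()
```

With a real file this would be used as `for raw in iter_raw_blocks(open("blk00000.dat", "rb")): print(block_hash(raw))`. Getting from there to addresses still requires decoding every transaction and output script, which is exactly the work the RPC-based script in this answer leaves to bitcoin-core.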
To run the RPC script, make sure the following dependency is installed:
sudo pip install python-bitcoinrpc
The script:
# Written for Python 3.
import sys

from bitcoinrpc.authproxy import AuthServiceProxy

RPC_ADDRESS = "127.0.0.1:8332"
RPC_USER = "u"
RPC_PASSWORD = "p"


def connect(address, user, password):
    return AuthServiceProxy("http://%s:%s@%s" % (user, password, address))


def extract_block_addresses(rpc, block_hash):
    """Return every output address found in the given block."""
    block = rpc.getblock(block_hash)
    addresses = []
    for tx in block["tx"]:
        raw_tx = rpc.getrawtransaction(tx, True)
        if "vout" not in raw_tx:
            sys.stderr.write("Transaction %s has no 'vout': %s\n" % (tx, raw_tx))
            continue
        for vout in raw_tx["vout"]:
            if "scriptPubKey" not in vout:
                sys.stderr.write("Vout %s of Transaction %s has no 'scriptPubKey'\n" % (vout, tx))
                continue
            script = vout["scriptPubKey"]
            if script["type"] == "nulldata":
                # Arbitrary data: no address to record. Skip this output only
                # (a 'break' here would drop the transaction's remaining outputs).
                continue
            if "addresses" in script:
                addresses.extend(script["addresses"])
            elif "address" in script:
                # Newer bitcoin-core releases report a single 'address' field.
                addresses.append(script["address"])
            else:
                sys.stderr.write("Can't handle %s transaction output type in transaction %s\n" % (script["type"], raw_tx))
    return addresses


def print_block_addresses(rpc, height):
    block_hash = rpc.getblockhash(height)
    for addr in extract_block_addresses(rpc, block_hash):
        print(addr)


if __name__ == "__main__":
    # Optional arguments: first block height, last block height.
    start_block = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    end_block = int(sys.argv[2]) if len(sys.argv) > 2 else 0

    rpc = connect(RPC_ADDRESS, RPC_USER, RPC_PASSWORD)
    if end_block == 0:
        end_block = rpc.getblockcount()

    for b in range(start_block, end_block + 1):
        try:
            print_block_addresses(rpc, b)
        except Exception:
            # The connection may have dropped: reconnect and retry once.
            rpc = connect(RPC_ADDRESS, RPC_USER, RPC_PASSWORD)
            print_block_addresses(rpc, b)
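By default, bitcoin-core runs with 4 RPC threads, so it makes sense to start several instances of the script to use all cores. It also makes sense to compress the resulting address list: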
time python bitcoin-addresses.py 1 100000 2> bad_transaction-1.log | gzip -9 > addresses-1.gz &
time python bitcoin-addresses.py 100000 200000 2> bad_transaction-2.log | gzip -9 > addresses-2.gz &
time python bitcoin-addresses.py 200000 300000 2> bad_transaction-3.log | gzip -9 > addresses-3.gz &
time python bitcoin-addresses.py 300000 2> bad_transaction-4.log | gzip -9 > addresses-4.gz &
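If you would rather not work out the block ranges by hand, a small helper can split the chain into contiguous, non-overlapping ranges and print one command per instance. This is only a sketch: `split_range` and `build_commands` are names invented here, and the tip height of 400000 in the usage note is a placeholder for whatever `getblockcount` reports on your node.

```python
def split_range(first, last, parts):
    """Split the inclusive range [first, last] into up to `parts` contiguous chunks."""
    total = last - first + 1
    base, extra = divmod(total, parts)
    ranges = []
    start = first
    for i in range(parts):
        # The first `extra` chunks take one additional block each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def build_commands(ranges, script="bitcoin-addresses.py"):
    """Build one shell command per range, mirroring the commands shown above."""
    template = "python %s %d %d 2> bad_transaction-%d.log | gzip -9 > addresses-%d.gz &"
    return [
        template % (script, lo, hi, i, i)
        for i, (lo, hi) in enumerate(ranges, start=1)
    ]
```

For example, `print("\n".join(build_commands(split_range(1, 400000, 4))))` prints four commands. Because the ranges do not share boundary blocks, no block is processed twice.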
If, as in my case, your hard drive turns out to be the bottleneck of the previous approach, it is better to run just a single instance:
time python bitcoin-addresses.py 2> bad.log | gzip -9 > addresses.gz
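Note that the script does not keep track of addresses it has already seen, so the output will contain duplicates. We can use sort -u
to solve that: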
zcat addresses.gz | sort -u
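The question asks for hash160 values, while the script prints addresses. For legacy Base58Check addresses (the ones starting with 1 or 3) the hash160 can be recovered by decoding the address and dropping the version byte and the 4-byte checksum. The following is a minimal sketch under that assumption; the function names are my own, and bech32 addresses (bc1...) use a different encoding that is not covered here.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_decode(addr):
    """Decode a Base58Check string into (version byte, payload bytes)."""
    num = 0
    for ch in addr:
        # str.index raises ValueError for characters outside the alphabet.
        num = num * 58 + B58_ALPHABET.index(ch)
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    # Each leading '1' character encodes one leading zero byte.
    n_zero = len(addr) - len(addr.lstrip("1"))
    raw = b"\x00" * n_zero + body
    data, checksum = raw[:-4], raw[-4:]
    expected = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    if checksum != expected:
        raise ValueError("bad checksum for %r" % addr)
    return data[0], data[1:]


def address_to_hash160(addr):
    """Return the 20-byte hash160 of a legacy address as a hex string."""
    _version, payload = base58check_decode(addr)
    if len(payload) != 20:
        raise ValueError("unexpected payload length for %r" % addr)
    return payload.hex()
```

Piping the deduplicated list through a loop such as `for line in sys.stdin: print(address_to_hash160(line.strip()))` would then give one hash160 per line.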
Answer 1 (score: 2)
This software (blockparser) seems to do exactly what you want. Its README contains this example:
. Compute and print the balance for all keys ever used since the beginning of time:
./parser all >all.txt
Update:
If I run the command above, I get the following result:
root@81d54ebe5b25:~/blockparser# ls -alh all.txt
-rw-r--r-- 1 root root 900M Aug 25 09:33 all.txt
root@81d54ebe5b25:~/blockparser# head all.txt
---------------------------------------------------------------------------
State of the ledger at block 194124 (minted : Thu Aug 16 03:36:13 2012)
---------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
        Balance  Hash160                                   Base58                              nbIn  lastTimeIn                nbOut  lastTimeOut
--------------------------------------------------------------------------------------------------------------------------------------------------------------
507335.01317523  8bf24a18a58ab500d30c73bf21dbf4703d31ad2c  1DkyBEKt5S2GDtv7aQw6rQepAvnsRyHoYM   152  Tue Aug 14 18:11:09 2012     17  Tue Jul 17 02:32:38 2012
105555.03133700  582431b9e63d2394c8b224d1bc45d07ae95d2379  1933phfhK3ZgFQNLGSDXvqCn32k2buXY8a    48  Fri Jun 22 16:26:43 2012      0  Thu Jan  1 00:00:00 1970
 79957.03133700  a0b0d60e5991578ed37cbda2b17d8b2ce23ab295  1FeexV6bAHb8ybZjqQMjJrcCrHGW9sb6uF     4  Sun Jul 15 20:36:59 2012      0  Thu Jan  1 00:00:00 1970
 53000.03133700  3d9e561f21d312f9b8b46e74169263e2452d5591  16cou7Ht6WjTzuFyDBnht9hmvXytg6XdVT    16  Sun Jul 15 20:36:59 2012      9  Sun May 13 12:13:16 2012
This was on a Bitcoin node that was not fully synced.
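Since the goal is the list of hash160 values, the second column of that output can be pulled out with a few lines of Python. This is a sketch that assumes the layout shown above (whitespace-separated columns with Hash160 in second position); `iter_hash160` is a name chosen for this example and is not part of the parser.

```python
import re

# A hash160 is 20 bytes, i.e. 40 lowercase hex characters in this output.
HASH160_RE = re.compile(r"^[0-9a-f]{40}$")


def iter_hash160(lines):
    """Yield the Hash160 column from lines of the parser's output."""
    for line in lines:
        fields = line.split()
        # Banner, header and separator lines fail this check and are skipped.
        if len(fields) >= 3 and HASH160_RE.match(fields[1]):
            yield fields[1]
```

Usage would look like `with open("all.txt") as fh: hashes = set(iter_hash160(fh))`, which also removes duplicates should any appear.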