Hello, is it possible to export all the addresses from this site to a txt file?
http://bitinfocharts.com/top-100-richest-bitcoin-addresses.html
For example:
1BPqtqBKoUjEq8STWmJxhPqtsf3BKp5UyE
1i7cZdoE9NcHSdAL5eGjmTJbBVqeQDwgw
etc...
I wrote this code:
<?php
$html = file_get_contents('http://bitinfocharts.com/top-100-richest-bitcoin-addresses-5.html');
//Create a new DOM document
$dom = new DOMDocument;
//Parse the HTML. The @ is used to suppress any parsing errors
//that will be thrown if the $html string isn't valid XHTML.
@$dom->loadHTML($html);
//Get all links. You could also use any other tag name here,
//like 'img' or 'table', to extract other tags.
$links = $dom->getElementsByTagName('a');
//Iterate over the extracted links and display their URLs
foreach ($links as $link) {
    //Extract and show the "href" attribute.
    echo $link->getAttribute('href'), '<br>';
}
?>
But this prints every link on the page; I only need the addresses...
Answer 0 (score: 1)
This can be done simply with text manipulation:
// get page
$html = file_get_contents('http://bitinfocharts.com/top-100-richest-bitcoin-addresses.html');
// split on bit just in front of address
$parts = explode('./bitcoin/address/',$html);
// dump the first part
array_shift($parts);
// get addresses from all subsequent parts
$addresses = array();
foreach ($parts as $part) $addresses[] = substr($part,0,strpos($part,'"'));
// show result
echo implode('<br>',$addresses);
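The comments explain the code. I admit that working with the DOM has its elegance.
Answer 1 (score: 1)
What I would do is target each table row, and then the anchor link inside it. For example: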
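Since the question asks for a txt file, here is a minimal sketch of the same split-on-prefix idea wrapped in a function and written to disk. The function name extract_addresses_by_text, the sample markup, and the output path are assumptions made for illustration; the real page markup may differ, and the sketch runs on an inline string rather than fetching the site.

```php
<?php
// Hypothetical helper (not from the answer above): split the HTML on the
// link prefix and keep the text up to the closing quote of each href.
function extract_addresses_by_text($html, $marker = './bitcoin/address/')
{
    $addresses = array();
    $parts = explode($marker, $html);
    array_shift($parts); // drop the text before the first address link
    foreach ($parts as $part) {
        $end = strpos($part, '"');
        // skip fragments with no closing quote or an empty address
        if ($end !== false && $end > 0) {
            $addresses[] = substr($part, 0, $end);
        }
    }
    // the same address may be linked more than once; keep one copy of each
    return array_values(array_unique($addresses));
}

// Made-up sample markup standing in for the downloaded page.
$sample = '<a href="./bitcoin/address/1BPqtqBKoUjEq8STWmJxhPqtsf3BKp5UyE">a</a>'
        . '<a href="./bitcoin/address/1i7cZdoE9NcHSdAL5eGjmTJbBVqeQDwgw">b</a>';

$addresses = extract_addresses_by_text($sample);

// Assumed output location: one address per line in the system temp directory.
$outFile = sys_get_temp_dir() . '/addresses.txt';
file_put_contents($outFile, implode("\n", $addresses) . "\n");
```

To run it against the live page, replace $sample with the result of file_get_contents() on the URL, and check that the call did not return false before parsing.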
$html = file_get_contents('http://bitinfocharts.com/top-100-richest-bitcoin-addresses-5.html');
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXpath($dom);
$data = array();
$table_rows = $xpath->query('//h1[contains(text(), "Top 100 Richest Addresses Bitcoin")]/following-sibling::div[2]/table/tr');
foreach($table_rows as $row) {
    $cell = $xpath->query('./td[2]/a', $row);
    if($cell->length > 0) {
        $data[] = $cell->item(0)->nodeValue;
    }
}
echo '<pre>';
print_r($data);
//file_put_contents('your_file.txt', implode("\n", $data));
$data looks like this (partial output):
Array
(
[0] => 1KcRjW2roV8dtZoBMPD83nsFburPCY7RfR
[1] => 1LovisaJ31py5rr37y5xpt3MzSjErpoeLr
[2] => 1BE1ttHnrJ7YKkLgKpiNrp8uT3kM6Y1xfg
[3] => 1Czx5RKaDkiE56RwdeLXRYL57ZxxdFxwhb
[4] => 1BhQDdQgVyAekFZjT1nW8PB5XRt9VJhRs5
[5] => 1JsSF3YLF4v9Fasfu6pqevwWc5Mtyf76M3
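A possible variation on the DOM approach, sketched here under assumptions: instead of relying on the h1 text and the table position, select only the anchors whose href contains "/bitcoin/address/" and take the last path segment. That link pattern is inferred from the other answer and may not match the live page; extract_addresses_by_href and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: collect addresses from anchors that point at
// an address page, ignoring every other link in the document.
function extract_addresses_by_href($html)
{
    $dom = new DOMDocument;
    libxml_use_internal_errors(true); // collect HTML parse warnings quietly
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $addresses = array();
    // Assumed link pattern: href contains "/bitcoin/address/".
    foreach ($xpath->query('//a[contains(@href, "/bitcoin/address/")]') as $link) {
        $href = $link->getAttribute('href');
        // the address is whatever follows the last slash of the href
        $addresses[] = substr($href, strrpos($href, '/') + 1);
    }
    return $addresses;
}

// Made-up sample markup: one address link and one unrelated link.
$sampleHtml = '<html><body><table>'
    . '<tr><td>1</td><td><a href="./bitcoin/address/1KcRjW2roV8dtZoBMPD83nsFburPCY7RfR">wallet</a></td></tr>'
    . '<tr><td>2</td><td><a href="./about.html">About</a></td></tr>'
    . '</table></body></html>';

$found = extract_addresses_by_href($sampleHtml);
print_r($found);
```

Filtering on the href keeps the query short, but it will silently return nothing if the site changes its URL scheme, so it is worth checking that the result is non-empty before writing it to a file.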