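I want to display a list of journals together with their abbreviations, like this:
Journal name, abbreviation
I am getting the data from http://images.webofknowledge.com/WOK46/help/WOS/D_abrvjt.html, so I am running the following: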
// Set options
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_URL => 'http://images.webofknowledge.com/WOK46/help/WOS/A_abrvjt.html'
));
$result = curl_exec($curl);
curl_close($curl);
$data=json_decode($result, true);
//!End function, make_call
But this just gives me the whole page, whereas, as I said, I only need the journal names (dt) and the abbreviations (dd). How can I parse the result?
Answer 0 (score: 1)
HTML DOM parsing with Simple HTML DOM. A scraping approach...
<?php
function Scraper($file, $cnt = NULL) {
    /*
    @param $file, url or path/file
    @param $cnt, (number of results to list) empty for all, or number
    */
    require_once('PATH/TO/simple_html_dom.php');
    //set_time_limit(0); // uncomment for large files
    $result = array();
    // Create DOM from URL
    $html = file_get_html($file);
    if ($html) {
        foreach ($html->find('DL') as $dl) {
            // list every DT in this DL when no count is given
            $limit = empty($cnt) ? count($dl->find('DT')) : $cnt;
            for ($i = 0; $i < $limit; $i++) {
                $dt = $dl->find('DT', $i);
                $dd = $dl->find('DD', $i);
                if (!$dt || !$dd) { break; } // fewer pairs than requested
                $result[] = array(trim($dt->plaintext) => trim($dd->plaintext));
            }
        }
    }
    return $result;
}
$array = Scraper('http://somesite.com/page.html');
print_r($array);
?>
Sample output...
Array
(
[0] => Array
(
[D H LAWRENCE REVIEW] => D H LAWRENCE REV
)
[1] => Array
(
[D-D EXCITATIONS IN TRANSITION-METAL OXIDES] => SPRINGER TR MOD PHYS
)
[2] => Array
(
[DADOS-REVISTA DE CIENCIAS SOCIAIS] => DADOS-REV CIENC SOC
)
[3] => Array
(
[DAEDALUS] => DAEDALUS
)
[4] => Array
(
[DAEDALUS] => DAEDALUS-US
)
[5] => Array
(
[DAGHESTAN AND THE WORLD OF ISLAM] => SUOMAL TIED TOIM SAR
)
)
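The question asks for lines of the form "Journal name, abbreviation", while the function returns one single-pair array per entry (presumably so that repeated names such as DAEDALUS are not overwritten). A minimal sketch of flattening that shape into printable lines follows; `formatJournalRows` is a hypothetical helper name, and the rows are copied from the sample output rather than fetched.

```php
<?php
// Hypothetical helper: turns Scraper()-style rows into "name, abbreviation" lines.
function formatJournalRows(array $rows): array {
    $lines = array();
    foreach ($rows as $row) {
        // each row holds exactly one name => abbreviation pair
        foreach ($row as $name => $abbr) {
            $lines[] = $name . ', ' . $abbr;
        }
    }
    return $lines;
}

// Rows copied from the sample output above.
$rows = array(
    array('D H LAWRENCE REVIEW' => 'D H LAWRENCE REV'),
    array('DAEDALUS' => 'DAEDALUS'),
    array('DAEDALUS' => 'DAEDALUS-US'),
);
echo implode(PHP_EOL, formatJournalRows($rows)), PHP_EOL;
```

Keeping one pair per row means both DAEDALUS entries survive, which a single name => abbreviation map would not allow.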
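Updated example for the problem user350082 ran into...
The DT and DD tags of the definition list are not closed, which causes the dd to be included in the find('dt') results.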
<DT>D H LAWRENCE REVIEW<B><DD> D H LAWRENCE REV</B>
<DT>D-D EXCITATIONS IN TRANSITION-METAL OXIDES<B><DD> SPRINGER TR MOD PHYS</B>
etc. etc. etc.
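Because the closing tags are missing, one alternative is to pair each DT with its DD by matching the raw markup directly. The sketch below is not the Simple HTML DOM approach used in this answer; it is a plain regular-expression variant that assumes every entry follows the `<DT>name<B><DD> abbreviation</B>` shape shown above. `parseUnclosedDefinitionList` is a hypothetical name, and HTML entities are left undecoded.

```php
<?php
// Hypothetical helper, not part of Simple HTML DOM: pairs each <DT> with the
// <DD> that follows it by pattern-matching the raw markup.
function parseUnclosedDefinitionList(string $markup): array {
    $pairs = array();
    // <DT>name<B><DD> abbreviation</B>  (the <B> wrapper is treated as optional)
    $pattern = '~<DT>\s*([^<]*?)\s*(?:<B>\s*)?<DD>\s*([^<]*?)\s*(?=<|$)~i';
    if (preg_match_all($pattern, $markup, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // $m[1] is the journal name, $m[2] the abbreviation
            $pairs[] = array($m[1] => $m[2]);
        }
    }
    return $pairs;
}

// The two sample lines from above.
$sample = "<DT>D H LAWRENCE REVIEW<B><DD> D H LAWRENCE REV</B>\n"
        . "<DT>D-D EXCITATIONS IN TRANSITION-METAL OXIDES<B><DD> SPRINGER TR MOD PHYS</B>";
print_r(parseUnclosedDefinitionList($sample));
```

A pattern like this is brittle if the page layout changes, so it is best treated as a cross-check on the DOM-based result rather than a replacement.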
Updated function...
function Scraper($file, $cnt = NULL) {
    /*
    @param $file, url or path/file
    @param $cnt, (number of results to list) empty for all, or number
    */
    require_once('PATH/TO/simple_html_dom.php');
    //set_time_limit(0); // uncomment for large files
    $result = array();
    // Create DOM from URL
    $html = file_get_html($file);
    if ($html) {
        foreach ($html->find('DL') as $dl) {
            // list every DT in this DL when no count is given
            $limit = empty($cnt) ? count($dl->find('DT')) : $cnt;
            for ($i = 0; $i < $limit; $i++) {
                $ddNode = $dl->find('DD', $i);
                $dtNode = $dl->find('DT', $i);
                if (!$ddNode || !$dtNode) { break; } // fewer pairs than requested
                $dd = $ddNode->plaintext;
                $dt = $dtNode->innertext; // dt with html tags, easier for removing dd duplication
                $dt = preg_replace('/\s+/', ' ', $dt); // remove extra whitespace, tabs etc.
                // strip DD text duplication from DT
                if ($dd !== '' && ($pos = strrpos($dt, $dd)) !== false) {
                    $dt = substr_replace($dt, "", $pos, strlen($dd));
                }
                $dt = trim(strip_tags($dt)); // remove html tags
                if ($dt === '') { $dt = $dd; } // make sure dt is not empty
                $result[] = array(trim($dt) => trim($dd));
            }
        }
    }
    return $result;
}
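As a side note on the cURL call in the question: without CURLOPT_RETURNTRANSFER, curl_exec() prints the response and returns true, which is why the whole page is displayed, and json_decode() cannot help because the page is HTML rather than JSON. A minimal sketch of fetching the page as a string is below; `fetchPage` is a hypothetical helper name and the 30-second timeout is an assumed value.

```php
<?php
// Hypothetical helper: returns the response body as a string, or null on failure.
function fetchPage(string $url): ?string {
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects, if any
        CURLOPT_TIMEOUT        => 30,   // assumed limit, in seconds
    ));
    $body = curl_exec($curl);
    // the handle is released when $curl goes out of scope
    return ($body === false) ? null : $body;
}
```

The returned string could then be handed to Simple HTML DOM's str_get_html() in place of file_get_html(), leaving the rest of the function unchanged.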