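This is my script. It extracts three items: the medicine name, the generic name, and the class name. My problem is that I get the medicine name on its own without trouble, but the generic name and the class name come back combined as a single string. If you run the script you will see what I mean. I want to store the generic name and the class name as separate columns in a table.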
Script:
<?php
error_reporting(0);
//simple html dom file
require('simple_html_dom.php');
//target url
$html = file_get_html('http://www.drugs.com/condition/atrial-flutter.html?rest=1');
//crawl td columns
foreach($html->find('td') as $element)
{
//get drug name
$drug_name = $element->find('b');
foreach($drug_name as $drug_name)
{
echo "Drug Name:-".$drug_name;
foreach($element->find('span[class=small] a',2) as $t)
{
//get the inner HTML
$data = $t->plaintext;
echo $data;
}
echo "<br/>";
}
}
?>
Thanks in advance.
Answer
Your current code is a bit far from what you need, but you can use CSS selectors to target these elements more directly.
Example:
<?php
// assumes simple_html_dom.php sits next to this script, as in the question
require('simple_html_dom.php');

$data = array();
$html = file_get_html('http://www.drugs.com/condition/atrial-flutter.html?rest=1');
foreach($html->find('tr td[1]') as $td) { // you do not need to loop over every td!
    // target the first td of the row
    $bold = $td->find('a b', 0); // the drug name: bold tag inside the anchor
    $other_info = $td->find('span.small[2]', 0); // the span holding the other info
    if (!$bold || !$other_info) {
        continue; // skip rows that do not have the expected layout
    }
    $links = $other_info->find('a'); // every anchor inside that span
    if (count($links) === 0) {
        continue;
    }
    $generic_name = $links[0]->innertext; // the first anchor is the generic name
    $classes = array();
    for($i = 1; $i < count($links); $i++) { // the remaining anchors (from position 1) are the classes
        $classes[] = $links[$i]->innertext; // push each one into its own array
    }
    $data[] = array(
        'drug_name' => $bold->innertext,
        'generic_name' => $generic_name,
        'classes' => $classes,
    );
}
echo '<pre>';
print_r($data);
echo '</pre>';
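If you would rather not depend on simple_html_dom, roughly the same extraction can be sketched with PHP's built-in DOM extension (DOMDocument and DOMXPath). This is a minimal sketch under stated assumptions, not a drop-in replacement: the sample markup is hypothetical, modeled on the selectors in the answer rather than copied from drugs.com, and the helpers extract_rows() and render_table() are made-up names for illustration. It also shows the goal from the question: the generic name and the classes ending up in separate table columns.

```php
<?php
// Sketch: the same extraction with PHP's built-in DOM extension instead of
// simple_html_dom. $sampleHtml is HYPOTHETICAL markup modeled on the answer's
// selectors; it is not copied from drugs.com, and the real page may differ.
$sampleHtml = <<<HTML
<table>
  <tr>
    <td>
      <a href="#"><b>Drug A</b></a>
      <span class="small">first span</span>
      <span class="small"><a href="#">generic-a</a> <a href="#">class-1</a> <a href="#">class-2</a></span>
    </td>
  </tr>
  <tr>
    <td>
      <a href="#"><b>Drug B</b></a>
      <span class="small">first span</span>
      <span class="small"><a href="#">generic-b</a> <a href="#">class-3</a></span>
    </td>
  </tr>
</table>
HTML;

// Hypothetical helper: one array per row, with the drug name, the generic
// name and the list of classes kept as separate fields.
function extract_rows(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; silence parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $rows = [];
    // First <td> of every row, the XPath counterpart of 'tr td[1]'.
    foreach ($xpath->query('//tr/td[1]') as $td) {
        $bold = $xpath->query('./a/b', $td)->item(0);
        // Second <span class="small"> directly inside the cell (assumed layout).
        $info = $xpath->query("./span[@class='small'][2]", $td)->item(0);
        if ($bold === null || $info === null) {
            continue; // skip rows that do not match the assumed layout
        }
        $links = [];
        foreach ($xpath->query('./a', $info) as $a) {
            $links[] = trim($a->textContent);
        }
        $rows[] = [
            'drug_name'    => trim($bold->textContent),
            'generic_name' => $links[0] ?? '',        // first anchor
            'classes'      => array_slice($links, 1), // every anchor after it
        ];
    }
    return $rows;
}

// Hypothetical helper: render rows with generic name and classes in separate columns.
function render_table(array $rows): string
{
    $out = "<table>\n<tr><th>Drug name</th><th>Generic name</th><th>Classes</th></tr>\n";
    foreach ($rows as $r) {
        $out .= '<tr><td>' . htmlspecialchars($r['drug_name'])
              . '</td><td>' . htmlspecialchars($r['generic_name'])
              . '</td><td>' . htmlspecialchars(implode(', ', $r['classes']))
              . "</td></tr>\n";
    }
    return $out . "</table>\n";
}

echo render_table(extract_rows($sampleHtml));
```

To try it against the live page you would pass the fetched HTML (for example from file_get_contents()) instead of $sampleHtml, and adjust the XPath expressions if the real markup differs from the assumed layout. Keeping the classes as an array until rendering means you can still decide later whether they go into one column or several.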