I am using the Simple HTML DOM library (http://simplehtmldom.sourceforge.net/) for this.
I want to parse the contents of a page's pre tag, and I am using this code:
<?php include '/libraries/simple_html_dom.php' ?>
<?php
// Create DOM from URL or file
$html = file_get_html('testing.html');
// Find the Text
foreach($html->find('pre') as $element)
echo '<p>' . $element . '<p>';
?>
Here are the contents of the file 'testing.html':
<html>
<head>
</head>
<body bgcolor="#FFFFFF">
<pre>
am.o V 1 1 PRES ACTIVE IND 1 S
amo, amare, amavi, amatus V [XXXAO]
love, like; fall in love with; be fond of; have a tendency to;
am.as N 1 1 ACC P F
ama, amae N F [XXXDO] lesser
bucket; water bucket; (esp. fireman's bucket);
am.as V 1 1 PRES ACTIVE IND 2 S
amo, amare, amavi, amatus V [XXXAO]
love, like; fall in love with; be fond of; have a tendency to;
</pre>
</body>
</html>
As you can see, the pre text contains carriage returns, which I want to preserve in the output. Currently this is the parser's output:
am.o V 1 1 PRES ACTIVE IND 1 S amo, amare, amavi, amatus V [XXXAO] love, like; fall in love with; be fond of; have a tendency to; am.as N 1 1 ACC P F ama, amae N F [XXXDO] lesser bucket; water bucket; (esp. fireman's bucket); am.as V 1 1 PRES ACTIVE IND 2 S amo, amare, amavi, amatus V [XXXAO] love, like; fall in love with; be fond of; have a tendency to;
How can I do this?
Answer 0 (score: 1)
Use echo '<p>' . $element->innerHTML . '<p>';
Answer 1 (score: 1)
Replace the line breaks with BR tags. You can use nl2br().
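As a rough sketch of the nl2br() suggestion: the sample string and the variable names below are made up for illustration, and the htmlspecialchars() call is an extra precaution that is not part of the original answer.

```php
<?php
// Hypothetical sample text standing in for the contents of the <pre> element.
$preText = "am.o V 1 1 PRES ACTIVE IND 1 S\namo, amare, amavi, amatus V [XXXAO]";

// Escape HTML special characters first, then let nl2br() insert a <br />
// in front of every newline so the browser renders the line breaks.
$withBreaks = nl2br(htmlspecialchars($preText));

echo '<p>' . $withBreaks . '</p>';
```

nl2br() keeps the original newline characters and only adds the break tags, so the text still reads the same in the page source.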
Answer 2 (score: 1)
You have to specify the text node:
foreach($html->find('pre') as $element)
echo '<p>' . $element->innertext . '<p>';
Answer 3 (score: 0)
It turns out this is really simple! Simple HTML DOM is not needed, because it can be done without any library, like this:
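// Read the whole file as a string.
$file = file_get_contents('testing.html');
$start = '<html>';
$end = '<pre>';
// Everything between <html> and <pre> (the head and the opening body tag).
$whatwearelookingfor = strstr(substr($file, strpos($file, $start) + strlen($start)), $end, true);
// Strip that leading markup, then the remaining wrapper tags.
$parsedresult = str_replace($whatwearelookingfor, "", $file);
$parsedresult = str_replace("<html>", "", $parsedresult);
// The closing tags sit on separate lines in testing.html, so remove each one individually.
$parsedresult = str_replace(array("</body>", "</html>"), "", $parsedresult);
echo $parsedresult;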
This returns the contents of the pre element with the line breaks preserved!
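A somewhat more defensive variant of the same string-based idea might look like the sketch below. The helper name extractPre and the sample markup are hypothetical, and it assumes a single plain <pre> tag with no attributes; it is not a general HTML parser.

```php
<?php
// Hypothetical helper: returns the text between the first <pre> and the
// following </pre>, or null when either tag is missing.
function extractPre(string $html): ?string
{
    $open = '<pre>';
    $close = '</pre>';

    $start = strpos($html, $open);
    if ($start === false) {
        return null;
    }
    $start += strlen($open);

    $end = strpos($html, $close, $start);
    if ($end === false) {
        return null;
    }

    // substr() copies the bytes unchanged, so every line break survives.
    return substr($html, $start, $end - $start);
}

// Made-up sample markup; in practice this could come from file_get_contents().
$sample = "<html>\n<body>\n<pre>\nline one\nline two\n</pre>\n</body>\n</html>";
$content = extractPre($sample);

// Re-wrapping in <pre> makes the browser render the line breaks as written.
echo '<pre>' . $content . '</pre>';
```

Looking for the tags directly avoids depending on what surrounds them, such as the exact form of the closing body and html tags.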