I have a classic HTML page:
<html>
<head>
<meta charset="utf-8">
<title>Some text</title>
<link rel="stylesheet" href="style.css">
<script src="script.js"></script>
<script>
var text = "Hi guys !";
</script>
</head>
<body>
<h1>Hello guys</h1>
<p>Some text <strong>is more important</strong></p>
<input value="Here also is some text" placeholder="and here too">
<a href="not here">here is some text</a>
</body>
</html>
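I want to extract all the text from this page using PHP. Walking the tree and keeping only nodes whose nodeType is DOMText misses attribute text such as the placeholder.
Is there a simple way to quickly get all the real text (in my case, that means all the English text)?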
Answer 0 (score: 0)
Assuming you only want the children of the body element...
Example HTML
<html><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<title> Example</title>
</head>
<body>
a <div>b<span>c</span></div>
</body></html>
JavaScript
var body = document.body;
var textContent = body.textContent || body.innerText;
console.log(textContent); // a bc
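The fallback after textContent is needed because our good friend IE (version 8 and earlier) supports only innerText.
If you have a library such as jQuery, it is even easier: $('body').text().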
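Since the question asks for PHP rather than browser JavaScript, a rough server-side counterpart of the snippet above may be useful. This is a minimal sketch, assuming the page is already available as a string; the helper name `bodyText` is hypothetical and not part of the answer.

```php
<?php
// Hypothetical helper (not from the answer above): returns the text content
// of the <body> element, similar to document.body.textContent in a browser.
function bodyText(string $html): string
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    return $body === null ? '' : trim($body->textContent);
}

// Same markup as the example HTML above.
echo bodyText('<html><head><title> Example</title></head><body>a <div>b<span>c</span></div></body></html>');
```

Like the JavaScript version, this only returns text nodes; attribute values such as placeholder are not included.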
Answer 1 (score: 0)
Reference: http://www.phpro.org/examples/Get-Text-Between-Tags.html
<?php
$html='<html>
<head>
<meta charset="utf-8">
<title>Some text</title>
<link rel="stylesheet" href="style.css">
<script src="script.js"></script>
<script>
var text = "Hi guys !";
</script>
</head>
<body>
<h1>Hello guys</h1>
<p>Some text <strong>is more important</strong></p>
<input value="Here also is some text" placeholder="and here too">
<a href="not here">here is some text</a>
</body>
</html>';
$content = getTextBetweenTags('body', $html);
foreach( $content as $item )
{
echo $item.'<br />';
}
function getTextBetweenTags($tag, $html, $strict=0)
{
/*** a new dom object ***/
$dom = new DOMDocument;
/*** discard white space (set before loading; it has no effect afterwards) ***/
$dom->preserveWhiteSpace = false;
/*** load the html into the object ***/
if($strict==1)
{
$dom->loadXML($html);
}
else
{
$dom->loadHTML($html);
}
/*** the tag by its tag name ***/
$content = $dom->getElementsByTagName($tag);
/*** the array to return ***/
$out = array();
foreach ($content as $item)
{
/*** add node value to the out array ***/
$out[] = $item->nodeValue;
}
/*** return the results ***/
return $out;
}
Answer 2 (score: 0)
Use the textContent property of DOMDocument
<?php
error_reporting(-1);
$dom = new DOMDocument();
// $str is assumed to hold the HTML from the question
$dom->loadHTML($str);
echo $dom->textContent;
Result
Some text
var text = "Hi guys !";
Hello guys
Some text is more important
here is some text
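
The result above still contains the inline script and leaves out the value and placeholder attributes the question mentions. One possible way to cover both is to query the document with DOMXPath: select text nodes outside script and style, plus a chosen set of attributes. This is only a sketch; the function name `visibleText` and the default attribute list are assumptions, not something given in the thread.

```php
<?php
// Hypothetical helper: collects human-readable strings from an HTML document.
// The default attribute list is an assumption; adjust it to your needs.
function visibleText(string $html, array $attributes = ['placeholder', 'value', 'title', 'alt']): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Text nodes anywhere in the document, except inside <script> or <style>.
    $parts = ['//text()[not(ancestor::script) and not(ancestor::style)]'];
    // Attribute nodes with the requested names (assumed to be plain names).
    foreach ($attributes as $name) {
        $parts[] = '//@' . $name;
    }

    $out = [];
    foreach ($xpath->query(implode(' | ', $parts)) as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {   // skip whitespace-only nodes
            $out[] = $text;
        }
    }
    return $out;
}

// Usage with a shortened version of the page from the question.
$sample = '<html><head><title>Some text</title><script>var text = "Hi guys !";</script></head>'
        . '<body><h1>Hello guys</h1><input value="Here also is some text" placeholder="and here too"></body></html>';
print_r(visibleText($sample));
```

Note that this treats every value attribute as text, including those on hidden inputs, so the attribute list may need narrowing for real pages.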