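Goal:
Extract a value from an external web page by its tag.
Approach:
Make an HTTP request and construct a jsdom object from the response. Use jsdom's query selector to get the value from the markup.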
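Problem:
When I try to access the value of any tag, for example console.log(dom.window.document.querySelector("h4").textContent);
I get the error: "Cannot read property 'textContent' of null".
This must mean that the jsdom object is not being constructed correctly because of a problem with the chunk argument (chunk is a string from the response object).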
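Discussion:
My guess is that there is a problem with quote escaping in the response chunk, but my regex attempts have not gotten anywhere. If I pass a simple string such as <html><body><h4>testing</h4></body></html>, then dom.window.document.querySelector("h4").textContent works fine.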
Important part of the code:
res.on('data', (chunk) => {
    console.log(typeof(chunk)); // string
    const dom = new JSDOM(chunk);
    console.log(dom.window.document.querySelector("h4").textContent);
});
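One thing worth checking is whether the handler above ever sees the whole page. In Node.js a 'data' event delivers only one piece of the response body, so a single chunk may not contain the <h4> at all. The sketch below illustrates that possibility without any network access or jsdom: fakeRes is a stand-in for the real response object, collectBody is a hypothetical helper (not part of the original code), and the chunk boundaries are made up for illustration. It assumes chunks are strings, as they are after res.setEncoding('utf8').

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: buffers every 'data' chunk and passes the complete
// body to onBody once 'end' fires. Assumes string chunks.
function collectBody(res, onBody) {
    const chunks = [];
    res.on('data', (chunk) => chunks.push(chunk));
    res.on('end', () => onBody(chunks.join('')));
}

// Stand-in for the HTTP response so the sketch runs offline.
const fakeRes = new EventEmitter();
const seenPerChunk = []; // does each individual chunk contain the <h4>?
let fullBody = null;

fakeRes.on('data', (chunk) => seenPerChunk.push(chunk.includes('<h4>')));
collectBody(fakeRes, (body) => { fullBody = body; });

// Illustrative chunk boundaries: the <h4> only arrives in the second piece.
fakeRes.emit('data', '<html><head><title>TRUCKERS - Search</title></head><body>');
fakeRes.emit('data', '<h4>PO Box 342, Harvey, York Co. NB E6K 3W9</h4>');
fakeRes.emit('data', '</body></html>');
fakeRes.emit('end');

console.log(seenPerChunk);              // [ false, true, false ]
console.log(fullBody.includes('<h4>')); // true
```

If this is what is happening with the real request, constructing the JSDOM object inside the 'end' callback from the joined body, rather than once per 'data' chunk, should give querySelector("h4") a complete document to search.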
Full code:
var querystring = require('querystring');
var http = require('http');
const jsdom = require("jsdom");
const { JSDOM } = jsdom;

const postData = querystring.stringify({
    'id': '1'
});

const options = {
    hostname: 'www.southernnbtruckers.ca',
    port: 80,
    path: '/search/info/6',
    method: 'POST',
    headers: {
        'Content-Type': 'application/x-www-form-urlencoded',
        'Content-Length': Buffer.byteLength(postData)
    }
};

const req = http.request(options, (res) => {
    res.setEncoding('utf8');
    //Problem is likely to do with the HTTP response (chunk)
    res.on('data', (chunk) => {
        console.log(typeof(chunk)); // string
        const dom = new JSDOM(chunk);
        console.log(dom.window.document.querySelector("h4").textContent); //Cannot read property 'textContent' of null
    });
    res.on('end', () => {
        //Do stuff
    });
});

req.on('error', (e) => {
    console.error(`problem with request: ${e.message}`);
});

req.write(postData);
req.end();
HTML code for reference:
<html>
<head>
<base href="http://www.southernnbtruckers.ca/">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
<meta name="generator" content="People and Groups"/>
<meta name="keywords" content=""/>
<title>TRUCKERS - Search</title>
<link rel="stylesheet" type="text/css" href="/core/styles/style_custom.php?org_name=truckers&language=english"/>
</head>
<body><a name="top"></a>
<div id="Container">
<div id="Header">
<table id="Lang">
<tr>
<td valign="middle"> <a href="/login">Login</a></td>
</tr>
</table>
</div>
<div id="MainNav">
<div id="Nav">
<ul>
<li class="LeftSelectedNav"><a href="/search">Home</a></li>
<li><a href="/contact_information">Contact</a></li>
<li style="float:right;" class="right"> </li>
</ul>
</div>
</div>
<div id="MainContent">
<div id="SideNav">
<div id="Sub">
<ul>
<li id="SubTitle">Home</li>
<li class="subsub"><a href="/our_mission_statemnt">Our Mission Statement</a
></li>
<li class="subsub"><a href="/our_executive">Our Executive</a></li>
<li class="currentSub"><a href="/search">Search</a></li>
<li id="SubSpacer"></li>
</ul>
<ul>
<li class="SubBlank"><h4>PO Box 342, Harvey, York Co. NB E6K 3W9</h4>
<center>
<table class="featureImageTable">
<tr>
<td><img src="/uploads/Website_Assets/truckers-sidetest.jpg" alt="Side Test"
title=""/></td>
</tr>
<tr>
<td></td>
</tr>
</table>
</center>
<br/><br/><a href="http://www.partsfortrucks.com" target="_tab">
<center>
<table class="featureImageTable">
<tr>
<td><img src="/uploads/Website_Assets/PartTrucks.jpg" alt="Parts Trucks 300px"
title=""/></td>
</tr>
<tr>
<td></td>
</tr>
</table>
</center>
</a></p></li>
<li id="SubSpacer"></li>
</ul>
</div>
</div>
<div id="Main">
<table class="data" id="mainContentTable" cellspacing="0" cellpadding="0" width="100%">
<tr>
<td valign="top"><h1>Truckers Search</h1>Find what you need! This database is easy to use - if
you're looking for a specific piece of equipment for hire just use the pull down menu that says
"Company Name" and locate the equipment you require, then press return or the filter button. If
you're looking for a company to work in a specific county in New Brunswick - just use the pull
down menu to identify the county. You can also click on the name of any trucker to bring up
their equipment profile and contact information.
<hr/>
<a href="/search"><< Back to search</a>
<hr>
<h1>Gary MacBean</h1>
<div class="contact">Contact: Gary MacBean</div>
<hr>
Address1: 150 Sunrise Estates Avenue<br>City: New Maryland<br>Province: NB<br>Postal
Code: E3C 1G6<br>Phone: 1 506 459-3609<br>Cell Phone: 1 506 444-1358<br>Fax: 1
506 459-5154<br>
<hr>
Number Of Trucks: 2<br>Has Dump Trailer: Yes<br>Has Tandem Dump Truck: Yes<br>Has
Belly Dump: Yes<br>Has Asphalt Tarp Spreader: Yes<br>
<hr>
Has Compensation WorkSafeNB: Yes<br>Has Liability Insurance: Yes<br>Has HST Number: Yes<br>
<hr>
Works Province Wide: Yes<br>
<hr>
<hr/>
<a href="/contact_information">Comment, Questions?</a></td>
</tr>
</table>
<div id="Footer">
<hr/>
<p>
<h2>Serving Central and Southern New Brunswick</h2><br/><br/>Powered by: <a
href="http://www.peopleandgroups.com" title="www.peopleandgroups.com">People&Groups</a></p></div>
</div>
</div>
</div>
</body>
</html>
Thoughts? Should I be using a different method to capture data from other websites?