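So I'm working with some slightly tricky code. Basically, I'm trying to extract a <script> tag from each of about 10 pages on a website I'm developing. The syntax of this code is incorrect, since you can't use brackets in a function parameter, but it is the essence of what I'm trying to do: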
var parser = new DOMParser();
var resp = new Array();
var htmlDoc = new Array();
var findScripts = new Array();
var searchScripts = new Array();
var scriptContent = new Array();
for (var i = 0; i < amt; i++) {
    resp[i] = prodDetails[i].responseText;
    htmlDoc[i] = parser.parseFromString(resp[i],"text/html");
    findScripts[i] = htmlDoc[i].body.querySelectorAll('script');
    searchScripts[i] = Array.prototype.filter.call(findScripts[i], function (findScripts[i]) {
        return RegExp('var prodInfo = ').test(findScripts[i].textContent);
    });
    scriptContent[i] = searchScripts[i].innerText;
}
Some further details, which may not be needed:
I fetch each page with the following code:
var text = "";
var prodDetails = new Array();
var amt = document.querySelectorAll('[id="product-details"]').length;
for (var i = 0; i < amt; i++) {
    prodDetails[i] = $.get(itemPages[i].href, {}, function (results) {
    });
}
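One thing worth keeping in mind with the snippet above: `$.get` returns immediately, so `responseText` is only populated once each request has finished. Here is a minimal sketch of waiting for every page before parsing. It assumes a hypothetical `fetchText` function that takes a URL and returns a promise of that page's HTML; the name `fetchAllPages` is also purely illustrative.

```javascript
// Hypothetical helper (not part of the original code): resolves with an
// array of HTML strings, in the same order as `urls`, once every request
// has completed.
// `fetchText` is an assumed function: (url) => Promise<string>.
function fetchAllPages(urls, fetchText) {
  return Promise.all(urls.map((url) => fetchText(url)));
}

// Possible browser usage, assuming the jqXHR returned by $.get can be
// consumed as a thenable that resolves to the response body:
//
//   const urls = Array.from(itemPages, (a) => a.href);
//   fetchAllPages(urls, (url) => $.get(url)).then((pages) => {
//     // pages[0] ... pages[n - 1] are HTML strings, ready for DOMParser
//   });
```

Passing the fetch function in as a parameter keeps the waiting logic independent of jQuery, so the same sketch would work with `fetch(url).then((r) => r.text())`.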
Then I parse the responses so that the tags can be pulled out with plain JavaScript calls:
var parser = new DOMParser();
var resp = new Array();
var htmlDoc = new Array();
for (var i = 0; i < amt; i++) {
    resp[i] = prodDetails[i].responseText;
    htmlDoc[i] = parser.parseFromString(resp[i],"text/html");
}
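This works for accessing each page's DOM individually, via htmlDoc[0] through htmlDoc[9], but there are about 8 <script> tags on each page. The one I'm looking for contains a specific piece of text in its innerHTML. I can find the one I'm looking for like this: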
var findScripts = htmlDoc[0].body.querySelectorAll('script');
var searchScripts = Array.prototype.filter.call(findScripts, function (findScripts) {
    return RegExp('var prodInfo = ').test(findScripts.textContent);
});
var scriptContent = searchScripts[0].innerText;
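This code works well when run on its own, but that means running it by hand every time I change the index into htmlDoc, and I'm looking for more of a "one-shot" solution.
I'm not opposed to using jQuery here, but I'm not very familiar with it. If there is a more robust jQuery-based solution, I'd accept that too. Any help is appreciated! Thanks!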
Answer 0 (score: 0)
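Rather than assigning to someArr[i] and someOtherArr[i] over and over, I'd suggest simply pushing to the arrays in question; the code will look a lot cleaner. You can also use forEach for better abstraction, which avoids manual iteration and avoids the problems caused by i being hoisted: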
const parser = new DOMParser();
const responseTexts = [];
const htmlDocs = [];
const findScripts = [];
const filteredScripts = [];
const scriptContent = [];
prodDetails.forEach((response) => {
    const responseText = response.responseText;
    const newDoc = parser.parseFromString(responseText,"text/html");
    const thisDocScripts = newDoc.querySelectorAll('script');
    // Spread the NodeList into an array so that filter and map are available.
    // The search string matches the 'var prodInfo = ' text from the question.
    const thisDocFilteredScripts = [...thisDocScripts]
        .filter(oneScript => oneScript.textContent.includes('var prodInfo = '));
    const thisDocScriptsContent = thisDocFilteredScripts.map(scr => scr.textContent);
    // Each array gets one entry per page, in the same order as prodDetails.
    responseTexts.push(responseText);
    htmlDocs.push(newDoc);
    findScripts.push(thisDocScripts);
    filteredScripts.push(thisDocFilteredScripts);
    scriptContent.push(thisDocScriptsContent);
});
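To illustrate the hoisting remark: the question's synchronous filter callback would not actually be affected, but any callback that runs after the loop has finished (such as a `$.get` success handler) sees the final value of a `var` counter. A small standalone sketch, with made-up variable names:

```javascript
// With `var`, there is a single function-scoped `i` shared by every closure.
const withVar = [];
for (var i = 0; i < 3; i++) {
  withVar.push(() => i);
}

// With forEach, each callback invocation gets its own `index` parameter.
const withForEach = [];
[0, 1, 2].forEach((index) => {
  withForEach.push(() => index);
});

console.log(withVar.map((f) => f()));     // [ 3, 3, 3 ]
console.log(withForEach.map((f) => f())); // [ 0, 1, 2 ]
```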
Abstractions are wonderful.
Do you really need to keep all of those variables in arrays?
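If only the final script text is needed, the intermediate arrays can be dropped altogether. A rough sketch, assuming one matching script per page; `findScriptText` and `collectScriptTexts` are hypothetical names, and the parser is passed in as a parameter so the functions do not depend on a global:

```javascript
// Hypothetical helper: returns the text of the first <script> in `doc`
// whose content includes `marker`, or null if no script matches.
function findScriptText(doc, marker) {
  const scripts = Array.from(doc.querySelectorAll('script'));
  const match = scripts.find((script) => script.textContent.includes(marker));
  return match ? match.textContent : null;
}

// Parses each response and keeps only the resulting strings.
// `parser` is anything with a parseFromString method
// (a DOMParser instance in the browser).
function collectScriptTexts(responses, parser, marker) {
  return responses.map((response) => {
    const doc = parser.parseFromString(response.responseText, 'text/html');
    return findScriptText(doc, marker);
  });
}

// Possible browser usage, assuming prodDetails holds completed requests:
//
//   const scriptContent =
//     collectScriptTexts(prodDetails, new DOMParser(), 'var prodInfo = ');
```

The result is a flat array with one string (or null) per page, rather than an array of arrays.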