How to traverse HTML

Time: 2016-11-06 23:44:56

Tags: javascript html node.js typescript traversal

I have a variable that holds HTML code:

let htmlDocument = '<div id="buildings-wrapper"> \
    <div id="building-info"> \
    <h2><span class="field-content">Britney Spears\' House</span></h2> \
    <div class="building-field"> \
    <div class="field-content">9999 Hollywood Blvd</div> \
    </div> \
    <div class="building-field"> \
    <div class="field-content">Building Hours: Mon. 07:00-23:00 Tue.-Fri. 06:30-22:00, Sat. 07:30-18:00, Sun. 12:00-18:00 Holidays - Closed</div> \
    </div> \
    <div class="building-field"> \
    <div class="field-content"><a href="http://www.britneyspears.com">Locate on the stars map</a></div> \
    </div> \
    </div> \
    <div id="building-image"> \
    <div class="field-content"><img src="../../../../ssc.adm.britneyspears.com/classroomservices/image/viewimage?userEvent=ShowBuildingImage&amp;buildingID=britneyspears" alt="Image of BritneySpears"></div> \
        </div> \
        </div>';

I want to traverse the variable and store this part of the HTML in a separate variable:

<div class="field-content">9999 Hollywood Blvd</div>

This is what I have so far:

public traverseHTML(htmlDocument: any): any {
    let htmlBlock: any;
    let divs: any = htmlDocument.getElementsByTagName('div');
    for (var i = 0; i < divs.length; i++) {
        if (divs[i].getAttribute("id") == "field-content") {
            htmlBlock = divs[i];
        }
    }
    return htmlBlock;
}

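I'm sure there are all sorts of problems with my function, but I can't find them because I can't even get past the second line. I get an error saying htmlDocument.getElementsByTagName is not a function. How can I iterate through the HTML div by div?

Note that I can't use jQuery because of the project specifications.

Edit:

I get "document is not defined" when I try document.createElement('div'), and DOMParser is not defined when I try to create a DOMParser. Have I set up my class incorrectly? Here is the code for the whole class: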

import parse5 = require('parse5');
import {ASTNode} from 'parse5';



export default class DSController {
//private parser: DOMParser;

constructor() {
    //this.parser = new DOMParser();
}

public traverseHTML(htmlDocument: any): any {
    let parser = new DOMParser();
    let parsed: any = parser.parseFromString(htmlDocument, "text/html");
    let selectParsed: any = parsed.querySelectorAll('field-content')[1];
    console.log(selectParsed);

    return selectParsed;

   /* let element = document.createElement("div");
    element.innerHTML = htmlDocument;
    console.log(element.querySelectorAll(".field-content")[1]); // <div class="field-content">9999 Hollywood Blvd</div>
    */
}




public parseHTML(): any {

    //let document: parse5.ASTNode;
    return;
}
}

2 Answers:

Answer 0 (score: 1)

You can create an element and insert the string into it as HTML. You can then query that element for what you are looking for:

let htmlDocument = '<div id="buildings-wrapper"> \
    <div id="building-info"> \
    <h2><span class="field-content">Britney Spears House</span></h2> \
    <div class="building-field"> \
    <div class="field-content">9999 Hollywood Blvd</div> \
    </div> \
    <div class="building-field"> \
    <div class="field-content">Building Hours: Mon. 07:00-23:00 Tue.-Fri. 06:30-22:00, Sat. 07:30-18:00, Sun. 12:00-18:00 Holidays - Closed</div> \
    </div> \
    <div class="building-field"> \
    <div class="field-content"><a href="http://www.britneyspears.com">Locate on the stars map</a></div> \
    </div> \
    </div> \
    <div id="building-image"> \
    <div class="field-content"><img src="../../../../ssc.adm.britneyspears.com/classroomservices/image/viewimage?userEvent=ShowBuildingImage&amp;buildingID=britneyspears" alt="Image of BritneySpears"></div> \
        </div> \
        </div>';

let element = document.createElement("div");
element.innerHTML = htmlDocument;

console.log(element.querySelectorAll(".field-content")[1]); // <div class="field-content">9999 Hollywood Blvd</div>
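
The same idea can be wrapped in a small helper. This is only a sketch: the name `extractByClass` is hypothetical, and it assumes a browser-like environment in which a global `document` exists. It also sidesteps the two problems in the question's loop: the input there was a plain string, which has no getElementsByTagName method, and the loop compared the `id` attribute with "field-content" although the markup uses that value as a `class`.

```typescript
// Hypothetical helper: returns the outer HTML of the index-th element
// carrying the given class. It needs a DOM; in plain Node.js there is
// no global `document`, so it throws a descriptive error instead.
function extractByClass(html: string, className: string, index: number): string {
  // Look the global up defensively so the code also loads outside a browser.
  const doc = (globalThis as any).document;
  if (!doc) {
    throw new Error("document is not defined: a DOM is required");
  }
  // Parse the string by assigning it to a detached container element.
  const container = doc.createElement("div");
  container.innerHTML = html;
  // The leading "." makes this a class selector rather than a tag selector.
  const matches = container.querySelectorAll("." + className);
  if (index < 0 || index >= matches.length) {
    throw new Error("no element with class " + className + " at index " + index);
  }
  return matches[index].outerHTML;
}
```

In a browser, `extractByClass(htmlDocument, "field-content", 1)` would target the address div, because index 0 is the span inside the h2.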


Answer 1 (score: 1)

You can also use DOMParser:

new DOMParser().parseFromString(htmlDocument, "text/html")
  .querySelectorAll('.field-content')[1]
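
Both `document` and DOMParser are browser APIs, which would explain the errors in the question's edit if the class runs under plain Node.js. Since that class already imports parse5, another option is to parse with that library and walk the resulting tree by hand. The following is only a sketch under stated assumptions: it assumes nodes shaped like the output of parse5's default tree adapter (`nodeName`, `attrs`, `childNodes`, and `value` on text nodes), and `TreeNode`, `hasClass`, `findByClass`, `textOf` and the hand-built `sampleTree` are hypothetical names introduced here so that the block runs without the library installed.

```typescript
// Minimal node shape, modeled on parse5's default tree adapter.
// This is an assumption; check the typings of your installed version.
interface Attr { name: string; value: string; }
interface TreeNode {
  nodeName: string;         // tag name, or "#text" for text nodes
  attrs?: Attr[];           // present on element nodes
  childNodes?: TreeNode[];  // present on elements and fragments
  value?: string;           // present on text nodes
}

// True when the node's class attribute lists the given class name.
function hasClass(node: TreeNode, className: string): boolean {
  for (const attr of node.attrs || []) {
    if (attr.name === "class" && attr.value.split(/\s+/).indexOf(className) !== -1) {
      return true;
    }
  }
  return false;
}

// Depth-first walk that collects matching nodes in document order.
function findByClass(root: TreeNode, className: string, out: TreeNode[] = []): TreeNode[] {
  if (hasClass(root, className)) out.push(root);
  for (const child of root.childNodes || []) findByClass(child, className, out);
  return out;
}

// Concatenates the text nodes beneath a node.
function textOf(node: TreeNode): string {
  if (node.nodeName === "#text") return node.value || "";
  return (node.childNodes || []).map(child => textOf(child)).join("");
}

// Hand-built stand-in for part of the parsed document (hypothetical data).
const el = (nodeName: string, cls: string, childNodes: TreeNode[]): TreeNode =>
  ({ nodeName, attrs: cls ? [{ name: "class", value: cls }] : [], childNodes });
const text = (value: string): TreeNode => ({ nodeName: "#text", value });

const sampleTree: TreeNode = el("div", "", [
  el("h2", "", [el("span", "field-content", [text("Britney Spears House")])]),
  el("div", "building-field", [
    el("div", "field-content", [text("9999 Hollywood Blvd")]),
  ]),
]);

const matches = findByClass(sampleTree, "field-content");
const address = textOf(matches[1]); // index 0 is the span inside the h2
```

With the real library, the root would presumably come from a call such as `parse5.parseFragment(htmlDocument)` instead of `sampleTree`; verify the exact node shape against the version you have installed before relying on it.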