How do I select a <div> block in a text?

Time: 2016-10-08 16:57:20

Tags: javascript search

I have a text made of two <div> elements inside a <body>, saved as raw_text, as follows:

var raw_text = "<body><div>This is the 'div' text that I don't want.</div> <div>This is the 'div' text that I want to print.</div></body>";

I need a script that prints on screen only the <div> of the raw text that contains a certain string.

If the wanted string is:

var x = "that I want";

the script should pick:

<div>This is the 'div' text that I want to print.</div>

and the output should be:

This is the 'div' text that I want to print.

3 answers:

Answer 0: (score: 1)

This is the proper way to do it:

  1. Use a DOM parser
  2. Iterate over the text nodes
  3. Check whether they contain the desired string

    var html = "<body><div>This is the 'div' text that I don't want.</div> <div>This is the 'div' text that I want to print.</div></body>";
    var x = "that I want";
    // Parse the string into a document, then walk its text nodes only
    var doc = new DOMParser().parseFromString(html, 'text/html');
    var it = doc.createNodeIterator(doc.body, NodeFilter.SHOW_TEXT);
    var node;
    while ((node = it.nextNode())) {
      if (node.nodeValue.includes(x)) {
        // Print the first matching text node and stop
        console.log(node.nodeValue);
        break;
      }
    }
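
The question asks for the <div> block, while the loop above logs only the matching text node. A minimal sketch of a variant that returns the enclosing element's markup, assuming a browser environment (DOMParser is not defined in plain Node.js) and swapping the node iterator for querySelectorAll; the name findDivContaining is hypothetical:

```javascript
// Hypothetical helper: returns the markup of the first <div> (in document
// order) whose text contains `needle`, or null if no <div> matches.
function findDivContaining(html, needle) {
  if (typeof DOMParser === "undefined") {
    // Assumption: this sketch targets browsers; plain Node.js has no DOMParser.
    throw new Error("DOMParser is not available in this environment");
  }
  var doc = new DOMParser().parseFromString(html, "text/html");
  var divs = doc.querySelectorAll("div");
  for (var i = 0; i < divs.length; i++) {
    if (divs[i].textContent.includes(needle)) {
      return divs[i].outerHTML;
    }
  }
  return null;
}

var html = "<body><div>This is the 'div' text that I don't want.</div> <div>This is the 'div' text that I want to print.</div></body>";
if (typeof DOMParser !== "undefined") {
  console.log(findDivContaining(html, "that I want"));
}
```

With nested <div> elements, the outermost one containing the string comes first in document order, so it would be the one returned.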

Answer 1: (score: 0)

var raw_text = "<body><div>This is the 'div' text that I don't want.</div> <div>This is the 'div' text that I want to print.</div></body>";
var x = "that I want";
var homework_solution = raw_text.match(new RegExp("<div>([^<>]*?"+x+"[^<>]*?)</div>"))[1];

This should do the job. The regex could probably be made more robust.

The "proper" way to do this would be to use a DOMParser to search for the node you want.
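
As one possible way to harden the regex approach, the sketch below escapes regex metacharacters in the search string and returns null instead of throwing when nothing matches. The helper names escapeRegExp and findDivText are hypothetical, and the pattern still assumes attribute-less, non-nested <div> elements:

```javascript
// Hypothetical helper: escape characters that are special in a regex,
// so the search string is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return the text of the first plain <div> whose
// content includes `needle`, or null when no <div> matches.
function findDivText(rawText, needle) {
  var pattern = new RegExp("<div>([^<>]*?" + escapeRegExp(needle) + "[^<>]*?)</div>");
  var match = rawText.match(pattern);
  return match ? match[1] : null;
}

var raw_text = "<body><div>This is the 'div' text that I don't want.</div> <div>This is the 'div' text that I want to print.</div></body>";
console.log(findDivText(raw_text, "that I want"));
console.log(findDivText(raw_text, "(not there)"));
```

Without the escaping step, a search string such as "print." would treat the dot as a wildcard rather than a literal character.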

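Answer 2: (score: 0)

You can use jQuery to convert your string into proper DOM elements and then parse them easily, as @Retr0spectrum says in the comments. You have the HTML in a plain string: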

var htmlString = "<body><div>This is the 'div' text that I don't want.</div> <div>This is the 'div' text that I want to print.</div></body>";

Now you have to:

  1. parse it into the DOM,
  2. filter the elements,
  3. get the text,

like this:

    // Process the string through jQuery so it parses the DOM elements
    var dom = $(htmlString);
    
    // and then we convert to array...
    var array = dom.toArray();
    
    // ... so we can filter it, using RegEx to find the
    // <div>(s) we are interested in: 
    var matchingDivs = array.filter(function (div, i) {
      return $(div).text().match(/that I want/g) !== null;
    });
    
    // we pop the last matched div from the filtered array (the first 
    // one would also work, since normally you will find just one)
    var theDiv = matchingDivs.pop(); 
    
    // Then get the <div>'s text:
    var theText = theDiv.textContent;
    

The nice thing is that you can chain all the methods, so the code above can be written like this:

    var theText = $(htmlString).toArray().filter(function (div, i) {
      return $(div).text().match(/that I want/g) !== null;
    })[0].textContent;
    

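Note: in the chained example I used the bracket operator [0] instead of pop() to get the first element rather than the last one.

Hope this helps you understand how it works.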