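How to detect sentences in <p> tags that contain a <span> tag between periods

Time: 2019-08-31 21:37:10

Tags: javascript

I am trying to detect/grab the sentences in <p> tags that contain a <span> tag. I want to get the whole sentence between a pair of periods. This has to be done across the entire web page.

For example, the following paragraph contains the span sentence I want to extract: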

<p>The phospholipid heads face outward, one layer exposed to the interior of the cell and one layer exposed to the exterior. Because the <span>phosphate</span> groups are polar and hydrophilic, they are attracted to water in the intracellular fluid. Intracellular fluid (ICF) is the fluid interior of the cell.</p> 

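I only want to extract one sentence: "Because the phosphate groups are polar and hydrophilic, they are attracted to water in the intracellular fluid", because it contains the <span> tag.

Can I do this for the entire web page? Using regex or JS?

I have tried different regex combinations found online, but none of them work.

4 answers:

Answer 0: (score: 3)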

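Using the DOM methods, you can iterate over all <span>s inside <p>s and access the text before and after each one using previousSibling and nextSibling respectively. To get the surrounding "sentence", use the periods to delimit it:

for (const span of document.querySelectorAll("p span")) {
  const prevText = span.previousSibling.data;
  const afterText = span.nextSibling.data;

  // Text after the last period that precedes the span
  const prev = prevText.slice(prevText.lastIndexOf(".") + 1);
  // Text up to the first period that follows the span
  const after = afterText.slice(0, afterText.indexOf("."));

  // do whatever you want to do here
}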

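This is far from complete, however: the previous or next node might not be a text node, or those text nodes might not contain a period. You will have to handle those cases appropriately.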
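One possible way to handle those cases is to work on the paragraph's full text instead of the neighbouring nodes. The following is only a sketch under assumptions: `sentenceAt` and `spanSentences` are hypothetical names, a "sentence" is simply the text between two periods (so abbreviations and decimals are not handled), and each <span> is located inside its <p> by character offset using a Range.

```javascript
// Return the sentence of `text` that contains the character range [start, end).
// Assumption carried over from the question: sentences are delimited by periods.
function sentenceAt(text, start, end) {
  // Position just after the last period before the range, or 0 if there is none.
  const from = start === 0 ? 0 : text.lastIndexOf(".", start - 1) + 1;
  // Position just after the first period following the range, or the end of the text.
  const dot = text.indexOf(".", end);
  const to = dot === -1 ? text.length : dot + 1;
  return text.slice(from, to).trim();
}

// Hypothetical DOM wrapper: find each <span> inside its <p> by character offset.
function spanSentences(root = document) {
  const results = [];
  for (const span of root.querySelectorAll("p span")) {
    const p = span.closest("p");
    // Measure how much text precedes the span within the paragraph.
    const range = span.ownerDocument.createRange();
    range.setStart(p, 0);
    range.setEndBefore(span);
    const start = range.toString().length;
    const end = start + span.textContent.length;
    results.push(sentenceAt(p.textContent, start, end));
  }
  return results;
}
```

Because the search runs over the whole paragraph text, it does not matter whether the span's siblings are text nodes or other elements, and a missing period just extends the sentence to the start or end of the paragraph.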

See:

Node on MDN

Text on MDN

.querySelectorAll on MDN

Answer 1: (score: 2)

You can use JavaScript. Let's store the sentences in an array.

Sentences:

<p>The sun is <span>shining</span> today</p>
<p>Let's refactorate it</p>
<p>Nice. It's a <span>special day</span> of celebration</p>

JavaScript:

var sentences = [];

document.querySelectorAll('p span').forEach(function(span) {
    var sentencesText = span.parentNode.innerText.split('.');
    span.parentNode.innerHTML.split('.').forEach(function(sent, i) {
        if (sent.indexOf("<span>") != -1) {
            sentences.push(sentencesText[i]);
        }
    })
});

Result of the sentences array:

"The sun is shining today"
"It's a special day of celebration"
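One thing to watch for: the loop above runs once per <span>, so a paragraph containing two spans would have its matching sentences pushed twice. A minimal sketch that works per paragraph instead is shown here. The function names are hypothetical, and it assumes periods never appear inside tags or attribute values and that a span does not straddle a period.

```javascript
// Split a paragraph's HTML on periods and keep the pieces that contain a <span>.
// Assumption: periods never occur inside tags or attribute values.
function sentencesWithSpan(html) {
  return html
    .split(".")
    // Accept <span> with or without attributes.
    .filter((piece) => /<span[\s>]/i.test(piece))
    // Strip the remaining tags so only the sentence text is left.
    .map((piece) => piece.replace(/<[^>]*>/g, "").trim());
}

// Hypothetical page-level wrapper: one pass per <p>, so nothing is pushed twice.
function collectSentences(root = document) {
  const sentences = [];
  root.querySelectorAll("p").forEach((p) => {
    sentences.push(...sentencesWithSpan(p.innerHTML));
  });
  return sentences;
}
```

Stripping tags with a regex is only reasonable for simple inline markup like this; HTML entities in the result stay encoded.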

Answer 2: (score: 0)

Use the split method to separate the sentences, then search for the sentence that contains the span.

const p = document.getElementsByTagName('p')[0].innerHTML;
p.split(".").forEach(e => {
  // Look for the opening tag itself, so the plain word "span" in the text does not match
  if (e.includes('<span')) {
    console.log(e);
  }
});

HTML:

<p>The phospholipid heads face outward, one layer exposed to the interior of the cell and one layer exposed to the exterior. Because the <span>phosphate</span> groups are polar and hydrophilic, they are attracted to water in the intracellular fluid. Intracellular fluid (ICF) is the fluid interior of the cell.</p>

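Answer 3: (score: 0)

A quick and dirty solution with regex.

Note that this code will need some extra work to account for more characters in the text. It is just a simple example that uses the basic text you added to your question and demonstrates that the problem can be solved with a regex.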

const getSentences = () => {
  const paragraphs = document.querySelectorAll('p');
  const sentences = [];
  paragraphs.forEach((paragraph) => {
    // match() returns null when a paragraph has no matching sentence
    const matches = paragraph.innerHTML.match(/(<p>)?\.?\s?[\w\d\s]+<span>(\w)+<\/span>\s?[\w\d\s,]{1,}\.\s?/ig);
    if (matches) {
      sentences.push(...matches);
    }
  });
  return sentences;
};

getSentences().forEach((sentence) => {
  console.log(sentence);
});

CSS:

p > span {
  background: #d2d2d2;
}

HTML:

<!-- 1 <span> tag per <p> -->
<p>The phospholipid heads face outward, one layer exposed to the interior of the cell and one layer exposed to the exterior. 1 Because the <span>phosphate</span> groups are polar and hydrophilic, they are attracted to water in the intracellular fluid. Intracellular fluid (ICF) is the fluid interior of the cell.</p> 
<!-- End 1 <span> tag per <p> -->

<!-- Multiple <span> tags per <p> -->
<p>The phospholipid heads face outward, one layer exposed to the interior of the cell and one layer exposed to the exterior. 2 Because the <span>phosphate</span> groups are polar and hydrophilic, they are attracted to water in the intracellular fluid. Intracellular fluid (ICF) is the fluid interior of the cell. The phospholipid heads face outward, one layer exposed to the interior of the cell and one layer exposed to the exterior. 3 Because the <span>phosphate</span> groups are polar and hydrophilic, they are attracted to water in the intracellular fluid. The phospholipid heads face outward, one layer exposed to the interior of the cell and one layer exposed to the exterior.</p>
<!-- End Multiple <span> tags per <p> -->

<!-- 1 <span> tag per <p> at the beginning -->
<p>4 Because the <span>phosphate</span> groups are polar and hydrophilic, they are attracted to water in the intracellular fluid. Intracellular fluid (ICF) is the fluid interior of the cell.</p> 
<!-- End 1 <span> tag per <p> at the beginning -->

<!-- 1 <span> tag per <p> at the end -->
<p>Intracellular fluid (ICF) is the fluid interior of the cell. 5 Because the <span>phosphate</span> groups are polar and hydrophilic, they are attracted to water in the intracellular fluid.</p> 
<!-- End 1 <span> tag per <p> at the end -->
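As for the extra work mentioned in the note above, one possible direction is to match "anything that is not a period" around the span rather than listing the allowed characters. This is only a sketch: `SPAN_SENTENCE` and `matchSpanSentences` are hypothetical names, and the pattern still assumes that periods outside the span only ever end sentences (no abbreviations, decimals, or periods inside other tags).

```javascript
// A run of non-period characters containing at least one <span>...</span>,
// plus the closing period if present. Attributes on the <span> are allowed.
const SPAN_SENTENCE = /[^.]*<span\b[^>]*>[\s\S]*?<\/span>[^.]*\.?/gi;

// Return every span-containing sentence found in a paragraph's HTML.
function matchSpanSentences(html) {
  // match() returns null when nothing matches, so fall back to an empty array.
  const matches = html.match(SPAN_SENTENCE) || [];
  return matches.map((m) => m.trim());
}
```

It could be called with `paragraph.innerHTML` for each paragraph on the page. The returned sentences still contain the <span> markup, as in the example above.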