React: highlighting text inside dangerouslySetInnerHTML with a regex does not work reliably

Time: 2018-08-14 20:24:39

Tags: regex reactjs replace highlight

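The goal is to highlight parts of the text (strings) rendered through dangerouslySetInnerHTML. To do that, I try to match the desired text inside the HTML and wrap it in a "span" with the appropriate style. I am using the code below, which works perfectly for some texts (HTML) and not at all for others. A working example and a failing example follow. I have spent hours trying to understand the difference, or why the regex does not work, but I cannot figure it out. *bangs head against wall*

My question is: why does the regex fail in some cases and work in others, even though the text (the "quote") is present in every case?

Any ideas what I am missing? Any help is much appreciated!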

Highlight component (JSX):

import React from "react";



class HighlightQuote extends React.Component {
  render = () => {
    // zitat: the quotes with any leading or trailing quotation marks
    // and parentheses stripped.
    const zitat = this.props.quotes.map(x => x.replace(/^[“”"’()]+|[“”"’()]+$/g, ""));
    let highlightedHtml;

    if (zitat.length === 0) {
      // Nothing to highlight: render the content unchanged.
      highlightedHtml = this.props.newcontent;
    } else {
      // Build one alternation from all quotes and wrap every match in a span.
      const regex = new RegExp(`(${zitat.join('|')})`, 'g');
      highlightedHtml = this.props.content.replace(
        regex,
        '<span class="hl">$1</span>'
      );
      console.log('highlightedHtml:');
      console.log(highlightedHtml);
    }

    return (
      <div className="reader" ref="test" dangerouslySetInnerHTML={{ __html: highlightedHtml }} />
    );
  };
}

export default HighlightQuote;

Working example (output of console.log(highlightedHtml)):

<div class="post" id="post-17660">
<p class="postcontents">
<article> <div class="post-inside">
<p>One of the things I have disliked the most about the crypto sector is the idea that people should &#x201C;hodl&#x201D; or &#x201C;hold on for dear life.&#x201D;</p>
<p>I have written many times here at AVC that one should take profits when they are available and diversify an investment portfolio.</p>
<p><span class="hl">The idea that an investor should hold on no matter what has always seemed ridiculous to me.</span></p>
<p>Now, the crypto markets are in the eighth month of a long and painful bear market and we are starting to see some signs of capitulation, particularly in the assets that went up the most last year.</p>
<p>Whether this is the long-awaited&#xA0;capitulation of the HODL crowd or not, I can&#x2019;t say.</p>
<p>But capitulation would be a good thing for the crypto markets, releasing assets into the market that until now have been locked up by long-term&#xA0;holders.</p>
<p><span class="hl">Until then it is hard to get excited about buying anything in crypto.</span></p>
</div> </article>
</p> </div>

Highlighted quotes:

"The idea that an investor should hold on no matter what has always seemed ridiculous to me."

"Until then it is hard to get excited about buying anything in crypto."

Failing example (output of console.log(highlightedHtml)):

<div><article id="story" class="Story-story--2QyGh css-1j0ipd9"><header class="css-1qcpy3f e345g291"><p class="css-1789nl8 etcg8100"><a class="css-1g7m0tk" href="https://www.nytimes.com/column/new-sentences">New Sentences</a></p><div class="css-30n6iy e345g290"><div class="css-acwcvw"></div></div><figure class="ResponsiveMedia-media--32g1o ResponsiveMedia-sizeSmall--3092U ResponsiveMedia-layoutVertical--1pg1o ResponsiveMedia-sizeSmallNoCaption--n--T0 css-1hzd7ei"><figcaption class="css-pplcdj ResponsiveMedia-caption--1dUVu"></figcaption></figure></header><div class="css-18sbwfn StoryBodyCompanionColumn"><div class="css-1h6whtw"><p class="css-1i0edl6 e2kc3sl0"><em class="css-2fg4z9 ehxkw330">&#x2014; From Keith Gessen&#x2019;s second novel, &#x201C;A Terrible Country&#x201D; (Viking, 2018, Page 4). Gessen is also the author of &#x201C;All the Sad Young Literary Men&#x201D; and a founding editor of the journal n+1.</em></p><p class="css-1i0edl6 e2kc3sl0">All authors have signature sentence structures &#x2014; deep expressive grooves that their minds instinctively find and follow. (That previous sentence is one of mine: a simple declaration that leaps, after the break of a long dash, into an elaborate restatement.)</p><p class="css-1i0edl6 e2kc3sl0">Here is one of Keith Gessen&#x2019;s:</p><p class="css-1i0edl6 e2kc3sl0">&#x201C;As for me, I wasn&#x2019;t really an idiot. But neither was I not an idiot.&#x201D;</p><p class="css-1i0edl6 e2kc3sl0">&#x201C;I hadn&#x2019;t been yelling, I didn&#x2019;t think. 
But I hadn&#x2019;t not been yelling either.&#x201D;</p><p class="css-1i0edl6 e2kc3sl0">&#x201C;Cute cafes were not the problem, but they were also not, as I&#x2019;d once apparently thought, the opposite of the problem.&#x201D;</p></div><aside class="css-14jsv4e"><span></span></aside></div><div class="css-18sbwfn StoryBodyCompanionColumn"><div class="css-1h6whtw"><p class="css-1i0edl6 e2kc3sl0">Sentence structures are not simply sentence structures, of course &#x2014; they are miniature philosophies. Hemingway, with his blunt verbal bullets, is making a huge claim about the nature of the world. So is James Joyce, with his collages and frippery. So are Nikki Giovanni and Samuel Delany and Ursula K. Le Guin and John McPhee and Missy Elliott and Dr. Seuss and anyone else who converts thoughts into prose.</p><p class="css-1i0edl6 e2kc3sl0">Likewise, Keith Gessen&#x2019;s signature sentence structure &#x2014; &#x201C;not X, but also not not X&#x201D; &#x2014; suggests an entire worldview. It is a universe of in-betweenness, in which the most basic facts of life, the things we absolutely expect to understand, spill and scatter like toast crumbs into the gaps between the floorboards. It is a world of embarrassingly trivial category errors. The sentences above come from Gessen&#x2019;s new novel, &#x201C;A Terrible Country,&#x201D; the story of a 30-something American man who goes to Russia to care for his elderly grandmother. He falls into the gaps between huge concepts: youth and age, purpose and purposelessness, progress and stasis. He is not Russian but also not not Russian, not smart but also not not smart, not heroic but also not not heroic. Such is the way of the world. No matter how much we try, none of us is ever only one thing. 
None of us is ever pure.</p></div><aside class="css-14jsv4e"><span></span></aside></div><div class="bottom-of-article"><div class="css-k8fkhk"><p>Sam Anderson is a staff writer for the magazine.</p> <p><i>Sign up for </i><a href="http://www.nytimes.com/newsletters/magazine"><i>our newsletter</i></a><i> to get the best of The New York Times Magazine delivered to your inbox every week.</i></p></div><div class="css-3glrhn">A version of this article appears in print on , on Page 11 of the Sunday Magazine with the headline: From Keith Gessen&#x2019;s &#x2018;A Terrible Country&#x2019;<span>. <a href="http://www.nytreprints.com/">Order Reprints</a> | <a href="http://www.nytimes.com/pages/todayspaper/index.html">Today&#x2019;s Paper</a> | <a href="https://www.nytimes.com/subscriptions/Multiproduct/lp8HYKU.html?campaignId=48JQY">Subscribe</a></span></div></div><span></span></article></div>

Quote that should be highlighted:

"Sentence structures are not simply sentence structures, of course — they are miniature philosophies"

1 answer:

Answer 0 (score: 0)

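The regex fails to match because of HTML entities. Some of the text in the HTML passed to dangerouslySetInnerHTML is written with entity references. In the failing example above, the quote contains a "—" character, which is encoded in the HTML as &#x2014;, so the literal quote never appears in the markup string that the regex is run against.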
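The mismatch can be reproduced outside React. The sketch below is only an illustration: `decodeNumericEntities` is a hypothetical stand-in that handles numeric character references only (it is not the decoder used in this answer), and the HTML string is a shortened excerpt of the failing example. In the working example, by contrast, the highlighted sentences contain no entities at all, which is why they matched.

```javascript
// Excerpt of the failing HTML: the dash is stored as the entity &#x2014;.
const html =
  "<p>Sentence structures are not simply sentence structures, of course &#x2014; they are miniature philosophies.</p>";

// The quote as selected by the user, with a literal U+2014 character.
const quote =
  "Sentence structures are not simply sentence structures, of course \u2014 they are miniature philosophies";

// Hypothetical helper (an assumption for this sketch): decodes only numeric
// character references such as &#x2014; or &#8212; in a single pass.
// Named entities like &nbsp; are left untouched.
function decodeNumericEntities(text) {
  return text.replace(/&#(?:x([0-9a-f]+)|([0-9]+));/gi, (match, hex, dec) =>
    String.fromCodePoint(hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10))
  );
}

const matchesRaw = html.includes(quote);
const matchesDecoded = decodeNumericEntities(html).includes(quote);

console.log(matchesRaw, matchesDecoded); // false true
```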

To get rid of the HTML entities I used the "he" library, https://github.com/mathiasbynens/he, a robust HTML entity encoder/decoder written in JavaScript.

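 // Assumes the library has been loaded, e.g. var he = require('he');
 // Decode the entities first so that literal characters in the quote can match.
 var contentDecoded = he.decode(this.props.content);

 // `regex` is the same alternation of quotes built in the component above.
 var highlightedHtml = contentDecoded.replace(
    regex,
    '<span class="annotator-hl">$1</span>'
 );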
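
A related pitfall, not covered above, is that the quotes are interpolated into `new RegExp(...)` unescaped, so characters such as `.`, `+` or `(` inside a quote are read as regex syntax. The sketch below shows one possible way to guard against that. `escapeRegExp` and `highlightQuotes` are hypothetical names introduced for illustration, and the input is assumed to be already entity-decoded (for example with `he.decode`). Note also that decoding a whole HTML string turns `&lt;` and `&amp;` into literal characters, so this approach may need care when the content contains escaped markup.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap every occurrence of any quote in a highlight span.
// `decodedHtml` is assumed to be entity-decoded already.
function highlightQuotes(decodedHtml, quotes) {
  const cleaned = quotes
    // Strip leading/trailing quotation marks and parentheses, as in the question.
    .map(q => q.replace(/^[“”"’()]+|[“”"’()]+$/g, ""))
    // An empty alternative would match everywhere, so drop empty quotes.
    .filter(q => q.length > 0)
    .map(escapeRegExp);

  if (cleaned.length === 0) return decodedHtml;

  const regex = new RegExp(`(${cleaned.join("|")})`, "g");
  return decodedHtml.replace(regex, '<span class="annotator-hl">$1</span>');
}

const sample = "<p>a founding editor of the journal n+1.</p>";
const result = highlightQuotes(sample, ['"the journal n+1."']);

console.log(result);
// <p>a founding editor of <span class="annotator-hl">the journal n+1.</span></p>
```

Without the escaping step, the pattern `n+1.` would mean "one or more n, then 1, then any character" and would not match the literal text "n+1." in this sample.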