I have a string like this:
string text =
<p><span><span id="test">Meanwhile, the Cougars are coming off of a win against Eastern
Washington University in which they scored 88 points and had three players score at least
15 points. <span>Motum</span> recorded his fourth career double-double in the game as well.
</span></span></p>
<p><span>After Dexter Kernich-Drew, Royce Woolridge, and Will DiIorio were unable to
practice last Wednesday before the game against EWU, the team is healthy and ready to play
against Utah Valley. </span></p>
<p><span><span><span>Woolridge</span>, a <span>redshirt</span> sophomore transfer who has
started at guard in the first two games this season, scored seven points and had two assists
against EWU. He also had 10 points and three assists against Saint Martin’s. </span>
</span></p>
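I need to get rid of every span that has no attributes and only wraps content, while keeping the content itself. The pattern I have so far is: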
text = Regex.Replace(text, @"</?span([^>]*|/)?>", "", RegexOptions.Compiled);
but it simply strips out all of the spans, including the one with an attribute:
<p>Meanwhile, the Cougars are coming off of a win against Eastern Washington University
in which they scored 88 points and had three players score at least 15 points. Motum
recorded his fourth career double-double in the game as well. </p>
<p>After Dexter Kernich-Drew, Royce Woolridge, and Will DiIorio were unable to practice
last Wednesday before the game against EWU, the team is healthy and ready to play
against Utah Valley. </p>
<p>Woolridge, a redshirt sophomore transfer who has started at guard in the first
two games this season, scored seven points and had two assists against EWU. He also had
10 points and three assists against Saint Martin’s. </p>
This is close, but I need the first <p> to still contain the span that has the id attribute:
<p><span id="test">Meanwhile, the Cougars are coming off of a win against Eastern
Washington University in which they scored 88 points and had three players score at
least 15 points. Motum recorded his fourth career double-double in the game as well.
</span></p>
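So the question is how to find the nested spans that have no attributes and remove only those. I did make a few other attempts that used backtracking to match the closing tag, but the pattern above is the one that came closest.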
Answer 0 (score: 0)
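using System.Linq;

// "html" is the input string from the question.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
// Collect the inner text of every <span> that has at least one attribute.
var spans = doc.DocumentNode.SelectNodes("//span[@*]")
    .Select(s => s.InnerText)
    .ToList();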
Answer 1 (score: 0)
Here is pseudocode for a simple algorithm:
create a stack of booleans
set the last position to the start of the text
search for the opening and the closing spans and for each one found:
    append the text since the last position up to the start of the found item to the output
    if the found item is an opening span:
        if the found item has attributes:
            // it's an opening span with attributes
            // we want to keep it
            push true onto the stack
            append the item to the output
        else:
            // it's an opening span without attributes
            // we want to drop it
            push false onto the stack
    else:
        pop the top boolean from the stack
        if the popped boolean is true:
            // the corresponding opening span had attributes
            // we want to keep this closing span
            append the found item to the output
    set the last position to the end of the found item
append the remaining text since the last position to the output
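For what it's worth, the pseudocode above could be sketched in C# roughly as follows. The names SpanCleaner and StripBareSpans are made up for illustration. The sketch assumes the spans are properly nested, that no attribute value contains a ">" character, and that there are no self-closing spans. A closing span with no matching opening span is left untouched, which is a choice of this sketch rather than something the pseudocode specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are illustrative only.
public static class SpanCleaner
{
    // Matches an opening <span ...> or a closing </span>.
    // "close" captures the slash of a closing tag; "attrs" captures
    // whatever follows the tag name (it must start with whitespace).
    private static readonly Regex SpanTag = new Regex(
        @"<(?<close>/)?span(?<attrs>\s[^>]*)?>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string StripBareSpans(string text)
    {
        var keep = new Stack<bool>();      // one flag per currently open span
        var output = new StringBuilder();
        int last = 0;                      // end of the previously handled tag

        foreach (Match m in SpanTag.Matches(text))
        {
            // Copy the text between the previous tag and this one.
            output.Append(text, last, m.Index - last);

            if (!m.Groups["close"].Success)
            {
                // Opening span: keep it only if something non-blank follows the name.
                bool hasAttributes = m.Groups["attrs"].Value.Trim().Length > 0;
                keep.Push(hasAttributes);
                if (hasAttributes)
                {
                    output.Append(m.Value);
                }
            }
            else if (keep.Count == 0 || keep.Pop())
            {
                // Closing span: keep it if its opening span was kept.
                // Assumption: an unmatched closing span is copied through as-is.
                output.Append(m.Value);
            }

            last = m.Index + m.Length;
        }

        // Copy whatever follows the last tag.
        output.Append(text, last, text.Length - last);
        return output.ToString();
    }
}
```

Called as text = SpanCleaner.StripBareSpans(text); on the string from the question, this should leave the <span id="test"> element and its closing tag in place while dropping the attribute-less spans around and inside it. A regex is only used here to locate the tags; the pairing of opening and closing tags is done by the stack, which is the part a single Regex.Replace cannot do.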