Question

I have a tag that I'm trying to extract the actual text from.

Here is an example of the tag (they are all formatted the same way):

<description>
&lt;div class=&quot;field field-name-field-body-small field-type-text-long field-label-hidden&quot;&gt;The evolution of your League of Legends match history is now live!
&lt;/div&gt;
&lt;div class=&quot;field field-name-field-article-media field-type-file field-label-hidden&quot;&gt;
&lt;div id=&quot;file-13180&quot; class=&quot;file file-image file-image-jpeg&quot;&gt;
&lt;img typeof=&quot;foaf:Image&quot; src=&quot;/sites/default/files/styles/large/public/upload/mh_640x360.jpg?itok=z_Nn84Op&quot; width=&quot;480&quot; height=&quot;270&quot; alt=&quot;&quot; title=&quot;&quot; /&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;field field-name-custom-author field-type-entityreference field-label-hidden&quot;&gt;
&lt;span class=&quot;article_detail&quot;&gt;&lt;span class=&quot;posted_by&quot;&gt;By Riot MattEnth&lt;/span&gt;
&lt;/span&gt;&lt;/div&gt;
</description>

I want the text on the first line, which in this example is (at the far right of the code snippet):

The evolution of your League of Legends match history is now live!

Is there an easy way to do this with the following code? Right now it returns the whole junk string.

XDocument xmlFile = XDocument.Load(@"http://na.leagueoflegends.com/en/rss.xml");
var LoLdescriptions = from service in xmlFile.Descendants("item")
                     select (string)service.Element("description");
ViewBag.descriptions = LoLdescriptions.ToArray();

...moving into View...

@ViewBag.descriptions[0]

If it isn't too hard, is there also a way to get the last line? In this case, By Riot MattEnth

Thanks!

XML for reference: http://na.leagueoflegends.com/en/rss.xml

Answer 1

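I don't know which language this is, but it seems to me that you first need to read the file and convert all the HTML entities. You can then pass the real XML/HTML as a string to a parser.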
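As a rough illustration of the entity-conversion step in C#, the sketch below shows two ways the escaped markup might be decoded. The class and method names (EntityDemo, DecodeViaXml, DecodeRaw) are made up for this example. Note that reading an element's value through LINQ to XML, as the question's (string)service.Element("description") does, already returns the decoded markup, so a separate decoding pass should only be needed for raw text obtained some other way.

```csharp
using System.Net;
using System.Xml.Linq;

public static class EntityDemo
{
    // Parsing the XML and reading the element's value decodes
    // &lt; &gt; &quot; and friends as part of normal XML processing.
    public static string DecodeViaXml(string rawXml)
    {
        return (string)XElement.Parse(rawXml);
    }

    // For an escaped string that did not come through an XML parser,
    // WebUtility.HtmlDecode performs the same conversion.
    public static string DecodeRaw(string escaped)
    {
        return WebUtility.HtmlDecode(escaped);
    }
}
```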

Don't use regular expressions. Try to build a tree you can query with XPath, and select the element content (i.e. the text) from it.
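A minimal sketch of this parser-based approach in C#, using LINQ to XML rather than XPath proper. It assumes the decoded description is well-formed XHTML, as the sample in the question happens to be; a fragment with unclosed tags or entities such as &nbsp; would make XElement.Parse throw, and a dedicated HTML parser would be the safer choice there. The names DescriptionText, FirstLine and Author are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DescriptionText
{
    // The fragment has several top-level <div> elements, so wrap it in a
    // dummy root to make it a single well-formed XML document.
    private static XElement ParseFragment(string decodedHtml)
    {
        return XElement.Parse("<root>" + decodedHtml + "</root>");
    }

    // Text of the first top-level <div> (the summary line).
    public static string FirstLine(string decodedHtml)
    {
        return ParseFragment(decodedHtml).Elements("div").First().Value.Trim();
    }

    // Text of the <span class="posted_by"> element (the author line).
    public static string Author(string decodedHtml)
    {
        return ParseFragment(decodedHtml)
            .Descendants("span")
            .First(s => (string)s.Attribute("class") == "posted_by")
            .Value.Trim();
    }
}
```

In the question's query this could be used as select DescriptionText.FirstLine((string)service.Element("description")), since the cast to string already yields the decoded markup.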

Answer 2

Interesting format! You can use this:

(?<=<description>.*[\r\n]*.*?&quot;&gt;).*

In the demo, you have to scroll to the right to see the match.

Explanation

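The lookbehind (?<=<description>.*[\r\n]*.*?&quot;&gt;) asserts that what precedes the current position is <description>, then any characters up to the end of the line (.*), then any newline characters ([\r\n]*), then any characters up to and including &quot;&gt; (.*?&quot;&gt;).
.* matches everything up to the end of the line.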

In C#, you can retrieve the match like this:

var myRegex = new Regex(@"(?<=<description>.*[\r\n]*.*?&quot;&gt;).*");
string resultString = myRegex.Match(yourString).Value;
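One caveat, shown as a small sketch: the pattern looks for the literal text <description> and the escaped sequence &quot;&gt;, so it appears to apply only to the raw feed text (for example, the string downloaded before any XML parsing). The value returned by (string)service.Element("description") is already decoded and no longer contains either. The wrapper below is hypothetical (RawFeedRegex and FirstLine are names invented here); it trims the result because in .NET the dot also matches a trailing \r.

```csharp
using System.Text.RegularExpressions;

public static class RawFeedRegex
{
    // Same pattern as above; .NET supports variable-length lookbehind.
    private static readonly Regex FirstLinePattern =
        new Regex(@"(?<=<description>.*[\r\n]*.*?&quot;&gt;).*");

    // Returns the first line of text in the first <description> of the
    // raw (still entity-escaped) XML, or null when nothing matches.
    public static string FirstLine(string rawXml)
    {
        Match m = FirstLinePattern.Match(rawXml);
        return m.Success ? m.Value.Trim() : null;
    }
}
```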

Regex: trying to get the text inside an XML tag
