The ':' character, hexadecimal value 0x3A, cannot be included in a name

Time: 2012-04-25 11:47:17

Tags: .net-4.0 xml-parsing linq-to-xml

I have already seen this question asked, but I did not see an answer.

So I am getting this error:

The ':' character, hexadecimal value 0x3A, cannot be included in a name.

on this code:

    XDocument XMLFeed = XDocument.Load("http://feeds.foxnews.com/foxnews/most-popular?format=xml");
    XNamespace content = "http://purl.org/rss/1.0/modules/content/";

    var feeds = from feed in XMLFeed.Descendants("item")
        select new
        {
            Title = feed.Element("title").Value,
            Link = feed.Element("link").Value,
            pubDate = feed.Element("pubDate").Value,
            Description = feed.Element("description").Value,
            MediaContent = feed.Element(content + "encoded")
        };

    foreach (var f in feeds.Reverse())
    {
        ....
    }

The items look like this:

<rss>    
<channel>

....items....

<item>
<title>Pentagon confirms plan to create new spy agency</title>
<link>http://feeds.foxnews.com/~r/foxnews/most-popular/~3/lVUZwCdjVsc/</link>
<category>politics</category>
<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/" />
<pubDate>Tue, 24 Apr 2012 12:44:51 PDT</pubDate>
<guid isPermaLink="false">http://www.foxnews.com/politics/2012/04/24/pentagon-confirms-plan-to-create-new-spy-agency/</guid>
<content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[|http://global.fncstatic.com/static/managed/img/Politics/panetta_hearing_030712.jpg<img src="http://feeds.feedburner.com/~r/foxnews/most-popular/~4/lVUZwCdjVsc" height="1" width="1"/>]]></content:encoded>
<description>The Pentagon confirmed Tuesday that it is carving out a brand new spy agency expected to include several hundred officers focused on intelligence gathering around the world.&amp;#160;</description>
<dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2012-04-4T19:44:51Z</dc:date>
<feedburner:origLink>http://www.foxnews.com/politics/2012/04/24/pentagon-confirms-plan-to-create-new-spy-agency/</feedburner:origLink>
</item>

....items....

</channel>
</rss>    

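All I want is to get "http://global.fncstatic.com/static/managed/img/Politics/panetta_hearing_030712.jpg", and to check whether content:encoded exists.

Thanks.

Edit: I found an example that I can show, and I edited the code that tries to handle it.

EDIT2: I have done it the ugly way: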

text.Replace("content:encoded", "contentt").Replace("xmlns:content=\"http://purl.org/rss/1.0/modules/content/\"","");

and then got the element the normal way:

MediaContent = feed.Element("contentt").Value

2 answers:

Answer 0 (score: 0)

You should use XNamespace:

XNamespace content = "...";

// later in your code ...
MediaContent = feed.Element(content + "encoded")

See more details here.

(Of course, the string you assign to content must be the same as the one in xmlns:content="...".)
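
As a minimal sketch of what this looks like in practice (the class and method names are made up for illustration, and the sample item is a trimmed-down stand-in for the feed in the question), the namespace URI plus the local name finds the element, a null check tells you whether it exists, and building a name from the prefixed string "content:encoded" reproduces the error from the question:

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper names; only XNamespace, XName and XElement come from LINQ to XML.
public static class EncodedLookupSketch
{
    // Must match the URI used in xmlns:content="..." in the feed.
    private static readonly XNamespace Content = "http://purl.org/rss/1.0/modules/content/";

    // True when the item has a content:encoded child element.
    public static bool HasEncoded(XElement item)
    {
        return item.Element(Content + "encoded") != null;
    }

    // Text of content:encoded, or null when the element is missing.
    public static string GetEncoded(XElement item)
    {
        return (string)item.Element(Content + "encoded");
    }

    // Building an XName from a prefixed string throws XmlException,
    // because ':' is not allowed in a local name.
    public static bool PrefixedNameThrows()
    {
        try
        {
            XName.Get("content:encoded");
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // Made-up sample item; the URL is a placeholder.
        XElement item = XElement.Parse(
            "<item><title>t</title>" +
            "<content:encoded xmlns:content='http://purl.org/rss/1.0/modules/content/'>" +
            "<![CDATA[|http://example.invalid/a.jpg<img src='x'/>]]></content:encoded></item>");

        Console.WriteLine(HasEncoded(item));
        Console.WriteLine(GetEncoded(item));
        Console.WriteLine(PrefixedNameThrows());
    }
}
```

The prefix itself never appears in the lookup; only the namespace URI and the local name matter.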

Answer 1 (score: 0)

The following code

    static void Main(string[] args)
    {

            var XMLFeed = XDocument.Parse(
@"<rss>    
<channel>

....items....

<item>
<title>Pentagon confirms plan to create new spy agency</title>
<link>http://feeds.foxnews.com/~r/foxnews/most-popular/~3/lVUZwCdjVsc/</link>
<category>politics</category>
<dc:creator xmlns:dc='http://purl.org/dc/elements/1.1/' />
<pubDate>Tue, 24 Apr 2012 12:44:51 PDT</pubDate>
<guid isPermaLink='false'>http://www.foxnews.com/politics/2012/04/24/pentagon-confirms-plan-to-create-new-spy-agency/</guid>
<content:encoded xmlns:content='http://purl.org/rss/1.0/modules/content/'><![CDATA[|http://global.fncstatic.com/static/managed/img/Politics/panetta_hearing_030712.jpg<img src='http://feeds.feedburner.com/~r/foxnews/most-popular/~4/lVUZwCdjVsc' height='1' width='1'/>]]></content:encoded>
<description>The Pentagon confirmed Tuesday that it is carving out a brand new spy agency expected to include several hundred officers focused on intelligence gathering around the world.&amp;#160;</description>
<dc:date xmlns:dc='http://purl.org/dc/elements/1.1/'>2012-04-4T19:44:51Z</dc:date>
<!-- <feedburner:origLink>http://www.foxnews.com/politics/2012/04/24/pentagon-confirms-plan-to-create-new-spy-agency/</feedburner:origLink> -->
</item>

....items....

</channel>
</rss>");
            XNamespace contentNs = "http://purl.org/rss/1.0/modules/content/";
            var feeds = from feed in XMLFeed.Descendants("item")
                        select new
                                   {
                                       Title = (string)feed.Element("title"),
                                       Link = (string)feed.Element("link"),
                                       pubDate = (string)feed.Element("pubDate"),
                                       Description = (string)feed.Element("description"),
                                       MediaContent = GetMediaContent((string)feed.Element(contentNs + "encoded"))
                                   };
            foreach(var item in feeds)
            {
                Console.WriteLine(item);
            }
        }

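        private static string GetMediaContent(string content)
        {
            // The (string) cast yields null when content:encoded is missing.
            if (string.IsNullOrEmpty(content))
            {
                return string.Empty;
            }

            int imgStartPos = content.IndexOf("<img");
            if (imgStartPos > 0)
            {
                // Skip the leading '|' separator, if present.
                int startPos = content[0] == '|' ? 1 : 0;

                return content.Substring(startPos, imgStartPos - startPos);
            }

            return string.Empty;
        }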

results in:

{ Title = Pentagon confirms plan to create new spy agency, Link = http://feeds.f
oxnews.com/~r/foxnews/most-popular/~3/lVUZwCdjVsc/, pubDate = Tue, 24 Apr 2012 1
2:44:51 PDT, Description = The Pentagon confirmed Tuesday that it is carving out
 a brand new spy agency expected to include several hundred officers focused on
intelligence gathering around the world.&#160;, MediaContent = http://global
.fncstatic.com/static/managed/img/Politics/panetta_hearing_030712.jpg }
Press any key to continue . . .

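A few points:

  • You never want to treat XML as text. In your case you removed the namespace declaration, but your code would stop working if the namespace were declared as a default namespace (that is, not bound to a prefix) or if a different prefix were used, even though all of these documents are semantically equivalent.
  • Unless you know what is inside a CDATA section and how to treat it, you always want to treat it as text. If you know it is something else, you can treat it differently after parsing; see my elaboration on CDATA below for more details.
  • To avoid NullReferenceExceptions when an element is missing, I used the explicit conversion operator (string) instead of calling .Value.
  • The XML you posted is not valid XML: the namespace URI for the feedburner prefix is missing (which is why that element is commented out in my sample).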
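
To make these points concrete, here is a small sketch (the sample documents and the names PointsSketch, Prefixed, OtherPrefix, DefaultNs and Missing are invented for illustration). The same namespace-aware lookup works whether the namespace is bound to the content prefix, to another prefix, or declared as a default namespace, whereas a text Replace on "content:encoded" would only match the first form. It also shows that the (string) cast returns null for a missing element where .Value throws, and that CDATA and escaped text produce the same string:

```csharp
using System;
using System.Xml.Linq;

// Hypothetical names and sample documents, for illustration only.
public static class PointsSketch
{
    private static readonly XNamespace Content = "http://purl.org/rss/1.0/modules/content/";

    // Three semantically equivalent ways to put <encoded> in the content namespace.
    public const string Prefixed =
        "<item><content:encoded xmlns:content='http://purl.org/rss/1.0/modules/content/'>a</content:encoded></item>";
    public const string OtherPrefix =
        "<item><c:encoded xmlns:c='http://purl.org/rss/1.0/modules/content/'>a</c:encoded></item>";
    public const string DefaultNs =
        "<item><encoded xmlns='http://purl.org/rss/1.0/modules/content/'>a</encoded></item>";

    // An item without any encoded element.
    public const string Missing = "<item><title>t</title></item>";

    // Returns the text of the encoded element, or null when it is absent.
    public static string ReadEncoded(string xml)
    {
        return (string)XElement.Parse(xml).Element(Content + "encoded");
    }

    // Calling .Value on a missing element dereferences null.
    public static bool ValueThrowsWhenMissing()
    {
        try
        {
            string unused = XElement.Parse(Missing).Element(Content + "encoded").Value;
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    // CDATA content and entity-escaped content yield the same text after parsing.
    public static bool CDataEqualsEscaped()
    {
        string fromCData = (string)XElement.Parse("<e><![CDATA[<b>&</b>]]></e>");
        string fromEscaped = (string)XElement.Parse("<e>&lt;b&gt;&amp;&lt;/b&gt;</e>");
        return fromCData == fromEscaped;
    }

    public static void Main()
    {
        Console.WriteLine(ReadEncoded(Prefixed));
        Console.WriteLine(ReadEncoded(OtherPrefix));
        Console.WriteLine(ReadEncoded(DefaultNs));
        Console.WriteLine(ReadEncoded(Missing) == null);
        Console.WriteLine(ValueThrowsWhenMissing());
        Console.WriteLine(CDataEqualsEscaped());
    }
}
```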

This is no longer related to the question, but it might be helpful to some folks, so I am leaving it.

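Consider the contents of the encoded element: it is inside a CDATA section. What is inside a CDATA section is not XML but plain text. CDATA is typically used so that the '<', '>' and '&' characters do not have to be encoded (without CDATA they would have to be written as &lt;, &gt; and &amp; so as not to break the XML document itself); the XML processor treats the characters inside CDATA as if they had been encoded. CDATA is convenient if you want to embed HTML, because textually the embedded content looks like the original, yet it will not break your XML if the HTML is not well-formed XML.

Since CDATA content is text rather than XML, it cannot be treated as XML directly. You will probably need to treat it as text and use, for instance, regular expressions. If you know it is valid XML, you can load the content into an XElement again and process it. In your case you have mixed content (text followed by an element), so this is not easy to do unless you use a small dirty hack. Everything would be simple if you had just one top-level element instead of mixed content, so the hack is to add a wrapping element and avoid all the hassle. Inside the foreach you could do something like this: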

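// Assumes the raw text of content:encoded is available here, not the URL already extracted by GetMediaContent.
var mediaContentXml = XElement.Parse("<content>" + (string)item.MediaContent + "</content>");
Console.WriteLine((string)mediaContentXml.Element("img").Attribute("src"));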

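Again, it is not pretty and it is a hack, but it will work if the content of the encoded element is valid XML. The more correct way would be to use an XmlReader with ConformanceLevel set to Fragment and to recognize each kind of node appropriately in order to create the corresponding LINQ to XML nodes.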
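
A rough sketch of that reader-based approach follows. The names FragmentSketch, ReadNodes and FirstImgSrc are hypothetical. Instead of switching on node types by hand, it leans on XNode.ReadFrom to build the LINQ to XML node for whatever the reader is positioned on; that is a shortcut chosen for brevity, not necessarily the approach intended above:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper; a sketch under the assumption that the CDATA text is a well-formed XML fragment.
public static class FragmentSketch
{
    // Parses mixed top-level content (text and elements) without a wrapping root element.
    public static List<XNode> ReadNodes(string fragment)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        var nodes = new List<XNode>();

        using (var stringReader = new StringReader(fragment))
        using (XmlReader reader = XmlReader.Create(stringReader, settings))
        {
            // Position the reader on the first node; XNode.ReadFrom requires an interactive reader.
            reader.Read();
            while (!reader.EOF)
            {
                // ReadFrom consumes the current node and advances the reader past it.
                nodes.Add(XNode.ReadFrom(reader));
            }
        }

        return nodes;
    }

    // Returns the src attribute of the first top-level <img> element, or null if there is none.
    public static string FirstImgSrc(string fragment)
    {
        foreach (XNode node in ReadNodes(fragment))
        {
            var element = node as XElement;
            if (element != null && element.Name == "img")
            {
                return (string)element.Attribute("src");
            }
        }

        return null;
    }

    public static void Main()
    {
        // Placeholder URLs standing in for the text inside content:encoded.
        string encoded = "|http://example.invalid/a.jpg<img src='http://example.invalid/pixel' height='1' width='1'/>";
        Console.WriteLine(FirstImgSrc(encoded));
    }
}
```

Like the wrapper hack, this still throws an XmlException if the text is not well-formed, so it only applies when you know the CDATA content is valid XML.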