Question

The following is a subset of the XML from a Twitter Atom feed:

<entry>
    <id>tag:search.twitter.com,2005:18232030105964545</id>
    <published>2010-12-24T09:10:29Z</published>
    <link type="text/html" rel="alternate" href="http://twitter.com/KTNKenya/statuses/18232030105964545"/>
    <title>Synovate Poll: PM Raila Odinga remains the preffered presidential candidate at 42% while Uhuru Kenyatta is at 14%... http://fb.me/yjmMbmBx</title>
    <content type="html">Synovate Poll: PM &lt;b&gt;Raila&lt;/b&gt; Odinga remains the preffered presidential candidate at 42% while Uhuru Kenyatta is at 14%... &lt;a href=&quot;http://fb.me/yjmMbmBx&quot;&gt;http://fb.me/yjmMbmBx&lt;/a&gt;</content>
    <updated>2010-12-24T09:10:29Z</updated>
    <link type="image/png" rel="image" href="http://a3.twimg.com/profile_images/701825859/NEW_KTN_normal.png"/>
    <google:location>nairobi, kenya</google:location>
    <twitter:geo>
    </twitter:geo>
    <twitter:metadata>
        <twitter:result_type>recent</twitter:result_type>
    </twitter:metadata>
    <twitter:source>&lt;a href=&quot;http://www.facebook.com/twitter&quot; rel=&quot;nofollow&quot;&gt;Facebook&lt;/a&gt;</twitter:source>
    <twitter:lang>en</twitter:lang>
    <author>
        <name>KTNKenya (KTN Kenya)</name>
        <uri>http://twitter.com/KTNKenya</uri>
    </author>
</entry>

From the <title>...</title> element, I need to select the hyperlink http://fb.me/yjmMbmBx with an XPath query. How can I do that? Is it even possible? *I'm new to XPath.

Thanks.

Answer 1

You have two options:

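1. Select the text of <title> (XPath: "/entry/title/text()") and extract the URL yourself, for example with a regular expression or by finding the last occurrence of "http://" in the string.
2. Get the content data first: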
```
/entry/content[@type="html"]/text()
```
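Then parse it as HTML, extract any <a> tags, and use the href attribute of those tags. How you do that last part depends on the language/environment you are working in.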
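
As a rough illustration of that last step, here is a minimal sketch in standard C++. It assumes the XML parser has already decoded the entities, so the string holds real HTML markup, and it uses a regular expression instead of a proper HTML parser, which is only reasonable for small, predictable fragments like this one. The helper name `extract_hrefs` is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of any library): collects the href values of
// <a> tags found in an HTML fragment. A regex is only a rough substitute for
// a real HTML parser and assumes double-quoted href attributes.
std::vector<std::string> extract_hrefs(const std::string& html) {
    static const std::regex anchor(R"rx(<a\b[^>]*\bhref\s*=\s*"([^"]*)")rx",
                                   std::regex::icase);
    std::vector<std::string> hrefs;
    for (std::sregex_iterator it(html.begin(), html.end(), anchor), end;
         it != end; ++it) {
        hrefs.push_back((*it)[1].str());  // capture group 1 is the URL
    }
    return hrefs;
}
```

Called on the decoded text of the <content> element from the question, this should return a single entry, http://fb.me/yjmMbmBx.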

Update: as requested, here is some basic sample code for option 1 above:

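```cpp
#include <libxml++/libxml++.h>
#include <string>

// `parser` is assumed to be an xmlpp::DomParser that has already parsed the entry.
xmlpp::Element* node = parser.get_document()->get_root_node();
xmlpp::NodeSet results = node->find("/entry/title/text()");
std::string link;
if (!results.empty()) {
    auto* content = dynamic_cast<xmlpp::ContentNode*>(results.front());
    if (content) {
        std::string text = content->get_content();
        // Use size_type rather than int so the comparison with npos is reliable.
        std::string::size_type res = text.rfind("http://");
        if (res == std::string::npos)
            res = text.rfind("https://");
        if (res != std::string::npos)
            link = text.substr(res);  // take everything from the last URL onward
    }
}
```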
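
Option 1 also mentions a regular expression. A minimal sketch of that variant, using only the C++ standard library, might look like the following. The helper name `extract_last_url` is hypothetical, and treating a URL as "everything up to the next whitespace" is a simplifying assumption that happens to fit the tweet text in the question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the last http:// or https:// URL in `text`,
// or an empty string if there is none. A URL is taken to run up to the next
// whitespace character, which is a simplifying assumption.
std::string extract_last_url(const std::string& text) {
    static const std::regex url(R"(https?://\S+)");
    std::string last;
    for (std::sregex_iterator it(text.begin(), text.end(), url), end;
         it != end; ++it) {
        last = it->str();  // keep overwriting so the final match wins
    }
    return last;
}
```

Unlike the rfind version, this stops at the first whitespace after the URL, so it still works if text follows the link.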

Answer 2

With the atom prefix bound to the http://www.w3.org/2005/Atom namespace URI, use:

```
/atom:feed/atom:entry/atom:title[contains(.,'http://')]
```

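This selects every atom:title child of an atom:entry element (itself a child of the top-level atom:feed) whose string value contains the string "http://". Note that it returns the matching title elements, not the URL alone; the URL still has to be extracted from the string value.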
