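Using PHP

Time: 2015-10-11 20:26:12

Tags: php xml rdf

I am trying to get the value of the 'rdf:resource' attribute from the 'rdf:li' elements in this XML: http://www.ecb.europa.eu/rss/fxref-usd.html

What is the correct way to achieve this? How can these RDF elements be parsed properly?

Here is what I have so far: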

<!DOCTYPE html>
<html>

    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <title>RDF</title>
    </head>

    <body>

    <ul> 
 <?php      

            $rdf = file_get_contents('http://www.ecb.europa.eu/rss/fxref-usd.html');


            $rdf = str_replace('rdf:', 'rdf_', $rdf);


            $xml = simplexml_load_string($rdf);


            foreach ($xml->channel->items->rdf_Seq->rdf_li as $item) {
                $attributes = $item->attributes();              

                if(isset($attributes['rdf_resource'])) {
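                    // Escape the URL for HTML output, quote the href value and close the <li> properly
                    $url = htmlspecialchars((string) $attributes['rdf_resource'], ENT_QUOTES);
                    echo '<li><a href="'.$url.'" target="_blank">'.$url.'</a></li>';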
                }
            }
?>
    </ul>

    </body>

</html>

As you can see, this is something of a hack, and I don't think it is the right way to do it.

Any help is appreciated!

1 Answer:

Answer 0: (score: 1)

  

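I am trying to get the value of the 'rdf:resource' attribute from the 'rdf:li' elements in this XML: http://www.ecb.europa.eu/rss/fxref-usd.html

First, this isn't actually legal RDF, at least according to Jena's parser. After removing the xsd schema location, which apparently isn't allowed on the rdf:RDF element, I still get the error: "Expecting XML start or end element(s). String data "U2" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error."

But even if it were legal RDF/XML, there are two problems with your approach that will end up making it fragile. First, it is very hard to process RDF/XML reliably with XML tools, as I explained in my answer to "How to access OWL documents using XPath in Java?". In general, the same RDF graph can be serialized as many different RDF/XML documents. This matters especially for rdf:li: even though there are rdf:li elements in the XML document, the RDF graph doesn't actually contain any resources with rdf:li properties. Have a look at: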

  

2.15 Container Membership Property Elements: rdf:li and rdf:_n

     

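RDF has a set of container membership properties and corresponding property elements that are mostly used with instances of the rdf:Seq, rdf:Bag and rdf:Alt classes which may be written as typed node elements. The list properties are rdf:_1, rdf:_2 etc. and can be written as property elements or property attributes as shown in Example 17. There is an rdf:li special property element that is equivalent to rdf:_1, rdf:_2 in order, explained in detail in section 7.4. The mapping to the container membership properties is always done in the order that the rdf:li special property elements appear in XML - the document order is significant. The equivalent RDF/XML to Example 17 written in this form is shown in Example 18.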

This means that an RDF/XML fragment (not quite legal, but it gives the general impression) like:

<ex:Collection>
  <rdf:li rdf:about="member1"/>
  <rdf:li rdf:about="member2"/>
</ex:Collection>

could also be written as:

<ex:Collection>
  <rdf:_2 rdf:about="member2"/>
  <rdf:_1 rdf:about="member1"/>
</ex:Collection>

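This means that any purely XML-based approach here is likely to be fragile, because it depends on structure that isn't guaranteed to always be represented in the same way.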
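To make the rdf:li to rdf:_n mapping concrete, here is a minimal PHP sketch using SimpleXML's namespace support. It assumes the document is well-formed XML, that the sequence sits at channel/items/rdf:Seq as in this feed, and that only members given by rdf:resource matter. The names seqMembers and wrapSeq are hypothetical helpers written for this illustration; they are not part of SimpleXML or of any RDF library, and this is not a substitute for a real RDF parser.

```php
<?php
// Namespace URIs as declared in the feed discussed in this thread.
const RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
const RSS_NS = 'http://purl.org/rss/1.0/';

/**
 * Hypothetical helper: returns [n => resource IRI] for the first rdf:Seq
 * under channel/items, where n is the index of the rdf:_n property that
 * each member element stands for.
 */
function seqMembers(string $rdfXml): array
{
    $doc = simplexml_load_string($rdfXml);
    if ($doc === false) {
        throw new RuntimeException('Document is not well-formed XML');
    }
    $doc->registerXPathNamespace('rdf', RDF_NS);
    $doc->registerXPathNamespace('rss', RSS_NS);
    $seqs = $doc->xpath('/rdf:RDF/rss:channel/rss:items/rdf:Seq');
    if (empty($seqs)) {
        return [];
    }

    $members = [];
    $liCounter = 1; // rdf:li elements are numbered in document order
    foreach ($seqs[0]->children(RDF_NS) as $child) {
        $name = $child->getName(); // local name, e.g. "li" or "_2"
        if ($name === 'li') {
            $n = $liCounter++;
        } elseif (preg_match('/^_([1-9][0-9]*)$/', $name, $m) === 1) {
            $n = (int) $m[1];
        } else {
            continue; // not a container membership element
        }
        $resource = (string) $child->attributes(RDF_NS)->resource;
        if ($resource !== '') {
            $members[$n] = $resource;
        }
    }
    ksort($members); // order by n, not by position in the XML
    return $members;
}

// Hypothetical helper: wraps a sequence body in the feed's channel/items structure.
function wrapSeq(string $seqBody): string
{
    return '<rdf:RDF xmlns="' . RSS_NS . '" xmlns:rdf="' . RDF_NS . '">'
        . '<channel rdf:about="http://www.ecb.europa.eu/rss/usd.html">'
        . '<items><rdf:Seq>' . $seqBody . '</rdf:Seq></items>'
        . '</channel></rdf:RDF>';
}

$base = 'http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html';
$first = $base . '?date=2015-10-09&amp;rate=1.1362';
$second = $base . '?date=2015-10-08&amp;rate=1.1254';

// The same two members, once as rdf:li and once as explicit rdf:_n in reverse order.
$liForm = seqMembers(wrapSeq(
    '<rdf:li rdf:resource="' . $first . '"/><rdf:li rdf:resource="' . $second . '"/>'
));
$numberedForm = seqMembers(wrapSeq(
    '<rdf:_2 rdf:resource="' . $second . '"/><rdf:_1 rdf:resource="' . $first . '"/>'
));

var_dump($liForm === $numberedForm);
```

Both serializations should yield the same index-to-resource map, which is the point: code that only looks for rdf:li elements would miss the second form entirely. Even this sketch covers just two of the many ways RDF/XML can express the same graph.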

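The usual answer is to use an RDF query language, so that you can query at the RDF level. The standard RDF query language is SPARQL. Unfortunately, because there are infinitely many container membership properties (rdf:_1, rdf:_2, ...), this is hard to do efficiently in SPARQL: you end up having to match URIs that look like rdf:_xxx and then work out what comes after the underscore.

OK, so if you can get the RDF/XML into a legal form, you would probably end up with something like: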

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:cb="http://www.cbwiki.net/wiki/index.php/Specification_1.1" xmlns:dc = "http://purl.org/dc/elements/1.1/" xmlns:dcterms = "http://purl.org/dc/terms/" xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance">
<channel  rdf:about = "http://www.ecb.europa.eu/rss/usd.html">
<title>ECB | US dollar (USD) - Euro foreign exchange reference rates</title>  
<link>http://www.ecb.europa.eu/home/html/rss.en.html</link>
<description>The reference rates are based on the regular daily concertation procedure between central banks within and outside the European System of Central Banks, which normally takes place at 2.15 p.m. (14:15) ECB time.</description>
<items>
<rdf:Seq>
<rdf:li rdf:resource="http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-09&amp;rate=1.1362" />
<rdf:li rdf:resource="http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-08&amp;rate=1.1254" />
<rdf:li rdf:resource="http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-07&amp;rate=1.1266" />
<rdf:li rdf:resource="http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-06&amp;rate=1.1224" />
<rdf:li rdf:resource="http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-05&amp;rate=1.1236" />
</rdf:Seq>
</items>
</channel>
</rdf:RDF>

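Now, remember that those rdf:li XML elements don't mean that there are rdf:li properties in the graph; instead, there are a bunch of rdf:_n properties. In the Turtle serialization (which is similar to SPARQL syntax), the data is: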

@prefix :      <http://purl.org/rss/1.0/> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix cb:    <http://www.cbwiki.net/wiki/index.php/Specification_1.1> .
@prefix dc:    <http://purl.org/dc/elements/1.1/> .
@prefix xsi:   <http://www.w3.org/2001/XMLSchema-instance> .

<http://www.ecb.europa.eu/rss/usd.html>
        a             :channel ;
        :description  "The reference rates are based on the regular daily concertation procedure between central banks within and outside the European System of Central Banks, which normally takes place at 2.15 p.m. (14:15) ECB time." ;
        :items        [ a       rdf:Seq ;
                        rdf:_1  <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-09&rate=1.1362> ;
                        rdf:_2  <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-08&rate=1.1254> ;
                        rdf:_3  <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-07&rate=1.1266> ;
                        rdf:_4  <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-06&rate=1.1224> ;
                        rdf:_5  <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-05&rate=1.1236>
                      ] ;
        :link         "http://www.ecb.europa.eu/home/html/rss.en.html" ;
        :title        "ECB | US dollar (USD) - Euro foreign exchange reference rates" .

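What I would do is look up the channel's :items property, check that its value is an rdf:Seq, and then either get all of its properties except rdf:type and assume that they are rdf:_n values, or actually select the rdf:_xxx property values. The first approach looks like: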

prefix :      <http://purl.org/rss/1.0/>
prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select ?item {
  <http://www.ecb.europa.eu/rss/usd.html> :items ?x .
  ?x a rdf:Seq .
  ?x ?p ?item .
  filter (?p != rdf:type)
}

Results:
--------------------------------------------------------------------------------------------------------------------
| item                                                                                                             |
====================================================================================================================
| <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-05&rate=1.1236> |
| <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-06&rate=1.1224> |
| <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-07&rate=1.1266> |
| <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-08&rate=1.1254> |
| <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-09&rate=1.1362> |
--------------------------------------------------------------------------------------------------------------------

Or, with the latter approach (actually checking for rdf:_):

prefix :      <http://purl.org/rss/1.0/>
prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd:   <http://www.w3.org/2001/XMLSchema#>

select ?n ?item {
  <http://www.ecb.europa.eu/rss/usd.html> :items ?x .
  ?x a rdf:Seq .
  ?x ?p ?item .

  # check that ?p starts with rdf:_
  filter strstarts(str(?p),str(rdf:_))

  # and extract the part after rdf:_ and convert
  # it to an integer
  bind (xsd:integer(strafter(str(?p),str(rdf:_))) as ?n)
}

Results:
------------------------------------------------------------------------------------------------------------------------
| n | item                                                                                                             |
========================================================================================================================
| 5 | <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-05&rate=1.1236> |
| 4 | <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-06&rate=1.1224> |
| 3 | <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-07&rate=1.1266> |
| 2 | <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-08&rate=1.1254> |
| 1 | <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-09&rate=1.1362> |
------------------------------------------------------------------------------------------------------------------------
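For readers more at home in PHP than in SPARQL, the filter and bind steps of that second query can be sketched in plain PHP. This is only an illustration of the string logic, assuming the triples of the container node have already been obtained from some RDF parser as [predicate IRI, object IRI] pairs; membershipIndex and orderedMembers are hypothetical names, not functions from any library. Unlike the query, the sketch also rejects suffixes that are not positive integers.

```php
<?php
const RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

/**
 * Hypothetical helper: returns n when $predicate is rdf:_n, otherwise null.
 * Mirrors the strstarts / strafter / xsd:integer steps of the query.
 */
function membershipIndex(string $predicate): ?int
{
    $prefix = RDF . '_';
    if (strncmp($predicate, $prefix, strlen($prefix)) !== 0) {
        return null; // would fail the strstarts() filter
    }
    $suffix = substr($predicate, strlen($prefix)); // plays the role of strafter()
    // Stricter than the query: accept only positive integers without leading zeros.
    return preg_match('/^[1-9][0-9]*$/', $suffix) === 1 ? (int) $suffix : null;
}

/**
 * Hypothetical helper: takes [predicate IRI, object IRI] pairs for one
 * container node and returns its members keyed by n, in ascending order.
 */
function orderedMembers(array $pairs): array
{
    $members = [];
    foreach ($pairs as [$predicate, $object]) {
        $n = membershipIndex($predicate);
        if ($n !== null) {
            $members[$n] = $object;
        }
    }
    ksort($members);
    return $members;
}

$base = 'http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html';
$pairs = [
    [RDF . 'type', RDF . 'Seq'], // skipped: not a membership property
    [RDF . '_2', $base . '?date=2015-10-08&rate=1.1254'],
    [RDF . '_1', $base . '?date=2015-10-09&rate=1.1362'],
];

foreach (orderedMembers($pairs) as $n => $item) {
    echo $n, ' ', $item, PHP_EOL;
}
```

Sorting by the extracted integer rather than by the IRI string avoids rdf:_10 being ordered before rdf:_2.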

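Now you just need a SPARQL library for PHP. I'm not really a PHP user, so I can't recommend one, but I know there are other Stack Overflow questions about PHP and SPARQL, and there are some libraries out there.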