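I am writing a program that pulls economic and social statistics from several external sources and stores them in a database for analysis. Some of the data comes as XML, and to parse it I need to identify the elements/tags in the XML file as well as their attributes. To read the attributes I tried getAttribute.
Problem: getElementsByTagName works, but getAttribute does not. I am trying to retrieve the value of the 'Index' attribute of a Cell element. It returns "" even though the 'Index' attribute does exist on many Cell elements. There is no error; it just returns an empty value.
I spent many days reading the PHP manual and searching the internet for a solution, without success. Echoing or var_dump-ing the return value of getAttribute shows that it is always "". Rather than post the full source, I have reduced it to a simpler version that reads the XML file below and reproduces the problem of the attribute (here 'Index') not being returned.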
<?php
// Creates new DOMDocument
$dom = new DOMDocument();
// Loads XML file into DOMDocument
$dom->load('FRED_formatted_list.xml');
// Stores all the instances of the Row tag into $rows
$rows = $dom->getElementsByTagName('Row');
// Iterates through all the instances of the Row tag
foreach ($rows as $row) {
    // Stores all the instances of the Cell tag into $cells
    $cells = $row->getElementsByTagName('Cell');
    // Iterates through all the instances of the Cell tag
    foreach ($cells as $cell) {
        // Checks if the Index attribute exists in the cell tag
        if ($cell->hasAttribute('Index')) {
            // Stores the value of any instances of the Index attribute
            $attr = $cell->getAttribute('Index');
            // Prints the value of any instances of the Index attribute to screen
            echo "Value of index attribute: " . $attr . "<br>";
        }
        // Check that the cell tags have been properly identified in the DOM Object
        echo $cell->nodeValue . "<br>";
        // Double checks whether any index values are even found and stored in $attr
        var_dump($attr) . "<br>";
    }
}
?>
Below is a sample of the XML file, showing that the 'Index' attribute does exist even though getAttribute does not return it:
<Row>
<Cell><Data ss:Type="String">AAA</Data></Cell>
<Cell ss:Index="3"><Data ss:Type="String">Board of Governors of the Federal Reserve System (US)</Data></Cell>
<Cell><Data ss:Type="String">H.15 Selected Interest Rates</Data></Cell>
<Cell><Data ss:Type="String">Percent</Data></Cell>
<Cell><Data ss:Type="String">Not Seasonally Adjusted</Data></Cell>
<Cell><Data ss:Type="String">The Federal Reserve Board has discontinued this series as of October 11, 2016. More information, including possible alternative series, can be found at http://www.federalreserve.gov/feeds/h15.html. </Data></Cell>
</Row>
Any help would be greatly appreciated. I will summarize the solution and repost it to help others.
Answer 0 (score: 0)
Define the namespace in the XML:
<Row xmlns:ss="something">
<Cell><Data ss:Type="String">AAA</Data></Cell>
<Cell ss:Index="3"><Data ss:Type="String">Board of Governors of the Federal Reserve System (US)</Data></Cell>
<Cell><Data ss:Type="String">H.15 Selected Interest Rates</Data></Cell>
<Cell><Data ss:Type="String">Percent</Data></Cell>
<Cell><Data ss:Type="String">Not Seasonally Adjusted</Data></Cell>
<Cell><Data ss:Type="String">The Federal Reserve Board has discontinued this series as of October 11, 2016. More information, including possible alternative series, can be found at http://www.federalreserve.gov/feeds/h15.html. </Data></Cell>
</Row>
Try the following code to get the value of the namespaced attribute:
<?php
// Creates new DOMDocument
$dom = new DOMDocument();
// Loads XML file into DOMDocument
$dom->load('FRED_formatted_list.xml');
// Stores all the instances of the Row tag into $rows
$rows = $dom->getElementsByTagName('Row');
$attr = '';
// Iterates through all the instances of the Row tag
foreach ($rows as $row) {
    // Stores all the instances of the Cell tag into $cells
    $cells = $row->getElementsByTagName('Cell');
    // Iterates through all the instances of the Cell tag
    foreach ($cells as $cell) {
        // Checks if the Index attribute exists in the cell tag
        if ($cell->attributes->getNamedItem('Index')) {
            // Stores the value of any instances of the Index attribute
            $attr = $cell->attributes->getNamedItem('Index')->nodeValue;
            // Prints the value of any instances of the Index attribute to screen
            echo "Value of index attribute: " . $attr . "<br>";
        }
        // Check that the cell tags have been properly identified in the DOM Object
        echo $cell->nodeValue . "<br>";
        // Double checks whether any index values are even found and stored in $attr
        var_dump($attr) . "<br>";
    }
}
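As a namespace-aware variant of the same idea, here is a minimal sketch that looks the attribute up by namespace URI and local name using hasAttributeNS and getAttributeNS. It assumes the ss prefix is bound to the SpreadsheetML URI shown in $ssNs (the question does not show the real declaration, so substitute whatever URI your file uses), and it parses a shortened inline copy of the sample instead of FRED_formatted_list.xml. The names $ssNs, $xml and $indexes are made up for the illustration.

```php
<?php
// Assumption: the URI the "ss" prefix is bound to. Replace it with the
// value of xmlns:ss declared in the real file.
$ssNs = 'urn:schemas-microsoft-com:office:spreadsheet';

// Shortened inline copy of the sample row, with the namespace declared.
$xml = <<<XML
<Row xmlns:ss="$ssNs">
  <Cell><Data ss:Type="String">AAA</Data></Cell>
  <Cell ss:Index="3"><Data ss:Type="String">Percent</Data></Cell>
</Row>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// Collect the ss:Index value of every Cell that carries one.
$indexes = [];
foreach ($dom->getElementsByTagName('Cell') as $cell) {
    // Look the attribute up by namespace URI + local name, so the code
    // does not depend on which prefix the document happens to use.
    if ($cell->hasAttributeNS($ssNs, 'Index')) {
        $indexes[] = $cell->getAttributeNS($ssNs, 'Index');
    }
}

print_r($indexes);
```

The lookup here keys on the URI rather than on the literal prefix, so it should keep working if another file binds the same namespace to a different prefix.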
Answer 1 (score: 0)
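After further research I found that others had run into this problem and managed to solve it. The 'Index' attribute in the XML Cell tag/element is prefixed with 'ss:' (see <Cell ss:Index="3"><Data ss:Type="String"> in the XML excerpt above). For getAttribute to work, the 'ss:' prefix has to be included. The correct code is
getAttribute('ss:Index')
rather than
getAttribute('Index')
I do not fully understand how getAttribute identifies an attribute, but it appears to match against the attribute's full name as written in the XML, prefix included, which is why 'ss:' must be part of the name.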
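To make the difference concrete, here is a small sketch that runs both calls against the same Cell element. It is an illustration rather than the original program: the XML is a shortened inline copy of the sample, the namespace URI bound to ss is an assumption, and $bare and $qualified are names invented for the example.

```php
<?php
// Shortened inline copy of the sample. The URI bound to "ss" is an
// assumption; the real file may declare a different one.
$xml = '<Row xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">'
     . '<Cell ss:Index="3"><Data ss:Type="String">Percent</Data></Cell>'
     . '</Row>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// Take the only Cell element in the snippet.
$cell = $dom->getElementsByTagName('Cell')->item(0);

// Lookup without the prefix: the behaviour described in the question.
$bare = $cell->getAttribute('Index');

// Lookup with the prefix: the fix described in this answer.
$qualified = $cell->getAttribute('ss:Index');

var_dump($bare);
var_dump($qualified);
```

The same applies to the hasAttribute check in the question's code: it would need 'ss:Index' as well, otherwise the branch that calls getAttribute is never entered.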