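I am using PHP's built-in XMLReader to read data from an external XML feed. When I try to read a feed that starts with a newline, I get the following error:
ErrorException: XMLReader::read(): http://example.com/feeds/feed1.xml:2: parser error : XML declaration allowed only at the start of the document
I think this happens because the feed starts with a newline, but I don't know how to solve it. How can I skip the first line if it contains only a line break?
I can't find anyone who has solved this problem. There are some workarounds using SimpleXMLElement, but I can't load the entire document into memory.
Here is my code: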
    $reader = new XMLReader;
    $reader->open($linkToExternalFeed);
    while ($reader->read() && $reader->name != 'item');
    while ($reader->name == 'item')
    {
        $node = new SimpleXMLElement($reader->readOuterXML());
        $this->doSomeParsing($node);
        unset($node);
        $reader->next($reader->name);
    }
    $reader->close();
Answer 0 (score: 2)
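You can write a stream wrapper that wraps the original stream and applies a filter to the data it reads. Once the filter finds the first non-whitespace content, it removes itself and the wrapper passes the remaining data to XMLReader unchanged.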
    class ResourceWrapper {
        private $_stream;
        private $_filter;
        // PHP assigns the stream context to this property; the manual declares it as public
        public $context;

        // Returns [uri, context]: the URI to open and the context carrying stream and filter
        public static function createContext(
            $stream, ?callable $filter = NULL, string $protocol = 'myproject-resource'
        ): array {
            self::register($protocol);
            return [
                $protocol.'://context',
                \stream_context_create(
                    [
                        $protocol => [
                            'stream' => $stream,
                            'filter' => $filter
                        ]
                    ]
                )
            ];
        }

        // Register the wrapper class for the protocol only once
        private static function register($protocol) {
            if (!\in_array($protocol, \stream_get_wrappers(), TRUE)) {
                \stream_wrapper_register($protocol, __CLASS__);
            }
        }

        public function removeFilter() {
            $this->_filter = NULL;
        }

        public function url_stat(string $path , int $flags): array {
            return [];
        }

        public function stream_open(
            string $path, string $mode, int $options, &$opened_path
        ): bool {
            list($protocol, $id) = \explode('://', $path);
            $context = \stream_context_get_options($this->context);
            if (
                isset($context[$protocol]['stream']) &&
                \is_resource($context[$protocol]['stream'])
            ) {
                $this->_stream = $context[$protocol]['stream'];
                $this->_filter = $context[$protocol]['filter'];
                return TRUE;
            }
            return FALSE;
        }

        public function stream_read(int $count) {
            // While a filter is set, pass every chunk through it
            if (NULL !== $this->_filter) {
                $filter = $this->_filter;
                return $filter(\fread($this->_stream, $count), $this);
            }
            return \fread($this->_stream, $count);
        }

        public function stream_eof(): bool {
            return \feof($this->_stream);
        }
    }
Usage:
    // example XML that starts with an empty line before the XML declaration
    $xml = <<<'XML'

    <?xml version="1.0"?>
    <person><name>Alice</name></person>
    XML;
    // open the example XML string as a file stream
    $resource = fopen('data://text/plain;base64,'.base64_encode($xml), 'rb');
    $reader = new \XMLReader();
    // create context for the stream and the filter
    list($uri, $context) = \ResourceWrapper::createContext(
        $resource,
        function($data, \ResourceWrapper $wrapper) {
            // check for content after removing leading white space
            if (ltrim($data) !== '') {
                // found content, remove filter
                $wrapper->removeFilter();
                // return data without leading whitespace
                return ltrim($data);
            }
            return '';
        }
    );
    libxml_set_streams_context($context);
    $reader->open($uri);
    while ($foundNode = $reader->read()) {
        var_dump($reader->localName);
    }
Output:
    string(6) "person"
    string(4) "name"
    string(5) "#text"
    string(4) "name"
    string(6) "person"
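To make the filter's rule easier to see in isolation, here is a minimal sketch that applies the same "trim until the first non-whitespace content, then pass everything through" logic to a plain list of chunks. The function name `stripLeadingWhitespace` and the sample chunks are hypothetical and not part of the wrapper above; the `$filtering` flag stands in for the wrapper's filter being set.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: mirrors the filter callback's rule on a list of
 * chunks instead of a live stream.
 */
function stripLeadingWhitespace(iterable $chunks): string
{
    $filtering = true; // corresponds to the wrapper still having a filter
    $result = '';
    foreach ($chunks as $chunk) {
        if ($filtering) {
            $chunk = ltrim($chunk);
            if ($chunk === '') {
                continue; // only whitespace so far, keep filtering
            }
            $filtering = false; // corresponds to $wrapper->removeFilter()
        }
        // after the first content, chunks are appended unchanged
        $result .= $chunk;
    }
    return $result;
}

// leading whitespace spread over two chunks, inner whitespace kept
echo stripLeadingWhitespace(["\n  ", "\n<a/>", " b\n"]);
```

Only whitespace before the first content is dropped; whitespace inside later chunks is left alone, which is why the filter has to be removed once content has been seen.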
Answer 1 (score: 0)
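Not ideal, but this reads the source in chunks, applies ltrim() to the first chunk only, and writes everything to a temporary file. You should then be able to read the file named $tmpFile with XMLReader.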
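    $tmpFile = tempnam(".", "trx");
    $fpIn = fopen($linkToExternalFeed, "r");
    $fpOut = fopen($tmpFile, "w");
    // strip leading whitespace from the first chunk only
    $buffer = fread($fpIn, 4096);
    fwrite($fpOut, ltrim($buffer));
    // copy the rest unchanged; test feof() rather than the chunk itself,
    // because a chunk such as "0" is falsy and would end the loop early
    while (!feof($fpIn)) {
        $buffer = fread($fpIn, 4096);
        if ($buffer === false) {
            break;
        }
        fwrite($fpOut, $buffer);
    }
    fclose($fpIn);
    fclose($fpOut);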
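I use tempnam() to generate a unique file name; you can set it to anything you like. It may also be useful to delete the file once you have processed it, to save space and remove potentially sensitive information.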
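As a rough sketch of that cleanup step, the temporary file can be read with XMLReader inside a try/finally block so it is deleted even if parsing throws. The sample feed content written here is a hypothetical stand-in for the cleaned copy produced above, and collecting the items into an array replaces the question's doSomeParsing() call purely for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the cleaned temporary file created above
$tmpFile = tempnam(sys_get_temp_dir(), 'trx');
file_put_contents(
    $tmpFile,
    '<?xml version="1.0"?><rss><item>a</item><item>b</item></rss>'
);

$items = [];
$reader = new XMLReader();
try {
    $reader->open($tmpFile);
    // move to the first <item> element
    while ($reader->read() && $reader->name !== 'item');
    // handle one <item> at a time, as in the question
    while ($reader->name === 'item') {
        $node = new SimpleXMLElement($reader->readOuterXml());
        $items[] = (string) $node;
        $reader->next('item');
    }
} finally {
    // always release the reader and remove the temporary file
    $reader->close();
    unlink($tmpFile);
}

var_dump($items);
```

Closing the reader before calling unlink() keeps the order predictable on platforms that refuse to delete a file that is still open.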