XMLReader - XML declaration allowed only at the start of the document

Asked: 2018-11-08 19:42:25

Tags: php xml xml-parsing xmlreader

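I am using PHP's built-in XMLReader to read data from an external XML feed. When I try to read a feed that starts with a line break, I get the following error: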

ErrorException: XMLReader::read(): http://example.com/feeds/feed1.xml:2: parser error : XML declaration allowed only at the start of the document

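I think this happens because the feed starts with a line break, but I don't know how to work around it. How can I skip the first line if it only contains a line break?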

I can't find anyone who has solved this. There are some workarounds that use SimpleXMLElement, but I can't load the whole document into memory.

Here is my code:

$reader = new XMLReader;
$reader->open($linkToExternalFeed);

while ($reader->read() && $reader->name != 'item');

while ($reader->name == 'item')
{
    $node = new SimpleXMLElement($reader->readOuterXML());

    $this->doSomeParsing($node);

    unset($node);

    $reader->next($reader->name);
}

$reader->close();

2 Answers:

Answer 0 (score: 2)

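You can write a stream wrapper that wraps the original stream and runs a filter over the data it reads. Once the filter finds the first non-whitespace content, it removes itself, and from then on the data is passed to XMLReader unchanged.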

class ResourceWrapper {

    private $_stream;

    private $_filter;

    // PHP assigns the stream context to this property; the manual declares it public
    public $context;

    public static function createContext(
        $stream, callable $filter = NULL, string $protocol = 'myproject-resource'
    ): array {
        self::register($protocol);
        return [
            $protocol.'://context', 
            \stream_context_create(
                [
                    $protocol => [
                        'stream' => $stream,
                        'filter' => $filter
                    ]
                ]
            )
        ];
    }

    private static function register($protocol) {
        if (!\in_array($protocol, \stream_get_wrappers(), TRUE)) {
            \stream_wrapper_register($protocol, __CLASS__);
        }
    }

    public function removeFilter() {
        $this->_filter = NULL;
    }

    public function url_stat(string $path , int $flags): array {
        return [];
    }

    public function stream_open(
        string $path, string $mode, int $options, &$opened_path
    ): bool {
        list($protocol, $id) = \explode('://', $path);
        $context = \stream_context_get_options($this->context);
        if (
            isset($context[$protocol]['stream']) &&
            \is_resource($context[$protocol]['stream'])
        ) {
            $this->_stream = $context[$protocol]['stream'];
            $this->_filter = $context[$protocol]['filter'];
            return TRUE;
        }
        return FALSE;
    }

    public function stream_read(int $count) {
        if (NULL !== $this->_filter) {
            $filter = $this->_filter;
            return $filter(\fread($this->_stream, $count), $this);
        }
        return \fread($this->_stream, $count);
    }

    public function stream_eof(): bool {
        return \feof($this->_stream);
    }
}

Usage:

$xml = <<<'XML'


<?xml version="1.0"?>
<person><name>Alice</name></person>
XML;

// open the example XML string as a file stream
$resource = fopen('data://text/plain;base64,'.base64_encode($xml), 'rb');

$reader = new \XMLReader();
// create context for the stream and the filter
list($uri, $context) = \ResourceWrapper::createContext(
    $resource,
    function($data, \ResourceWrapper $wrapper) {
        // check for content after removing leading white space
        if (ltrim($data) !== '') {
            // found content, remove filter
            $wrapper->removeFilter();
            // return data without leading whitespace
            return ltrim($data);
        }
        return '';
    }
);
libxml_set_streams_context($context);
$reader->open($uri);

while ($foundNode = $reader->read()) {
    var_dump($reader->localName);
}

Output:

string(6) "person" 
string(4) "name" 
string(5) "#text" 
string(4) "name" 
string(6) "person"
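
If a full stream wrapper feels heavy, the same idea (drop leading whitespace, then pass everything through untouched) can probably also be expressed as a user stream filter. This is only a minimal sketch, not part of the wrapper above: the class name LeadingWhitespaceFilter and the filter name demo.ltrim are made up for illustration, and an in-memory stream stands in for the feed.

```php
<?php
// Hypothetical filter class; the name is chosen for this sketch only.
class LeadingWhitespaceFilter extends php_user_filter
{
    // Becomes true once the first non-whitespace data has been seen.
    private $started = false;

    public function filter($in, $out, &$consumed, $closing): int
    {
        $passedOn = false;
        while ($bucket = stream_bucket_make_writeable($in)) {
            $consumed += $bucket->datalen;
            if (!$this->started) {
                $bucket->data = ltrim($bucket->data);
                if ($bucket->data === '') {
                    continue; // the chunk was all whitespace, drop it
                }
                $this->started = true;
            }
            stream_bucket_append($out, $bucket);
            $passedOn = true;
        }
        // Ask for more input if nothing was handed on in this call.
        return $passedOn ? PSFS_PASS_ON : PSFS_FEED_ME;
    }
}

stream_filter_register('demo.ltrim', LeadingWhitespaceFilter::class);

// Stand-in for the feed: an in-memory stream that starts with blank lines.
$xml = "\n\n<?xml version=\"1.0\"?>\n<person><name>Alice</name></person>";
$stream = fopen('php://memory', 'w+b');
fwrite($stream, $xml);
rewind($stream);
stream_filter_append($stream, 'demo.ltrim', STREAM_FILTER_READ);

$cleaned = stream_get_contents($stream);
fclose($stream);

// True when the declaration is now at the very start of the data.
var_dump(strpos($cleaned, '<?xml') === 0);
```

Because libxml in PHP opens its input through PHP streams, a URI such as php://filter/read=demo.ltrim/resource= followed by the feed URL may work directly with $reader->open(), but I have not verified that here, so treat it as an assumption to test.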

Answer 1 (score: 0)

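Not ideal, but this reads the source, applies ltrim() to the first chunk of the content, and writes everything to a temporary file. You should then be able to read the file named in $tmpFile.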

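// Copy the feed to a temporary file, trimming leading whitespace from the first chunk
$tmpFile = tempnam(".", "trx");
$fpIn = fopen($linkToExternalFeed, "r");
$fpOut = fopen($tmpFile, "w");
// Only the first 4096 bytes are trimmed
$buffer = fread($fpIn, 4096);
fwrite($fpOut, ltrim($buffer));
// Copy the rest unchanged; the strict comparison keeps a chunk of "0" from ending the loop early
while (($buffer = fread($fpIn, 4096)) !== false && $buffer !== '') {
    fwrite($fpOut, $buffer);
}
fclose($fpIn);
fclose($fpOut);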

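I used tempnam() to generate a unique file name; you can set it to anything you like. It may also be useful to delete the file once you have processed it, both to save space and to remove potentially sensitive information.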
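
Reading the cleaned file and removing it afterwards could look roughly like the sketch below. The sample feed content and the $items array are assumptions made for illustration; in real code $tmpFile would be the file written by the snippet above, and the loop body would call your own parsing code.

```php
<?php
// Hypothetical stand-in for the cleaned temporary copy of the feed.
$tmpFile = tempnam(sys_get_temp_dir(), 'trx');
file_put_contents(
    $tmpFile,
    ltrim("\n\n<?xml version=\"1.0\"?>\n<rss><item>a</item><item>b</item></rss>")
);

$items = [];
$reader = new XMLReader();
try {
    if (!$reader->open($tmpFile)) {
        throw new RuntimeException("Cannot open $tmpFile");
    }
    // Advance to the first <item>, as in the question's loop.
    while ($reader->read() && $reader->name !== 'item');

    while ($reader->name === 'item') {
        // Only the current <item> is expanded into memory.
        $node = new SimpleXMLElement($reader->readOuterXml());
        $items[] = (string) $node;
        $reader->next('item');
    }
} finally {
    // Always clean up, even if parsing throws.
    $reader->close();
    unlink($tmpFile);
}

print_r($items);
```

Putting unlink() in the finally block means the temporary copy is removed even when parsing fails partway through.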