I am working with this kind of XML sequence file. Can you suggest how to parse it?
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v42-2006-08-23.dtd" [ ]>
<name>ccccc</name>
<document-id>
<country>US</country>
<doc-number>D0629997</doc-number>
<kind>S1</kind>
<date>20110104</date>
</document-id>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v42-2006-08-23.dtd" [ ]>
<name>dddd</name>
<document-id>
<country>US</country>
<doc-number>D0629998</doc-number>
<kind>S2</kind>
<date>20110104</date>
</document-id>
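Answer 0 (score: 1)
This is not a valid XML file. It looks like two files concatenated, but even then each part is invalid. Assuming these are two separate files, you could first try to "tidy" them. Assuming $xml is a string containing the XML content: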
$xml = tidy_repair_string($xml, array(
    'output-xml' => true,
    'input-xml'  => true
));
Then you can use SimpleXML:
$xml = new SimpleXMLElement($xml);
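Answer 1 (score: 0)
I know where this XML file comes from, and I find it strange that Google would serve invalid XML (unless they are only hosting a file they obtained from elsewhere). The suggestion for parsing it in this question worked for me: How to parse an xml file with multiple xml declaration using PHP? (A concatenation of several XML files)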
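To illustrate the idea of treating the file as a concatenation, here is a minimal sketch that cuts a string before each XML declaration and parses every piece with SimpleXML. It is not necessarily what the linked question does. The helper name `splitXmlSequence` and the sample data are made up, and the sketch assumes each piece is well-formed on its own and that the whole file fits into memory.

```php
<?php
// Hypothetical helper: split a string holding several concatenated XML
// documents into one string per document, cutting before each XML declaration.
function splitXmlSequence(string $sequence): array
{
    // The lookahead keeps the declaration at the start of each chunk.
    $chunks = preg_split('/(?=<\?xml\s)/', $sequence, -1, PREG_SPLIT_NO_EMPTY);

    // Trim surrounding whitespace and drop chunks that end up empty.
    return array_values(array_filter(array_map('trim', $chunks), 'strlen'));
}

// Made-up sample data: two small, well-formed documents glued together.
$sequence = <<<EOT
<?xml version="1.0" encoding="UTF-8"?>
<grant><doc-number>D0629997</doc-number><kind>S1</kind></grant>
<?xml version="1.0" encoding="UTF-8"?>
<grant><doc-number>D0629998</doc-number><kind>S2</kind></grant>
EOT;

$docNumbers = [];
foreach (splitXmlSequence($sequence) as $documentXml) {
    // Each chunk is now a complete document that SimpleXML can load.
    $document = new SimpleXMLElement($documentXml);
    $docNumbers[] = (string) $document->{'doc-number'};
}

print_r($docNumbers);
```

For files of several hundred megabytes, loading everything into one string is probably not practical, so a streaming approach would be the better fit there.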
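Answer 2 (score: 0)
The file contains a sequence of XML documents concatenated one after another. You can register a PHP stream wrapper that transparently splits the file for you; you can then process each document on its own, even in a streaming fashion. For example: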
stream_wrapper_register('xmlseq', 'XMLSequenceStream');
$path = "xmlseq://zip://ipg140107.zip#ipg140107.xml";
// each pass of the outer loop opens the next document in the sequence
while (XMLSequenceStream::notAtEndOfSequence($path)) {
    $reader = new XMLReader();
    $reader->open($path);
    // just consume the whole document, node by node
    while ($reader->read()) {
        XMLReaderNode::dump($reader);
    }
}
XMLSequenceStream::clean();
The stream wrapper is part of the XMLReaderIterator library and works just as well with SimpleXMLElement or DOMDocument, although XMLReader is the better fit for larger files.
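As a rough sketch of how XMLReader and SimpleXML can be combined on a single document: stream to the elements of interest with XMLReader, then expand only that subtree into SimpleXML. The sample data and the `grants` root element are made up for illustration; with a real file you would call `$reader->open($path)` instead of `$reader->XML($xml)`.

```php
<?php
// Made-up sample: one document with repeated <document-id> children.
$xml = <<<EOT
<?xml version="1.0" encoding="UTF-8"?>
<grants>
  <document-id><country>US</country><doc-number>D0629997</doc-number></document-id>
  <document-id><country>US</country><doc-number>D0629998</doc-number></document-id>
</grants>
EOT;

$reader = new XMLReader();
$reader->XML($xml);

// Move forward until the first <document-id> element is reached.
while ($reader->read() && $reader->name !== 'document-id') {
    // keep reading
}

$docNumbers = [];
$dom = new DOMDocument();
while ($reader->name === 'document-id') {
    // expand() builds a DOM subtree for the current element only,
    // which is then handed to SimpleXML for convenient access.
    $node = simplexml_import_dom($dom->importNode($reader->expand(), true));
    $docNumbers[] = (string) $node->{'doc-number'};

    // Jump to the next <document-id> sibling, skipping the current subtree.
    $reader->next('document-id');
}
$reader->close();

print_r($docNumbers);
```

Only one `document-id` subtree is held in memory at a time, which is why this pattern tends to scale better than loading the whole document into SimpleXML.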
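For the file I used in my example (http://storage.googleapis.com/patents/grant_full_text/2014/ipg140107.zip, listed at https://www.google.com/googlebooks/uspto-patents-grants-text.html), the overall element structure across the different trees in the sequence, with the count of each element in parentheses, looks like this: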
\-us-patent-grant (473)
|-us-bibliographic-data-grant (473)
| |-publication-reference (473)
| | \-document-id (473)
| | |-country (473)
| | |-doc-number (473)
| | |-kind (473)
| | \-date (473)
| |-application-reference (473)
| | \-document-id (473)
| | |-country (473)
| | |-doc-number (473)
| | \-date (473)
| |-us-application-series-code (473)
| |-us-term-of-grant (470)
| | |-length-of-grant (450)
| | |-disclaimer (18)
| | | \-text (18)
| | \-us-term-extension (20)
| |-classification-locarno (450)
| | |-edition (450)
| | \-main-classification (450)
| |-classification-national (473)
| | |-country (473)
| | |-main-classification (473)
| | \-further-classification (143)
| |-invention-title (473)
| | \-i (12)
| |-us-references-cited (458)
| | \-us-citation (11000)
| | |-patcit (10265)
| | | \-document-id (10265)
| | | |-country (10265)
| | | |-doc-number (10265)
| | | |-kind (9884)
| | | |-name (9811)
| | | \-date (10264)
| | |-category (10999)
| | |-classification-national (6309)
| | | |-country (6309)
| | | \-main-classification (6309)
| | |-nplcit (735)
| | | \-othercit (735)
| | | |-sub (281)
| | | |-i (7)
| | | \-sup (1)
| | \-classification-cpc-text (1)
| |-number-of-claims (472)
| |-us-exemplary-claim (472)
| |-us-field-of-classification-search (472)
| | \-classification-national (8991)
| | |-country (8991)
| | |-main-classification (8991)
| | \-additional-info (1205)
| |-figures (472)
| | |-number-of-drawing-sheets (472)
| | \-number-of-figures (472)
| |-us-parties (472)
| | |-us-applicants (472)
| | | \-us-applicant (765)
| | | |-addressbook (765)
| | | | |-last-name (573)
| | | | |-first-name (573)
| | | | |-address (765)
| | | | | |-city (765)
| | | | | |-country (765)
| | | | | \-state (423)
| | | | \-orgname (192)
| | | \-residence (765)
| | | \-country (765)
| | |-inventors (472)
| | | \-inventor (969)
| | | \-addressbook (969)
| | | |-last-name (969)
| | | |-first-name (969)
| | | \-address (969)
| | | |-city (969)
| | | |-country (969)
| | | \-state (519)
| | \-agents (429)
| | \-agent (500)
| | \-addressbook (500)
| | |-orgname (361)
| | |-address (500)
| | | \-country (500)
| | |-last-name (139)
| | \-first-name (139)
| |-assignees (385)
| | \-assignee (391)
| | |-addressbook (390)
| | | |-orgname (386)
| | | |-role (390)
| | | |-address (390)
| | | | |-city (355)
| | | | |-country (390)
| | | | \-state (192)
| | | |-last-name (4)
| | | \-first-name (4)
| | |-orgname (1)
| | \-role (1)
| |-examiners (472)
| | |-primary-examiner (472)
| | | |-last-name (472)
| | | |-first-name (472)
| | | \-department (472)
| | \-assistant-examiner (65)
| | |-last-name (65)
| | \-first-name (65)
| |-us-related-documents (65)
| | |-continuation-in-part (16)
| | | \-relation (16)
| | | |-parent-doc (16)
| | | | |-document-id (16)
| | | | | |-country (16)
| | | | | |-doc-number (16)
| | | | | \-date (16)
| | | | |-parent-status (11)
| | | | \-parent-grant-document (5)
| | | | \-document-id (5)
| | | | |-country (5)
| | | | |-doc-number (5)
| | | | \-date (2)
| | | \-child-doc (16)
| | | \-document-id (16)
| | | |-country (16)
| | | \-doc-number (16)
| | |-continuation (21)
| | | \-relation (21)
| | | |-parent-doc (21)
| | | | |-document-id (21)
| | | | | |-country (21)
| | | | | |-doc-number (21)
| | | | | \-date (21)
| | | | |-parent-status (16)
| | | | \-parent-grant-document (5)
| | | | \-document-id (5)
| | | | |-country (5)
| | | | |-doc-number (5)
| | | | \-date (2)
| | | \-child-doc (21)
| | | \-document-id (21)
| | | |-country (21)
| | | \-doc-number (21)
| | |-division (32)
| | | \-relation (32)
| | | |-parent-doc (32)
| | | | |-document-id (32)
| | | | | |-country (32)
| | | | | |-doc-number (32)
| | | | | \-date (32)
| | | | |-parent-grant-document (24)
| | | | | \-document-id (24)
| | | | | |-country (24)
| | | | | |-doc-number (24)
| | | | | \-date (1)
| | | | \-parent-status (8)
| | | \-child-doc (32)
| | | \-document-id (32)
| | | |-country (32)
| | | \-doc-number (32)
| | \-related-publication (9)
| | \-document-id (9)
| | |-country (9)
| | |-doc-number (9)
| | |-kind (9)
| | \-date (9)
| |-priority-claims (140)
| | \-priority-claim (182)
| | |-country (182)
| | |-doc-number (182)
| | \-date (182)
| |-us-sir-flag (1)
| |-classifications-ipcr (23)
| | \-classification-ipcr (24)
| | |-ipc-version-indicator (24)
| | | \-date (24)
| | |-classification-level (24)
| | |-section (24)
| | |-class (24)
| | |-subclass (24)
| | |-main-group (24)
| | |-subgroup (24)
| | |-symbol-position (24)
| | |-classification-value (24)
| | |-action-date (24)
| | | \-date (24)
| | |-generating-office (24)
| | | \-country (24)
| | |-classification-status (24)
| | \-classification-data-source (24)
| |-us-botanic (21)
| | |-latin-name (21)
| | \-variety (21)
| \-classifications-cpc (1)
| \-main-cpc (1)
| \-classification-cpc (1)
| |-cpc-version-indicator (1)
| | \-date (1)
| |-section (1)
| |-class (1)
| |-subclass (1)
| |-main-group (1)
| |-subgroup (1)
| |-symbol-position (1)
| |-classification-value (1)
| |-action-date (1)
| | \-date (1)
| |-generating-office (1)
| | \-country (1)
| |-classification-status (1)
| |-classification-data-source (1)
| \-scheme-origination-code (1)
|-drawings (472)
| \-figure (3033)
| \-img (3033)
|-description (472)
| |-description-of-drawings (472)
| | |-p (3955)
| | | |-figref (4478)
| | | |-b (86)
| | | \-i (6)
| | \-heading (22)
| |-heading (162)
| \-p (340)
| |-figref (15)
| |-b (250)
| |-i (146)
| |-ul (96)
| | \-li (444)
| | |-ul (215)
| | | \-li (273)
| | | |-ul (199)
| | | | \-li (1192)
| | | | |-i (1219)
| | | | |-b (1)
| | | | |-sup (10)
| | | | \-sub (2)
| | | \-i (11)
| | |-sup (2)
| | \-i (26)
| |-tables (15)
| | \-table (15)
| | \-tgroup (49)
| | |-colspec (175)
| | |-thead (15)
| | | \-row (27)
| | | \-entry (51)
| | \-tbody (49)
| | \-row (291)
| | \-entry (997)
| | \-sup (28)
| \-sup (2)
|-us-claim-statement (472)
|-claims (472)
| \-claim (476)
| \-claim-text (476)
| |-figref (1)
| |-claim-text (5)
| |-claim-ref (4)
| \-i (15)
\-abstract (22)
\-p (22)
|-i (27)
\-ul (2)
\-li (2)
\-ul (2)
\-li (11)
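Counts like the ones in the listing above could be gathered along these lines. This is only a sketch for a single document: the function name `countElementPaths` and the sample data are made up, and it is not the code that produced the listing. For a whole sequence you would sum the per-document results.

```php
<?php
// Hypothetical helper: count how often each element path occurs in one document.
function countElementPaths(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $counts = [];
    $stack = []; // names of the ancestors of the current element
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        // Cut the stack back to the current depth, then add this element.
        array_splice($stack, $reader->depth);
        $stack[] = $reader->name;

        $path = implode('/', $stack);
        $counts[$path] = ($counts[$path] ?? 0) + 1;
    }
    $reader->close();

    return $counts;
}

// Made-up sample document.
$xml = '<grant>'
     . '<document-id><country>US</country><doc-number>1</doc-number></document-id>'
     . '<claims><claim/><claim/></claims>'
     . '</grant>';

$counts = countElementPaths($xml);
foreach ($counts as $path => $count) {
    echo $path, ' (', $count, ')', PHP_EOL;
}
```

Using the reader's depth to trim the stack avoids special handling for empty elements such as `<claim/>`, which produce no end-element event.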