How to read multiple XML documents from a single file with PHP

Date: 2011-04-19 15:00:31

Tags: php xml parsing

I'm working with an XML sequence file like the one below. Could you suggest how to parse it?

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v42-2006-08-23.dtd" [ ]>
<name>ccccc</name>
<document-id>
<country>US</country>
<doc-number>D0629997</doc-number>
<kind>S1</kind>
<date>20110104</date>
</document-id>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v42-2006-08-23.dtd" [ ]>
<name>dddd</name>
<document-id>
<country>US</country>
<doc-number>D0629998</doc-number>
<kind>S2</kind>
<date>20110104</date>
</document-id>

3 answers:

Answer 0 (score: 1)

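This is not a valid XML file. It looks like two files, but even then it would be invalid. Assuming these are two separate files, you can try to "tidy" them first. Assuming $xml is a string containing the XML content: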

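// tidy_repair_string() needs the tidy extension; both options tell tidy to treat input and output as XML
$xml = tidy_repair_string($xml, array(
    'output-xml' => true,
    'input-xml' => true
));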

Then you can load the result with SimpleXML:

$xml = new SimpleXMLElement($xml);
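If the tidy extension is not available, a rough alternative for one of the question's fragments is to strip the XML declaration and DOCTYPE yourself and wrap what remains in a single root element, since each fragment has two top-level elements (name and document-id). This is a swapped-in technique, not what tidy does; the function name loadFragment and the wrapper element fragment are hypothetical, and the regular expression assumes the prolog looks like the one in the sample.

```php
<?php
// Hypothetical helper: make one fragment from the question loadable without ext/tidy.
// Assumption: the only problems are the prolog and the multiple top-level elements.
function loadFragment(string $fragment): SimpleXMLElement
{
    // Drop the XML declaration and the DOCTYPE (including an inline "[ ]" subset).
    $body = preg_replace('/<\?xml[^>]*\?>|<!DOCTYPE[^>\[]*(\[[^\]]*\])?\s*>/', '', $fragment);

    // Wrap the remaining elements in a single, made-up root so the result is well-formed.
    return new SimpleXMLElement('<fragment>' . $body . '</fragment>');
}

$fragment = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v42-2006-08-23.dtd" [ ]>
<name>ccccc</name>
<document-id>
<country>US</country>
<doc-number>D0629997</doc-number>
<kind>S1</kind>
<date>20110104</date>
</document-id>
XML;

$doc = loadFragment($fragment);
echo $doc->name, ' ', $doc->{'document-id'}->{'doc-number'}, PHP_EOL; // ccccc D0629997
```

Element names containing a hyphen have to be accessed with the $obj->{'name'} syntax, as shown for document-id and doc-number.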

Answer 1 (score: 0)

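I know where this XML file comes from, and I find it strange that Google would serve invalid XML (unless they are just hosting a file they got from somewhere else). The suggestion for parsing it in this question worked for me: How to parse an xml file with multiple xml declaration using PHP? (A concatenation of several XML files)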
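Treating the file as a concatenation of several XML files means the first step is cutting it into one string per document. A minimal sketch, assuming every document begins with its own XML declaration as in the question's sample; splitXmlSequence is a hypothetical name, this is not the code from the linked question, and the demo input is a trimmed-down version of the sample with a single root element per document so that each piece is well-formed:

```php
<?php
// Hypothetical helper: split concatenated XML documents into one string each.
// Assumption: each document starts with its own XML declaration.
function splitXmlSequence(string $data): array
{
    // Split right before each declaration (zero-width lookahead), dropping empty pieces.
    $parts = preg_split('/(?=<\?xml\s)/', $data, -1, PREG_SPLIT_NO_EMPTY);

    // Also discard whitespace-only pieces, e.g. a newline before the first declaration.
    return array_values(array_filter($parts, fn (string $p): bool => trim($p) !== ''));
}

$data = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<document-id><doc-number>D0629997</doc-number></document-id>
<?xml version="1.0" encoding="UTF-8"?>
<document-id><doc-number>D0629998</doc-number></document-id>
XML;

foreach (splitXmlSequence($data) as $doc) {
    $xml = new SimpleXMLElement($doc);
    echo $xml->{'doc-number'}, PHP_EOL; // D0629997, then D0629998
}
```

This loads the whole file into memory, which is fine for small inputs; for the large USPTO files a streaming approach scales better.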

Answer 2 (score: 0)

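The file contains a sequence of XML documents concatenated one after the other. You can register a PHP stream wrapper that transparently splits the file for you, and then process each document separately, even in a streaming fashion. For example: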

// XMLSequenceStream and XMLReaderNode come from the XMLReaderIterator library, which must be loaded first
stream_wrapper_register('xmlseq', 'XMLSequenceStream');

$path = "xmlseq://zip://ipg140107.zip#ipg140107.xml";

while (XMLSequenceStream::notAtEndOfSequence($path)) {
    $reader = new XMLReader();
    $reader->open($path);
    // just consume the whole document
    while ($reader->next()) {
        XMLReaderNode::dump($reader);
    }
    $reader->close();
}

XMLSequenceStream::clean();

The stream wrapper is part of the XMLReaderIterator library. It also works with SimpleXMLElement or DOMDocument, although for larger files XMLReader is the better fit.
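Without the library, similar one-document-at-a-time processing can be approximated with a generator that reads the file line by line and starts a new document whenever a line begins with an XML declaration. This is a substitute technique, not XMLSequenceStream; it assumes each declaration sits at the start of its own line, and the function names and the two small sample documents are made up for illustration:

```php
<?php
/*
 * Sketch (not the XMLReaderIterator library): yield each document of a file of
 * concatenated XML documents, buffering only one document at a time.
 * Assumption: every XML declaration starts at the beginning of its own line.
 */
function xmlDocumentsFromFile(string $path): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        $buffer = '';
        while (($line = fgets($handle)) !== false) {
            // A line starting a new declaration ends the document collected so far.
            if (strncmp($line, '<?xml', 5) === 0 && trim($buffer) !== '') {
                yield $buffer;
                $buffer = '';
            }
            $buffer .= $line;
        }
        if (trim($buffer) !== '') {
            yield $buffer; // the last document has no declaration after it
        }
    } finally {
        fclose($handle);
    }
}

// Hypothetical helper: read the first doc-number element of each document with XMLReader.
function docNumbers(string $path): array
{
    $numbers = [];
    foreach (xmlDocumentsFromFile($path) as $xml) {
        $reader = new XMLReader();
        $reader->XML($xml);
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'doc-number') {
                $numbers[] = $reader->readString();
                break;
            }
        }
        $reader->close();
    }
    return $numbers;
}

$sample = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<us-patent-grant><document-id><doc-number>D0629997</doc-number></document-id></us-patent-grant>
<?xml version="1.0" encoding="UTF-8"?>
<us-patent-grant><document-id><doc-number>D0629998</doc-number></document-id></us-patent-grant>
XML;

$path = tempnam(sys_get_temp_dir(), 'xmlseq');
file_put_contents($path, $sample);
print_r(docNumbers($path));
unlink($path);
```

Because fopen() accepts stream wrapper URLs, the same generator could be pointed at a zip:// path, assuming the zip extension is installed.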

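For the file I used in my example (http://storage.googleapis.com/patents/grant_full_text/2014/ipg140107.zip, listed at https://www.google.com/googlebooks/uspto-patents-grants-text.html), the overall element structure of the different trees in that sequence, with a count for each element, is for example: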

\-us-patent-grant (473)
  |-us-bibliographic-data-grant (473)
  | |-publication-reference (473)
  | | \-document-id (473)
  | |   |-country (473)
  | |   |-doc-number (473)
  | |   |-kind (473)
  | |   \-date (473)
  | |-application-reference (473)
  | | \-document-id (473)
  | |   |-country (473)
  | |   |-doc-number (473)
  | |   \-date (473)
  | |-us-application-series-code (473)
  | |-us-term-of-grant (470)
  | | |-length-of-grant (450)
  | | |-disclaimer (18)
  | | | \-text (18)
  | | \-us-term-extension (20)
  | |-classification-locarno (450)
  | | |-edition (450)
  | | \-main-classification (450)
  | |-classification-national (473)
  | | |-country (473)
  | | |-main-classification (473)
  | | \-further-classification (143)
  | |-invention-title (473)
  | | \-i (12)
  | |-us-references-cited (458)
  | | \-us-citation (11000)
  | |   |-patcit (10265)
  | |   | \-document-id (10265)
  | |   |   |-country (10265)
  | |   |   |-doc-number (10265)
  | |   |   |-kind (9884)
  | |   |   |-name (9811)
  | |   |   \-date (10264)
  | |   |-category (10999)
  | |   |-classification-national (6309)
  | |   | |-country (6309)
  | |   | \-main-classification (6309)
  | |   |-nplcit (735)
  | |   | \-othercit (735)
  | |   |   |-sub (281)
  | |   |   |-i (7)
  | |   |   \-sup (1)
  | |   \-classification-cpc-text (1)
  | |-number-of-claims (472)
  | |-us-exemplary-claim (472)
  | |-us-field-of-classification-search (472)
  | | \-classification-national (8991)
  | |   |-country (8991)
  | |   |-main-classification (8991)
  | |   \-additional-info (1205)
  | |-figures (472)
  | | |-number-of-drawing-sheets (472)
  | | \-number-of-figures (472)
  | |-us-parties (472)
  | | |-us-applicants (472)
  | | | \-us-applicant (765)
  | | |   |-addressbook (765)
  | | |   | |-last-name (573)
  | | |   | |-first-name (573)
  | | |   | |-address (765)
  | | |   | | |-city (765)
  | | |   | | |-country (765)
  | | |   | | \-state (423)
  | | |   | \-orgname (192)
  | | |   \-residence (765)
  | | |     \-country (765)
  | | |-inventors (472)
  | | | \-inventor (969)
  | | |   \-addressbook (969)
  | | |     |-last-name (969)
  | | |     |-first-name (969)
  | | |     \-address (969)
  | | |       |-city (969)
  | | |       |-country (969)
  | | |       \-state (519)
  | | \-agents (429)
  | |   \-agent (500)
  | |     \-addressbook (500)
  | |       |-orgname (361)
  | |       |-address (500)
  | |       | \-country (500)
  | |       |-last-name (139)
  | |       \-first-name (139)
  | |-assignees (385)
  | | \-assignee (391)
  | |   |-addressbook (390)
  | |   | |-orgname (386)
  | |   | |-role (390)
  | |   | |-address (390)
  | |   | | |-city (355)
  | |   | | |-country (390)
  | |   | | \-state (192)
  | |   | |-last-name (4)
  | |   | \-first-name (4)
  | |   |-orgname (1)
  | |   \-role (1)
  | |-examiners (472)
  | | |-primary-examiner (472)
  | | | |-last-name (472)
  | | | |-first-name (472)
  | | | \-department (472)
  | | \-assistant-examiner (65)
  | |   |-last-name (65)
  | |   \-first-name (65)
  | |-us-related-documents (65)
  | | |-continuation-in-part (16)
  | | | \-relation (16)
  | | |   |-parent-doc (16)
  | | |   | |-document-id (16)
  | | |   | | |-country (16)
  | | |   | | |-doc-number (16)
  | | |   | | \-date (16)
  | | |   | |-parent-status (11)
  | | |   | \-parent-grant-document (5)
  | | |   |   \-document-id (5)
  | | |   |     |-country (5)
  | | |   |     |-doc-number (5)
  | | |   |     \-date (2)
  | | |   \-child-doc (16)
  | | |     \-document-id (16)
  | | |       |-country (16)
  | | |       \-doc-number (16)
  | | |-continuation (21)
  | | | \-relation (21)
  | | |   |-parent-doc (21)
  | | |   | |-document-id (21)
  | | |   | | |-country (21)
  | | |   | | |-doc-number (21)
  | | |   | | \-date (21)
  | | |   | |-parent-status (16)
  | | |   | \-parent-grant-document (5)
  | | |   |   \-document-id (5)
  | | |   |     |-country (5)
  | | |   |     |-doc-number (5)
  | | |   |     \-date (2)
  | | |   \-child-doc (21)
  | | |     \-document-id (21)
  | | |       |-country (21)
  | | |       \-doc-number (21)
  | | |-division (32)
  | | | \-relation (32)
  | | |   |-parent-doc (32)
  | | |   | |-document-id (32)
  | | |   | | |-country (32)
  | | |   | | |-doc-number (32)
  | | |   | | \-date (32)
  | | |   | |-parent-grant-document (24)
  | | |   | | \-document-id (24)
  | | |   | |   |-country (24)
  | | |   | |   |-doc-number (24)
  | | |   | |   \-date (1)
  | | |   | \-parent-status (8)
  | | |   \-child-doc (32)
  | | |     \-document-id (32)
  | | |       |-country (32)
  | | |       \-doc-number (32)
  | | \-related-publication (9)
  | |   \-document-id (9)
  | |     |-country (9)
  | |     |-doc-number (9)
  | |     |-kind (9)
  | |     \-date (9)
  | |-priority-claims (140)
  | | \-priority-claim (182)
  | |   |-country (182)
  | |   |-doc-number (182)
  | |   \-date (182)
  | |-us-sir-flag (1)
  | |-classifications-ipcr (23)
  | | \-classification-ipcr (24)
  | |   |-ipc-version-indicator (24)
  | |   | \-date (24)
  | |   |-classification-level (24)
  | |   |-section (24)
  | |   |-class (24)
  | |   |-subclass (24)
  | |   |-main-group (24)
  | |   |-subgroup (24)
  | |   |-symbol-position (24)
  | |   |-classification-value (24)
  | |   |-action-date (24)
  | |   | \-date (24)
  | |   |-generating-office (24)
  | |   | \-country (24)
  | |   |-classification-status (24)
  | |   \-classification-data-source (24)
  | |-us-botanic (21)
  | | |-latin-name (21)
  | | \-variety (21)
  | \-classifications-cpc (1)
  |   \-main-cpc (1)
  |     \-classification-cpc (1)
  |       |-cpc-version-indicator (1)
  |       | \-date (1)
  |       |-section (1)
  |       |-class (1)
  |       |-subclass (1)
  |       |-main-group (1)
  |       |-subgroup (1)
  |       |-symbol-position (1)
  |       |-classification-value (1)
  |       |-action-date (1)
  |       | \-date (1)
  |       |-generating-office (1)
  |       | \-country (1)
  |       |-classification-status (1)
  |       |-classification-data-source (1)
  |       \-scheme-origination-code (1)
  |-drawings (472)
  | \-figure (3033)
  |   \-img (3033)
  |-description (472)
  | |-description-of-drawings (472)
  | | |-p (3955)
  | | | |-figref (4478)
  | | | |-b (86)
  | | | \-i (6)
  | | \-heading (22)
  | |-heading (162)
  | \-p (340)
  |   |-figref (15)
  |   |-b (250)
  |   |-i (146)
  |   |-ul (96)
  |   | \-li (444)
  |   |   |-ul (215)
  |   |   | \-li (273)
  |   |   |   |-ul (199)
  |   |   |   | \-li (1192)
  |   |   |   |   |-i (1219)
  |   |   |   |   |-b (1)
  |   |   |   |   |-sup (10)
  |   |   |   |   \-sub (2)
  |   |   |   \-i (11)
  |   |   |-sup (2)
  |   |   \-i (26)
  |   |-tables (15)
  |   | \-table (15)
  |   |   \-tgroup (49)
  |   |     |-colspec (175)
  |   |     |-thead (15)
  |   |     | \-row (27)
  |   |     |   \-entry (51)
  |   |     \-tbody (49)
  |   |       \-row (291)
  |   |         \-entry (997)
  |   |           \-sup (28)
  |   \-sup (2)
  |-us-claim-statement (472)
  |-claims (472)
  | \-claim (476)
  |   \-claim-text (476)
  |     |-figref (1)
  |     |-claim-text (5)
  |     |-claim-ref (4)
  |     \-i (15)
  \-abstract (22)
    \-p (22)
      |-i (27)
      \-ul (2)
        \-li (2)
          \-ul (2)
            \-li (11)
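A tree like this is essentially a count of every element path across all documents in the sequence. A minimal XMLReader sketch of that counting step, assuming each document has already been separated into its own string; it is an illustration, not the code that produced the tree above, and it prints flat paths instead of the ASCII outline. The function name and the two tiny documents are hypothetical:

```php
<?php
// Hypothetical helper: count elements per path (e.g. "us-patent-grant/drawings/figure")
// across several XML documents, using XMLReader so each document is streamed.
function countElementPaths(iterable $documents): array
{
    $counts = [];
    foreach ($documents as $xml) {
        $reader = new XMLReader();
        $reader->XML($xml);
        $stack = []; // names of the currently open elements
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT) {
                $stack[] = $reader->name;
                $path = implode('/', $stack);
                $counts[$path] = ($counts[$path] ?? 0) + 1;
                if ($reader->isEmptyElement) {
                    array_pop($stack); // <img/> produces no END_ELEMENT node
                }
            } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
                array_pop($stack);
            }
        }
        $reader->close();
    }
    return $counts; // paths in first-seen order
}

$docs = [
    '<us-patent-grant><drawings><figure><img/></figure><figure><img/></figure></drawings></us-patent-grant>',
    '<us-patent-grant><abstract><p>text</p></abstract></us-patent-grant>',
];

foreach (countElementPaths($docs) as $path => $count) {
    echo "$path ($count)", PHP_EOL; // e.g. "us-patent-grant/drawings/figure (2)"
}
```

Counting per full path rather than per element name is what keeps, for example, the country under publication-reference separate from the country under us-parties.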