For several days I have been searching, reading, and trying to parse my XML files, but so far without luck. Here is a sample of one of my XML files:
<?xml version="1.0" encoding="windows-1252"?>
<?xml-stylesheet type="text/xsl" href="/rss/styles/shared_xsl_stylesheet_v2.xml"?>
<rss version="2.0">
<channel>
<title>All XBRL Data Submitted to the SEC for 2014-10</title>
<link>http://www.sec.gov/spotlight/xbrl/filings-and-feeds.shtml</link>
<atom:link href="http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2014-10.xml" rel="self" type="application/rss+xml" xmlns:atom="http://www.w3.org/2005/Atom"/>
<description>This is a list all of the filings containing XBRL for 2014-10</description>
<language>en-us</language>
<pubDate>Mon, 27 Oct 2014 00:00:00 EDT</pubDate>
<lastBuildDate>Mon, 27 Oct 2014 00:00:00 EDT</lastBuildDate>
<item>
<title>Bling Marketing, Inc. (0001593549) (Filer)</title>
<link>http://www.sec.gov/Archives/edgar/data/1593549/000101489714000441/0001014897-14-000441-index.htm</link>
<guid>http://www.sec.gov/Archives/edgar/data/1593549/000101489714000441/0001014897-14-000441-xbrl.zip</guid>
<enclosure url="http://www.sec.gov/Archives/edgar/data/1593549/000101489714000441/0001014897-14-000441-xbrl.zip" length="30761" type="application/zip" />
<description>10-Q</description>
<pubDate>Mon, 27 Oct 2014 17:25:14 EDT</pubDate>
<edgar:xbrlFiling xmlns:edgar="http://www.sec.gov/Archives/edgar">
<edgar:companyName>Bling Marketing, Inc.</edgar:companyName>
<edgar:formType>10-Q</edgar:formType>
<edgar:filingDate>10/27/2014</edgar:filingDate>
<edgar:cikNumber>0001593549</edgar:cikNumber>
<edgar:accessionNumber>0001014897-14-000441</edgar:accessionNumber>
<edgar:fileNumber>333-192997</edgar:fileNumber>
<edgar:acceptanceDatetime>20141027172514</edgar:acceptanceDatetime>
<edgar:period>20140930</edgar:period>
<edgar:assistantDirector>2</edgar:assistantDirector>
<edgar:assignedSic>5094</edgar:assignedSic>
<edgar:fiscalYearEnd>1231</edgar:fiscalYearEnd>
<edgar:xbrlFiles>
<edgar:xbrlFile edgar:sequence="1" edgar:file="bling10q3q14v2.htm" edgar:type="10-Q" edgar:size="174242" edgar:description="FORM 10-Q" edgar:url="http://www.sec.gov/Archives/edgar/data/1593549/000101489714000441/bling10q3q14v2.htm" />
<edgar:xbrlFile edgar:sequence="2" edgar:file="bling10q3q14ex31.htm" edgar:type="EX-31" edgar:size="5481" edgar:description="EXHIBIT 31" edgar:url="http://www.sec.gov/Archives/edgar/data/1593549/000101489714000441/bling10q3q14ex31.htm" />
<edgar:xbrlFile edgar:sequence="3" edgar:file="bling10q3q14ex32.htm" edgar:type="EX-32" edgar:size="1827" edgar:description="EXHIBIT 32" edgar:url="http://www.sec.gov/Archives/edgar/data/1593549/000101489714000441/bling10q3q14ex32.htm" />
<edgar:xbrlFile edgar:sequence="4" edgar:file="blmi-20140930.xml" edgar:type="EX-101.INS" edgar:size="149179" edgar:description="XBRL INSTANCE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/1593549/000101489714000441/blmi-20140930.xml" />
<edgar:xbrlFile edgar:sequence="5" edgar:file="blmi-20140930.xsd" edgar:type="EX-101.SCH" edgar:size="28373" edgar:description="XBRL TAXONOMY EXTENSION SCHEMA DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/1593549/000101489714000441/blmi-20140930.xsd" />
<edgar:xbrlFile edgar:sequence="6" edgar:file="blmi-20140930_cal.xml" edgar:type="EX-101.CAL" edgar:size="7021" edgar:description="XBRL TAXONOMY EXTENSION CALCULATION LINKBASE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/1593549/000101489714000441/blmi-20140930_cal.xml" />
<edgar:xbrlFile edgar:sequence="7" edgar:file="blmi-20140930_def.xml" edgar:type="EX-101.DEF" edgar:size="17205" edgar:description="XBRL TAXONOMY EXTENSION DEFINITION LINKBASE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/1593549/000101489714000441/blmi-20140930_def.xml" />
<edgar:xbrlFile edgar:sequence="8" edgar:file="blmi-20140930_lab.xml" edgar:type="EX-101.LAB" edgar:size="74477" edgar:description="XBRL TAXONOMY EXTENSION LABEL LINKBASE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/1593549/000101489714000441/blmi-20140930_lab.xml" />
<edgar:xbrlFile edgar:sequence="9" edgar:file="blmi-20140930_pre.xml" edgar:type="EX-101.PRE" edgar:size="67806" edgar:description="XBRL TAXONOMY EXTENSION PRESENTATION LINKBASE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/1593549/000101489714000441/blmi-20140930_pre.xml" />
</edgar:xbrlFiles>
</edgar:xbrlFiling>
</item>
<item>
<title>Primco Management Inc. (0001516522) (Filer)</title>
<link>http://www.sec.gov/Archives/edgar/data/1516522/000101489714000414/0001014897-14-000414-index.htm</link>
<guid>http://www.sec.gov/Archives/edgar/data/1516522/000101489714000414/0001014897-14-000414-xbrl.zip</guid>
<enclosure url="http://www.sec.gov/Archives/edgar/data/1516522/000101489714000414/0001014897-14-000414-xbrl.zip" length="80939" type="application/zip" />
<description>10-K/A</description>
<pubDate>Tue, 30 Sep 2014 17:41:38 EDT</pubDate>
<edgar:xbrlFiling xmlns:edgar="http://www.sec.gov/Archives/edgar">
<edgar:companyName>Primco Management Inc.</edgar:companyName>
<edgar:formType>10-K/A</edgar:formType>
<edgar:filingDate>10/01/2014</edgar:filingDate>
<edgar:cikNumber>0001516522</edgar:cikNumber>
<edgar:accessionNumber>0001014897-14-000414</edgar:accessionNumber>
<edgar:fileNumber>000-54930</edgar:fileNumber>
<edgar:acceptanceDatetime>20140930174138</edgar:acceptanceDatetime>
<edgar:period>20131231</edgar:period>
<edgar:assistantDirector>8</edgar:assistantDirector>
<edgar:assignedSic>6531</edgar:assignedSic>
<edgar:fiscalYearEnd>1231</edgar:fiscalYearEnd>
<edgar:xbrlFiles>
<edgar:xbrlFile edgar:sequence="1" edgar:file="primco10k13am2v2.htm" edgar:type="10-K/A" edgar:size="482147" edgar:description="FORM 10-K/A" edgar:url="http://www.sec.gov/Archives/edgar/data/1516522/000101489714000414/primco10k13am2v2.htm" />
<edgar:xbrlFile edgar:sequence="2" edgar:file="primco10k13ex31.htm" edgar:type="EX-31" edgar:size="10412" edgar:description="EXHIBIT 31" edgar:url="http://www.sec.gov/Archives/edgar/data/1516522/000101489714000414/primco10k13ex31.htm" />
<edgar:xbrlFile edgar:sequence="3" edgar:file="primco10k13ex32.htm" edgar:type="EX-32" edgar:size="3121" edgar:description="EXHIBIT 32" edgar:url="http://www.sec.gov/Archives/edgar/data/1516522/000101489714000414/primco10k13ex32.htm" />
<edgar:xbrlFile edgar:sequence="4" edgar:file="pmcm-20131231.xml" edgar:type="EX-101.INS" edgar:size="891933" edgar:description="XBRL INSTANCE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/1516522/000101489714000414/pmcm-20131231.xml" />
<edgar:xbrlFile edgar:sequence="5" edgar:file="pmcm-20131231.xsd" edgar:type="EX-101.SCH" edgar:size="54127" edgar:description="XBRL TAXONOMY EXTENSION SCHEMA DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/1516522/000101489714000414/pmcm-20131231.xsd" />
<edgar:xbrlFile edgar:sequence="6" edgar:file="pmcm-20131231_cal.xml" edgar:type="EX-101.CAL" edgar:size="12529" edgar:description="XBRL TAXONOMY EXTENSION CALCULATION LINKBASE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/1516522/000101489714000414/pmcm-20131231_cal.xml" />
<edgar:xbrlFile edgar:sequence="7" edgar:file="pmcm-20131231_def.xml" edgar:type="EX-101.DEF" edgar:size="77249" edgar:description="XBRL TAXONOMY EXTENSION DEFINITION LINKBASE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/1516522/000101489714000414/pmcm-20131231_def.xml" />
<edgar:xbrlFile edgar:sequence="8" edgar:file="pmcm-20131231_lab.xml" edgar:type="EX-101.LAB" edgar:size="146832" edgar:description="XBRL TAXONOMY EXTENSION LABEL LINKBASE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/1516522/000101489714000414/pmcm-20131231_lab.xml" />
<edgar:xbrlFile edgar:sequence="9" edgar:file="pmcm-20131231_pre.xml" edgar:type="EX-101.PRE" edgar:size="131110" edgar:description="XBRL TAXONOMY EXTENSION PRESENTATION LINKBASE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/1516522/000101489714000414/pmcm-20131231_pre.xml" />
</edgar:xbrlFiles>
</edgar:xbrlFiling>
</item>
</channel>
</rss>
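The XML files are about 30 MB each, which should be fine for SimpleXML, but the problem is that SimpleXML does not seem to handle tags such as <edgar:formType>. I guess they are not "simple" enough ;-)
So I tried parsing with XMLReader, which is able to read these tags. This works for, e.g.: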
// Initialize XMLReader and DOMdocument
$reader = new XMLReader();
$reader->open("file.xml");
$storage = array();
// Move to the first <item> node
while ($reader->read() && $reader->name !== "item");
// Loop through the entire instance document
while ( $reader->read() ) {
    // Ensure that nodeType is an Element and not an Attribute or Text
    if($reader->nodeType == XMLReader::ELEMENT) {
        // Extract and store filing info in $storage array
        switch ($reader->localName) {
            case "formType":
                $reader->read();
                $storage["formType"] = $reader->value;
                break;
            case "cikNumber":
                $reader->read();
                $storage["cik"] = $reader->value;
                break;
            default:
                break;
        }
        echo "<pre>"; print_r($storage); echo "</pre>";
    }
}
However, print_r returns a lot of empty and duplicate results.
I would like to loop through each <item> and use MySQLi to store the following data points:
<guid>
<edgar:companyName>
<edgar:formType>
<edgar:filingDate>
<edgar:cikNumber>
<edgar:accessionNumber>
<edgar:period>
<edgar:fiscalYearEnd>
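the edgar:url attribute of the <edgar:xbrlFile> node whose edgar:description attribute is "XBRL INSTANCE DOCUMENT"
I feel I am close, because I have been able to extract values based on localName (the part after the edgar: prefix), but I don't know how to access attributes or how to store the data per <item> in my database.
I would really appreciate some help here, because none of the examples I found online show how to deal with these prefixed XML tags. Thanks in advance!
Nick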
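Answer 0 (score: 0)
Since you originally tried to do this with SimpleXML, here is a solution that parses the XML with SimpleXML, in particular its xpath method, which gives you a very simple and convenient way to select nodes in an XML document.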
# you will probably be loading the XML from a file here rather than a string...
$sxe = simplexml_load_string( $xml );
# this xpath looks for "item" elements that are under the "channel" element
foreach ($sxe->xpath("channel/item") as $i) {
    # for this example, I'll just store the data and print it after parsing each item
    $data = array();
    # cast the node as a string
    $data['guid'] = (string)$i->guid;
    # register the URI associated with the 'edgar' namespace
    # tags can be referred to using "e:tagName" from now on
    $i->registerXPathNamespace("e", "http://www.sec.gov/Archives/edgar");
    foreach ( array("companyName", "formType", "filingDate", "cikNumber",
                    "accessionNumber", "period", "fiscalYearEnd") as $tag) {
        # create the xpath dynamically from the tag name. All tags are under the "item"
        # node ($i) under the parent edgar:xbrlFiling (i.e. e:xbrlFiling)
        $data[ $tag ] = (string)$i->xpath("e:xbrlFiling/e:$tag")[0];
    }
    # this searches for e:xbrlFile nodes with description "XBRL INSTANCE DOCUMENT"
    # the final /@e:url returns the e:url attribute, rather than the node itself
    foreach ($i->xpath(
        "e:xbrlFiling/e:xbrlFiles/e:xbrlFile[@e:description='XBRL INSTANCE DOCUMENT']/@e:url"
    ) as $url) {
        # you may want to use an array here if there are several such URLs
        $data['url'] = (string)$url;
    }
    print_r($data);
}
Output:
Array
(
[guid] => http://www.sec.gov/Archives/edgar/data/1593549/000101489714000441/0001014897-14-000441-xbrl.zip
[companyName] => Bling Marketing, Inc.
[formType] => 10-Q
[filingDate] => 10/27/2014
[cikNumber] => 0001593549
[accessionNumber] => 0001014897-14-000441
[period] => 20140930
[fiscalYearEnd] => 1231
[url] => http://www.sec.gov/Archives/edgar/data/1593549/000101489714000441/blmi-20140930.xml
)
Array
(
[guid] => http://www.sec.gov/Archives/edgar/data/1516522/000101489714000414/0001014897-14-000414-xbrl.zip
[companyName] => Primco Management Inc.
[formType] => 10-K/A
[filingDate] => 10/01/2014
[cikNumber] => 0001516522
[accessionNumber] => 0001014897-14-000414
[period] => 20131231
[fiscalYearEnd] => 1231
[url] => http://www.sec.gov/Archives/edgar/data/1516522/000101489714000414/pmcm-20131231.xml
)
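If loading a whole ~30 MB feed into SimpleXML at once is a concern, XMLReader and SimpleXML can be combined so that only one <item> is in memory at a time. The following is a minimal sketch of that idea, not a tested production routine: the function name streamItems is hypothetical, and it assumes the feed layout shown in the question (sibling <item> elements under <channel>).

```php
<?php
// Hypothetical helper (not part of any library): walk the feed with XMLReader
// and yield each <item> as its own SimpleXMLElement.
function streamItems(XMLReader $reader): Generator
{
    // Scratch document that owns the imported copies of each <item>.
    $doc = new DOMDocument();

    // Advance to the first node named "item".
    while ($reader->read() && $reader->name !== 'item');

    while ($reader->name === 'item') {
        // expand() builds a DOM subtree for the current <item> only;
        // importNode() copies it into $doc so SimpleXML can wrap it.
        $node = $doc->importNode($reader->expand(), true);
        yield simplexml_import_dom($node);

        // Jump to the next <item>, skipping the subtree just processed.
        $reader->next('item');
    }
}
```

With a reader opened via $reader->open("file.xml"), each element yielded by streamItems($reader) covers a single <item>; its namespaced children can be read with children('http://www.sec.gov/Archives/edgar'), and plain children such as guid with the usual property access.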
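Answer 1 (score: 0)
Try $item->children('edgar', true)->... to parse them. I think this will let you use SimpleXML. The edgar: part is called a namespace prefix, and namespaces are used quite a lot in XML files. I had the same problem a while ago, and this solved it for me.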
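To make that suggestion concrete, here is a short sketch of reading both the prefixed elements and the prefixed attributes of one <item>. The function name filingFields and the choice of returned keys are assumptions made for illustration; children() and attributes() with a second argument of true are standard SimpleXML calls that treat the first argument as a prefix rather than a namespace URI.

```php
<?php
// Hypothetical helper (illustration only): pull a few edgar: fields from one <item>.
function filingFields(SimpleXMLElement $item): array
{
    // true = 'edgar' is a prefix, not a namespace URI.
    $filing = $item->children('edgar', true)->xbrlFiling;

    $fields = [
        'guid'      => (string) $item->guid,
        'formType'  => (string) $filing->formType,
        'cikNumber' => (string) $filing->cikNumber,
        'url'       => null,
    ];

    foreach ($filing->xbrlFiles->xbrlFile as $file) {
        // Attributes written as edgar:... need the same prefix lookup.
        $attrs = $file->attributes('edgar', true);
        if ((string) $attrs['description'] === 'XBRL INSTANCE DOCUMENT') {
            $fields['url'] = (string) $attrs['url'];
            break;
        }
    }

    return $fields;
}
```

Matching by prefix depends on the document always using the edgar prefix; passing the namespace URI http://www.sec.gov/Archives/edgar (and omitting the second argument) is the more robust variant if the prefix could ever change.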