How to use XMLReader to parse multiple identically named attributes of an XML element and its child elements

Asked: 2010-11-09 01:01:17

Tags: php xml xmlreader

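I'm using XMLReader with PHP to process a medium-sized XML file (6 MB), essentially breaking out the attribute data and inserting it into my own database. The problem is that each element has a variable number of child elements, and those child elements all share the same attribute names.

Here is an example (this is public government data, courtesy of govtrack.us):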

<?xml version="1.0" ?>
<people>
    <person id='400001' lastname='Abercrombie' firstname='Neil' birthday='1938-06-26' gender='M' pvsid='26827' osid='N00007665' bioguideid='A000014' metavidid='Neil_Abercrombie' youtubeid='hawaiirep1' name='Rep. Neil Abercrombie [D, HI-1]' title='Rep.' state='HI' district='1' >
        <role type='rep' startdate='1985-01-03' enddate='1986-10-18' party='Democrat' state='HI' district='1' />
        <role type='rep' startdate='1991-01-03' enddate='1992-10-09' party='Democrat' state='HI' district='1' />
        <role type='rep' startdate='1993-01-05' enddate='1994-12-01' party='Democrat' state='HI' district='1' />
        <role type='rep' startdate='1995-01-04' enddate='1996-10-04' party='Democrat' state='HI' district='1' />
        <role type='rep' startdate='1997-01-07' enddate='1998-12-19' party='Democrat' state='HI' district='1' />
        <role type='rep' startdate='1999-01-06' enddate='2000-12-15' party='Democrat' state='HI' district='1' />
        <role type='rep' startdate='2001-01-03' enddate='2002-11-22' party='Democrat' state='HI' district='1' />
        <role type='rep' startdate='2003-01-07' enddate='2004-12-09' party='Democrat' state='HI' district='1' url='http://www.house.gov/abercrombie' />
        <role type='rep' startdate='2005-01-04' enddate='2006-12-08' party='Democrat' state='HI' district='1' url='http://www.house.gov/abercrombie' />
        <role type='rep' startdate='2007-01-04' enddate='2009-01-03' party='Democrat' state='HI' district='1' url='http://www.house.gov/abercrombie' />
        <role type='rep' startdate='2009-01-06' enddate='2010-03-01' party='Democrat' state='HI' district='1' url='http://www.house.gov/abercrombie' />
    </person>
</people>

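I don't need to apply any fancy logic to the attributes. At the start of my script I check whether this particular record has already been processed (based on the 'id' attribute), and then I grab pretty much every attribute and write it to my database. There are two problems, though:

1) When I use this: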

$p->getAttribute('id')

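to get the 'id', it gives it to me twice, separated by as many newlines as there are child elements in the element (I think the comments on this page explain this, but I'm not sure what to do about it).

2) How do I access the attributes of each child element in sequence? This: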

$p->getAttribute('startdate')

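gives me every 'startdate' value, separated by several newlines. I just need to get the element's id and then loop over each 'role' child element.

Any ideas?

For reference, here is my super-simple controller so far: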

$f = base_url().'data/people.xml';
$p = new XMLReader;
$p->open($f);
while($p->read())
{
    if($this->_notImported('govtrack',$p->getAttribute('id')))
    {
            // here I just grab the attributes, put them into arrays to insert, like so:
            $insert = array('indiv_name' => $full_name,
                                    'indiv_first' => ($p->getAttribute(‘firstname’)),
                                    'indiv_last' => ($p->getAttribute(‘lastname’)),
                                    'indiv_middle' => ($p->getAttribute(‘middlename’)),
                                    'indiv_other' => ($p->getAttribute(‘namemod’)),
                                    'indiv_full_name' => $full_name,
                                    'indiv_title' => ($p->getAttribute(‘title’)),
                                    'indiv_dob' => ($p->getAttribute(‘birthday’)),
                                    'indiv_gender' => ($p->getAttribute(‘gender’)),
                                    'indiv_religion' => ($p->getAttribute(‘religion’)),
                                    'indiv_url' => ($url)
                                    );

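For the <person> element itself this isn't difficult, but I can't figure out how to loop over each 'role' child element and get its attributes separately.

1 Answer:

Answer 0: (score: 3)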

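Your first problem is that you are not checking the nodeType, which is exactly what the comment you linked to is about: your code matches both the opening tag (ELEMENT) and the closing tag (END_ELEMENT).

Your second problem is also caused by the missing nodeType check. Once that is fixed, you only need to check the node's name to determine whether it is a <role> or a <person>.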
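To see what the loop actually visits, it can help to record the node type and name on every read() call. The following is only a minimal sketch: traceNodes() is a hypothetical helper name, and the XML is passed in as a string rather than opened from a file.

```php
<?php
// Hypothetical helper (not part of the original code): returns the
// [nodeType, name] pair for every node that XMLReader::read() stops on.
function traceNodes(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml); // parse from a string instead of open()-ing a file

    $trace = [];
    while ($reader->read()) {
        $trace[] = [$reader->nodeType, $reader->name];
    }
    $reader->close();

    return $trace;
}
```

For a <person> that contains a self-closing <role />, the trace contains an ELEMENT entry for person, an ELEMENT entry for role, and then an END_ELEMENT entry for person, so an unfiltered loop handles person twice. In an indented file, the whitespace between tags typically shows up as additional nodes as well.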

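Since I assume you are also reading large XML files, you probably want to know when you move on to the next person tag (signalled by the END_ELEMENT nodeType). See my example below: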

while($p->read()) {
    // check for nodeType here (opening tag only)
    if ($p->nodeType == XMLReader::ELEMENT) {
        if ($p->name == 'person') {
            if ($this->_notImported('govtrack',$p->getAttribute('id'))) {
                // $insert['indiv_*'] stuff here
            } else {
                $insert = null; // skip record because it's already imported
            }
        } else if ($p->name == 'role') {
            // role stuff here
            $startdate = $p->getAttribute('startdate');
        }

    // check for closing </person> tag here
    } else if ($p->nodeType == XMLReader::END_ELEMENT && $p->name == 'person') {
        if (isset($insert)) {
            // db insert here
        }
    }
}

By the way, for this to work you also need to replace the curly quotes (‘ ’) in your code with proper straight quotes (').
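The approach in this answer can also be tried out in isolation, without the controller or a database. The sketch below is an illustration under assumptions, not the asker's actual code: collectPeople() and its $alreadyImported parameter are hypothetical stand-ins for the _notImported() check and the database insert, and only a few attributes are copied.

```php
<?php
// Hypothetical helper: returns one entry per <person>, each with its own roles.
// $alreadyImported stands in for the question's _notImported() lookup.
function collectPeople(string $xml, array $alreadyImported = []): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $people  = [];
    $current = null; // the <person> being assembled, or null while skipping

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            if ($reader->name === 'person') {
                $id = $reader->getAttribute('id');
                $current = in_array($id, $alreadyImported, true)
                    ? null
                    : [
                        'id'       => $id,
                        'lastname' => $reader->getAttribute('lastname'),
                        'roles'    => [],
                    ];
                // A self-closing <person ... /> never produces END_ELEMENT,
                // so it has to be stored right away.
                if ($reader->isEmptyElement && $current !== null) {
                    $people[] = $current;
                    $current  = null;
                }
            } elseif ($reader->name === 'role' && $current !== null) {
                // Each <role> is visited separately, so its attributes
                // are read one child element at a time.
                $current['roles'][] = [
                    'startdate' => $reader->getAttribute('startdate'),
                    'enddate'   => $reader->getAttribute('enddate'),
                ];
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT
            && $reader->name === 'person') {
            if ($current !== null) {
                $people[] = $current; // a real import would insert into the DB here
            }
            $current = null;
        }
    }
    $reader->close();

    return $people;
}
```

Calling collectPeople(file_get_contents($path)) on the sample file should give one array entry per person, with the 'roles' key holding that person's roles in document order. For a file too large to load as a string, the same loop works after $reader->open($path) in place of XML().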