Using a regex to filter XML elements and return the matching element

Date: 2016-04-17 17:41:58

Tags: php regex xml

I am trying to use a regular expression to return a specific XML element.

The XML looks like this:

<outfits>
    <outfit name="babe0" color="0xF2C291" mood="1" species="babe">
        <head url="http://assets.zwinky.com/assets/babe/heads/01/head1" c="0xF4C4A4" c2="0xffffff" z="33000"/>
        <face url="http://assets.zwinky.com/assets/babe/faces/01/grl1" c="0x8442" c2="0xFE89B9" displayName="girl1" z="34000" id="20014794"/>
        <midsection url="http://assets.zwinky.com/assets/babe/midsections/01/ms1" c="0xBCE4FE" z="9000"/>
        <leg url="http://assets.zwinky.com/assets/babe/legs/01/legs1" z="10000"/>
        <hair url="http://assets.zwinky.com/assets/babe/hair/01/hr3" c="0xA55200" displayName="straight n' long" z="37000" id="20014869"/>
    </outfit>
    <outfit thumbnailUrl="users/908/721/swagg_ma_blue/thumbnail-12631.jpg" default="1" name="babe1" color="0xF2C291" mood="1" species="babe">
        <head url="http://assets.zwinky.com/assets/babe/heads/01/head1" c="0xF4C4A4" c2="0xffffff" z="33000"/>
        <face url="http://assets.zwinky.com/assets/babe/faces/01/grl1" c="0x0000CC" c2="0xFE89B9" displayName="girl1" z="34000" id="20014794"/>
        <midsection url="http://assets.zwinky.com/assets/babe/midsections/01/ms1" c="0xBCE4FE" z="9000"/>
        <leg url="http://assets.zwinky.com/assets/babe/legs/01/legs1" z="10000"/>
        <hair url="http://assets.zwinky.com/assets/babe/hair/01/hr3" c="0xCA9460" displayName="straight n' long" z="37000" id="20014869"/>
        <shirt url="http://assets.zwinky.com/assets/babe/tops/01/top9" c="0xFFCC66" displayName="tube top2" z="21000" id="20014829"/>
        <bottom url="http://assets.zwinky.com/assets/babe/bottoms/01/bm27" c="0x333333" displayName="pants w/chains" z="20000" id="20014937"/>
        <shoes url="http://assets.zwinky.com/assets/babe/shoes/01/sh26" c="0x009933" displayName="elf boots" z="19000" id="20014976"/>
        <hat url="http://assets.zwinky.com/assets/babe/hats/01/hat1" c="0xCC9966" c2="0x999999" displayName="pageboy cap" z="40000" id="20015058"/>
        <bag url="http://assets.zwinky.com/assets/babe/bags/01/bag15" c="0xFF6600" c2="0x333333" displayName="trick or treat2" z="43000" id="20015070"/>
        <belt url="http://assets.zwinky.com/assets/babe/belts/01/blt16" c="0x333333" displayName="chain belt3" z="26000" id="20015085"/>
    </outfit>
</outfits>

As you can see, there are two nodes named outfit, and one of them carries the attribute default="1". I want to grab that entire element, like this:

<outfit thumbnailUrl="users/908/721/swagg_ma_blue/thumbnail-12631.jpg" default="1" name="babe1" color="0xF2C291" mood="1" species="babe">
        <head url="http://assets.zwinky.com/assets/babe/heads/01/head1" c="0xF4C4A4" c2="0xffffff" z="33000"/>
        <face url="http://assets.zwinky.com/assets/babe/faces/01/grl1" c="0x0000CC" c2="0xFE89B9" displayName="girl1" z="34000" id="20014794"/>
        <midsection url="http://assets.zwinky.com/assets/babe/midsections/01/ms1" c="0xBCE4FE" z="9000"/>
        <leg url="http://assets.zwinky.com/assets/babe/legs/01/legs1" z="10000"/>
        <hair url="http://assets.zwinky.com/assets/babe/hair/01/hr3" c="0xCA9460" displayName="straight n' long" z="37000" id="20014869"/>
        <shirt url="http://assets.zwinky.com/assets/babe/tops/01/top9" c="0xFFCC66" displayName="tube top2" z="21000" id="20014829"/>
        <bottom url="http://assets.zwinky.com/assets/babe/bottoms/01/bm27" c="0x333333" displayName="pants w/chains" z="20000" id="20014937"/>
        <shoes url="http://assets.zwinky.com/assets/babe/shoes/01/sh26" c="0x009933" displayName="elf boots" z="19000" id="20014976"/>
        <hat url="http://assets.zwinky.com/assets/babe/hats/01/hat1" c="0xCC9966" c2="0x999999" displayName="pageboy cap" z="40000" id="20015058"/>
        <bag url="http://assets.zwinky.com/assets/babe/bags/01/bag15" c="0xFF6600" c2="0x333333" displayName="trick or treat2" z="43000" id="20015070"/>
        <belt url="http://assets.zwinky.com/assets/babe/belts/01/blt16" c="0x333333" displayName="chain belt3" z="26000" id="20015085"/>
    </outfit>

and return it. Unfortunately, my regex does not work. The regex I wrote looks like this:

/\<outfit.+?default="1".+?\>.+?\<\/outfit\>/i

The relevant part of the PHP function:

if (preg_match('/\<outfit.+?default="1".+?\>.+?\<\/outfit\>/i', $user_outfit, $match)) {
    return $match[0];
}

Does anyone know what is wrong with my regex?

2 answers:

Answer 0 (score: 1)

It is better to go the parser route:

<?php
$xml = simplexml_load_string($html);
$elements = $xml->xpath("//outfit[@default=1]");
// to get the bag url
echo $elements[0]->bag["url"];
?>

This way you can query the XML by its structure instead of by its text layout.
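
Since the question asks for the whole element rather than a single attribute, a minimal sketch along the same lines could serialize the matched node with asXML(). The function name findDefaultOutfit and the shortened $sample data are assumptions made for illustration and are not part of the original answer:

```php
<?php
// Hypothetical helper: returns the serialized <outfit> element whose
// default attribute equals "1", or null if there is none or the input
// is not well-formed XML.
function findDefaultOutfit(string $xmlString): ?string
{
    // Collect parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xmlString);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        return null;
    }

    // Compare against the string "1" so only default="1" is selected.
    $matches = $xml->xpath('//outfit[@default="1"]');
    if (empty($matches)) {
        return null;
    }

    // asXML() without arguments returns the element and its children as a string.
    return $matches[0]->asXML();
}

// Shortened sample in the same shape as the XML from the question.
$sample = <<<XML
<outfits>
    <outfit name="babe0" mood="1">
        <leg url="http://assets.zwinky.com/assets/babe/legs/01/legs1" z="10000"/>
    </outfit>
    <outfit default="1" name="babe1" mood="1">
        <bag url="http://assets.zwinky.com/assets/babe/bags/01/bag15" z="43000"/>
    </outfit>
</outfits>
XML;

echo findDefaultOutfit($sample), PHP_EOL;
```

Returning null for both "no match" and "invalid XML" keeps the sketch short; a real function might want to tell the two cases apart.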

Answer 1 (score: 0)

The dot (.) does not match newlines unless you use the s modifier. So the regex should look like this:

/\<outfit.+?default="1".+?\>.+?\<\/outfit\>/is
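
One caveat: once the dot can cross newlines, the lazy .+? after <outfit is free to run past the first outfit until it reaches default="1". On the XML from the question the match would therefore begin at <outfits> and include both outfits. A tighter pattern, sketched below under the assumption that attribute values never contain > and that outfit elements are not nested, keeps the attribute search inside a single opening tag. The function name matchDefaultOutfit and the shortened $sample are hypothetical:

```php
<?php
// Hypothetical helper (not from the original answer): returns the
// <outfit> element whose opening tag carries default="1", or null.
function matchDefaultOutfit(string $xmlString): ?string
{
    // \b     stops "<outfit" from matching the "<outfits>" root tag
    // [^>]*  keeps the attribute search inside one opening tag
    // .*?    with the s modifier spans the child lines up to the
    //        first closing </outfit>
    $pattern = '/<outfit\b[^>]*\sdefault="1"[^>]*>.*?<\/outfit>/is';

    return preg_match($pattern, $xmlString, $match) === 1 ? $match[0] : null;
}

// Shortened sample in the same shape as the XML from the question.
$sample = <<<XML
<outfits>
    <outfit name="babe0" mood="1">
        <leg url="http://assets.zwinky.com/assets/babe/legs/01/legs1" z="10000"/>
    </outfit>
    <outfit default="1" name="babe1" mood="1">
        <bag url="http://assets.zwinky.com/assets/babe/bags/01/bag15" z="43000"/>
    </outfit>
</outfits>
XML;

echo matchDefaultOutfit($sample), PHP_EOL;
```

If either assumption does not hold for your data, the parser approach from the other answer is the safer choice.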