Replacing many words in HTML using preg_match or regex

Asked: 2013-04-08 20:17:47

Tags: php regex html-parsing

Is there a good way to convert HTML in the following format

            <span xmlns:v="http://rdf.data-vocabulary.org/#">
                <span typeof="v:Breadcrumb">
                    <a href="http://link1.com/" rel="v:url" property="v:title">Home</a>
                </span> 
                / 
                <span typeof="v:Breadcrumb">
                    <a href="http://link2.com/" rel="v:url" property="v:title">Child 2</a>
                </span>
                / 
                <span typeof="v:Breadcrumb">
                    <a href="http://link3.com/" rel="v:url" property="v:title">Child 3</a>
                </span> 
                / 
                <span typeof="v:Breadcrumb">
                    <span class="breadcrumb_last" property="v:title">Child 4</span>
                </span>
            </span>

into

            <span itemscope="" itemtype="http://data-vocabulary.org/Breadcrumb">
                <span typeof="v:Breadcrumb">
                    <a href="http://link1.com/" itemprop="url">
                        <span itemprop="title">Home</span>
                    </a>
                </span> 
                /
                <span typeof="v:Breadcrumb">
                    <a href="http://link2.com/" itemprop="url">
                        <span itemprop="title">Child 2</span>
                    </a>
                </span> 
                / 
                <span typeof="v:Breadcrumb">
                    <a href="http://link3.com/" itemprop="url">
                        <span itemprop="title">Child 3</span>
                    </a>
                </span> 
                / 
                <span>
                    <span class="breadcrumb_last">
                        <span itemprop="title">Child 4</span>
                    </span>
                </span>
            </span>
using PHP? I want to convert a breadcrumb structure from RDFa to Microdata. Thanks for your help.

1 Answer:

Answer 0: (score: 1)

Here is a solution using a regexp. It works for your sample code, but it fails when the attribute order changes:

 // Optionally match rel="v:url", then the property attribute and the text up to the next tag.
 $pattern = '#(?:rel="v:url")? property="v:title">([^<]*)<#ui';
 $replacement = ' itemprop="url"><span itemprop="title">$1</span><';
 $output = preg_replace($pattern, $replacement, $original);
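
To make the attribute-order caveat concrete, here is a small sketch that runs the same pattern on two links. Both input strings (`$inOrder`, `$reordered`) are made-up samples, not taken from a real page; the second one only differs in the order of its attributes.

```php
<?php
// Same pattern and replacement as in the answer above.
$pattern     = '#(?:rel="v:url")? property="v:title">([^<]*)<#ui';
$replacement = ' itemprop="url"><span itemprop="title">$1</span><';

// Hypothetical inputs: identical link, different attribute order.
$inOrder   = '<a href="http://link1.com/" rel="v:url" property="v:title">Home</a>';
$reordered = '<a property="v:title" rel="v:url" href="http://link1.com/">Home</a>';

$convertedInOrder   = preg_replace($pattern, $replacement, $inOrder);
$convertedReordered = preg_replace($pattern, $replacement, $reordered);

// The first link is rewritten to Microdata attributes.
echo $convertedInOrder, PHP_EOL;
// The second link never contains ' property="v:title">' directly before the
// text, so the pattern does not match and the string comes back unchanged.
echo $convertedReordered, PHP_EOL;
```

The pattern depends on `property="v:title"` being the last attribute before the closing `>`, which is why a parser-based approach tends to be more robust.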

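Whenever you need to manipulate HTML/XML source, consider using an HTML/XML parser if at all possible. Here is a powerful tool: https://code.google.com/p/phpquery/. If you already use the jQuery JS framework, it will feel familiar ;) See: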

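require_once 'phpquery/phpQuery.php';

$dom = phpQuery::newDocument($original);
// Iterating a phpQuery result yields plain DOM nodes, so wrap each one with pq().
foreach ($dom->find('a[rel="v:url"]') as $node) {
    $item = pq($node);
    // Escape the text so it is safe to re-insert as markup.
    $txt = htmlspecialchars($item->text());
    $item->
       removeAttr('rel')->
       removeAttr('property')->
       attr('itemprop', 'url')->
       html("<span itemprop=\"title\">$txt</span>");
}
// Serialize the modified document rather than returning the untouched input.
$output = $dom->htmlOuter();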
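
If installing phpQuery is not an option, roughly the same idea can be sketched with PHP's bundled DOM extension (`DOMDocument` and `DOMXPath`). This is a swapped-in technique, not what the answer above uses, and `rdfaLinksToMicrodata` is a hypothetical helper name. Like the phpQuery code, it only rewrites the `<a rel="v:url">` links; the outer `<span>` wrapper and the last, non-linked breadcrumb would need similar handling.

```php
<?php
// Sketch only: rdfaLinksToMicrodata is a hypothetical helper, not a library function.
function rdfaLinksToMicrodata(string $html): string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration hints that the input is UTF-8; the wrapper <div>
    // lets us pull the fragment back out after parsing.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Copy the matches into an array before modifying the tree.
    foreach (iterator_to_array($xpath->query('//a[@rel="v:url"]')) as $link) {
        $title = $link->textContent;
        $link->removeAttribute('rel');
        $link->removeAttribute('property');
        $link->setAttribute('itemprop', 'url');

        // Replace the link's children with <span itemprop="title">...</span>.
        while ($link->firstChild) {
            $link->removeChild($link->firstChild);
        }
        $span = $doc->createElement('span');
        $span->setAttribute('itemprop', 'title');
        $span->appendChild($doc->createTextNode($title));
        $link->appendChild($span);
    }

    // Serialize only the children of the wrapper, not the implied <html>/<body>.
    $wrapper = $xpath->query('//div[@id="wrap"]')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage with a made-up single breadcrumb.
$sample = '<span typeof="v:Breadcrumb"><a href="http://link1.com/" rel="v:url" property="v:title">Home</a></span>';
$converted = rdfaLinksToMicrodata($sample);
echo $converted, PHP_EOL;
```

Because the attributes are looked up by name on parsed nodes, this version does not care about attribute order.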