Efficient string replacement in PHP

Date: 2014-01-07 18:37:20

Tags: php html regex preg-replace

Is there a way to make this code more efficient? I'm not looking for someone to write the code for me, just for a pointer in the right direction...

    $string = preg_replace('/<ref[^>]*>([\s\S]*?)<\/ref[^>]*>/', '', $string);
    $string = preg_replace('/{{(.*?)\}}/s', '', $string); 
    $string = preg_replace('/File:(.*?)\\n/s', '', $string);
    $string = preg_replace('/==(.*?)\=\\n/s', '', $string);        
    $string = str_replace('|', '/', $string);
    $string = str_replace('[[', '', $string);
    $string = str_replace(']]', '', $string);
    $string = strip_tags($string);

The catch, however, is that the replacements have to be done in this order...

Sample input text:

    ===API sharing and reuse via virtual machine===
    {{Expand section|date=December 2013}}

    Some languages like those running in a [[virtual machine]] (e.g. [[List of CLI languages|.NET CLI compliant languages]] in the [[Common Language Runtime]] (CLR), and [[List of JVM languages|JVM compliant languages]] in the [[Java Virtual Machine]]) can share an API.  In this case, a virtual machine enables [[language interoperability]], by abstracting a programming language using an intermediate [[bytecode]] and its [[language binding]]s.==Web APIs==
    {{Main|Web API}}
    When used in the context of [[web development]], an API is typically defined as a set of [[Hypertext Transfer Protocol]] (HTTP) request messages, along with a definition of the structure of response messages, which is usually in an Extensible Markup Language ([[XML]]) or JavaScript Object Notation ([[JSON]]) format. While "web API" historically has been virtually synonymous for [[web service]], the recent trend (so-called [[Web 2.0]]) has been moving away from Simple Object Access Protocol ([[SOAP]]) based web services and [[service-oriented architecture]] (SOA) towards more direct [[representational state transfer]] (REST) style [[web resource]]s and [[resource-oriented architecture]] (ROA).<ref>
    {{cite web
     |first       = Djamal
     |last        = Benslimane
     |coauthors   = Schahram Dustdar, and Amit Sheth
     |title       = Services Mashups: The New Generation of Web Applications
     |url         = http://dsonline.computer.org/portal/site/dsonline/menuitem.9ed3d9924aeb0dcd82ccc6716bbe36ec/index.jsp?&pName=dso_level1&path=dsonline/2008/09&file=w5gei.xml&xsl=article.xsl
     |work        = IEEE Internet Computing, vol. 12, no. 5
     |publisher   = Institute of Electrical and Electronics Engineers
     |pages       = 13–15
     |year        = 2008
    }}
    </ref> Part of this trend is related to the [[Semantic Web]] movement toward [[Resource Description Framework]] (RDF), a concept to promote web-based [[ontology engineering]] technologies. Web APIs allow the combination of multiple APIs into new applications known as [[mashup (web application hybrid)|mashup]]s.<ref>
    {{citation
     |first       = James
     |last        = Niccolai
     |title       = So What Is an Enterprise Mashup, Anyway?
     |url         = http://www.pcworld.com/businesscenter/article/145039/so_what_is_an_enterprise_mashup_anyway.html
     |work        = [[PC World (magazine)|PC World]]
     |date        = 2008-04-23
    }}</ref>

Sample output (with the current script):

    Some languages like those running in a virtual machine (e.g. List of CLI languages/.NET CLI compliant languages in the Common Language Runtime (CLR), and List of JVM languages/JVM compliant languages in the Java Virtual Machine) can share an API.  In this case, a virtual machine enables language interoperability, by abstracting a programming language using an intermediate bytecode and its language bindings.
    When used in the context of web development, an API is typically defined as a set of Hypertext Transfer Protocol (HTTP) request messages, along with a definition of the structure of response messages, which is usually in an Extensible Markup Language (XML) or JavaScript Object Notation (JSON) format. While "web API" historically has been virtually synonymous for web service, the recent trend (so-called Web 2.0) has been moving away from Simple Object Access Protocol (SOAP) based web services and service-oriented architecture (SOA) towards more direct representational state transfer (REST) style web resources and resource-oriented architecture (ROA). Part of this trend is related to the Semantic Web movement toward Resource Description Framework (RDF), a concept to promote web-based ontology engineering technologies. Web APIs allow the combination of multiple APIs into new applications known as mashup (web application hybrid)/mashups.

1 answer:

Answer (score: 2)

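Since you are only deleting things from the string (apart from the | → / substitution, the replacement is always the empty string), you can put all of those patterns into a single preg_replace. That way the string is scanned once instead of once per call.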

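You can also optimize your subpatterns by avoiding lazy quantifiers and removing capture groups you never use. A lazy .*? makes the engine test the closing delimiter again after every single character, whereas a negated character class with a possessive quantifier consumes a whole run in one step and never backtracks into it.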
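As a quick sanity check, here is a minimal sketch comparing the question's lazy template pattern with the atomic/possessive form used in this answer. The sample string is made up for illustration, and the two are only expected to agree for non-nested templates, which neither pattern handles.

```php
<?php
// Made-up wikitext sample: two non-nested templates, one spanning several lines.
$str = "A{{Expand section|date=December 2013}}B{{cite web\n |first = Djamal\n}}C";

// The question's pattern: lazy dot, an unused capture group, and /s so the dot crosses newlines.
$lazy = preg_replace('/{{(.*?)\}}/s', '', $str);

// Atomic group plus possessive quantifiers, no capture group.
// No /s is needed: [^}] already matches newlines.
$possessive = preg_replace('~{{(?>[^}]++|}(?!}))*+}}~', '', $str);

var_dump($lazy === $possessive); // bool(true)
echo $possessive, "\n";          // ABC
```

Both calls remove the same text here; the difference is only in how much work the engine does to find each closing }}.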

Example:

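    $str = preg_replace('~{{(?>[^}]++|}(?!}))*+}}|\[\[|]]~', '', $str);

This replaces your second line and the two str_replace calls that delete [[ and ]]. Keep str_replace('|', '/', $string) as a separate call: it substitutes "/" instead of deleting the character, so it cannot share the empty replacement. Running it after the combined preg_replace is fine, because pipes inside templates have already been removed together with the templates.

Details:

    ~            # pattern delimiter
    {{           # literal: {{
    (?>          # open an atomic group (no backtracking inside, so a failing match fails faster)
        [^}]++   # one or more characters other than } (possessive: same effect as an atomic group)
      |          # OR
        }(?!})   # a } not followed by another }
    )*+          # repeat the atomic group zero or more times (possessive)
    }}           # literal: }}
    |            # OR
    \[\[         # literal: [[
    |            # OR
    ]]           # literal: ]]
    ~            # pattern delimiter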

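Now you only need to add subpatterns 1, 3 and 4 to this pattern in the same way. Note that you don't need the s modifier, because the pattern never uses a dot ([^}] already matches newlines).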
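A minimal sketch of what the finished pattern could look like, assuming headings and File: entries each sit on a single line (the question's /s versions could span lines). The function name cleanWikiText is hypothetical, and the sketch has only been traced against sample-style input, not every edge case (nested templates or self-closing <ref/> tags are not handled, just as in the original patterns).

```php
<?php
// Hypothetical helper: the question's replacement chain with all
// "delete this" rules merged into one preg_replace call.
function cleanWikiText(string $str): string
{
    // The x modifier ignores whitespace and allows # comments in the pattern.
    $pattern = '~
          <ref[^>]*+>(?>[^<]++|<(?!/ref))*+</ref[^>]*+>  # 1: <ref>...</ref> blocks
        | {{(?>[^}]++|}(?!}))*+}}                        # 2: {{...}} templates
        | File:[^\n]*+\n                                 # 3: File: up to end of line
        | ==[^\n]*=\n                                    # 4: ==Heading== lines
        | \[\[ | ]]                                      # [[ and ]] link brackets
    ~x';
    $str = preg_replace($pattern, '', $str);

    // "|" becomes "/", not "", so it stays a separate step.
    $str = str_replace('|', '/', $str);

    return strip_tags($str);
}

$input = "==Intro==\n{{Expand section|date=December 2013}}\n"
       . "A [[virtual machine]] and [[List of JVM languages|JVM languages]]."
       . "<ref>{{cite web |title = X}}</ref> End.\n";
echo cleanWikiText($input);
```

Rule 4 deliberately uses a greedy [^\n]* rather than a possessive [^\n]*+: the possessive form would swallow the closing "=" and the match would always fail. Because the alternatives are tried left to right at each position, a <ref> is consumed whole before the [[ or | inside it is ever reached, which keeps the ordering behaviour the question asks for on input like the sample.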

About strip_tags:

You could also try replacing it with a subpattern:

    $str = preg_replace('~<[^>]++>~', '', $str);

But be careful, because your input may contain constructs that trip it up, for example:

    blah blah blah <!--  blah > --> blah blah

or

    <div theuglyattribute=">">

All of these problems can be avoided, but your pattern becomes quite long.
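A minimal sketch covering just the two pitfalls above, HTML comments and quoted attribute values containing ">". The name stripTagsRegex is hypothetical. It does not handle <script>/<style> contents, CDATA sections, or a bare "<" in ordinary text, so strip_tags remains the safer default.

```php
<?php
// Hypothetical helper: removes HTML comments and tags, letting quoted
// attribute values contain ">". Written in the same atomic/possessive style.
function stripTagsRegex(string $str): string
{
    $pattern = '~
          <!--(?>[^-]++|-(?!->))*+-->               # an HTML comment, even one containing ">"
        | <(?>[^>"\']++|"[^"]*+"|\'[^\']*+\')*+>    # a tag whose quoted values may contain ">"
    ~x';
    return preg_replace($pattern, '', $str);
}

// The naive pattern stops at the first ">", even inside a quoted value.
echo preg_replace('~<[^>]++>~', '', '<div theuglyattribute=">">text</div>'), "\n"; // ">text
echo stripTagsRegex('<div theuglyattribute=">">text</div>'), "\n";                // text
echo stripTagsRegex('blah blah blah <!--  blah > --> blah blah'), "\n";
```

Each extra case (CDATA, script bodies, unquoted attributes with odd characters) adds another alternative, which is how the pattern grows long.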