Checking for blank lines before the PHP opening or closing tags

Time: 2014-12-06 20:22:50

Tags: regex wordpress search blank-line

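My WordPress site produces an error (XML parsing error) because there is a blank line before the <DOCTYPE>. It is probably caused by a blank line in a theme or plugin file, either before the PHP opening tag <?php or after the closing tag ?>. I have checked some files (the theme's index.php, header.php and functions.php, plus a few plugins) but have not found the cause.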

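Is there a smart trick to check all files for blank lines before or after the PHP tags? Perhaps a regular expression? Or is there some other way to find out which theme or plugin file outputs that line?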

3 answers:

Answer 0: (score: 4)

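I don't think that just

  • a DOS/Windows line termination (a carriage return \r plus line-feed \n pair), or
  • a UNIX line termination (only a line-feed \n)

at the top of a file is the problem. Those whitespace characters are usually ignored.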

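I think you have files that were created as UTF-8 encoded files with a byte order mark (BOM) at the beginning. Text editors and IDEs do not display the BOM of Unicode-encoded files.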

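The UTF-8 BOM is the byte sequence 0xEF 0xBB 0xBF, which appears as ï»¿ in the Windows-1252 code page if a text editor displays the bytes at all. The text editor UltraEdit lets you override automatic Unicode detection: use File - Open and select ASCII for the Open As option in the file-open dialog to open a UTF-8 encoded file as an ASCII/ANSI file. In text edit mode, the UTF-8 BOM at the beginning of a UTF-8 encoded Unicode file with BOM can then be seen as well.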

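A very simple way to find files with a UTF-8 BOM at the top is to search for files containing the string ï»¿. Or, if you don't want to depend on the code page, run a Perl regular expression search with the expression \xEF\xBB\xBF.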

Using an empty string as the replacement string should remove the UTF-8 BOM from all files.
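As a rough sketch of the same search-and-replace done outside an editor, the Python below checks whether a file starts with the three BOM bytes and can strip them. The function names and the idea of walking a directory tree are assumptions for illustration, not part of the answer above; back up the files before rewriting anything.

```python
import codecs
from pathlib import Path

UTF8_BOM = codecs.BOM_UTF8  # the bytes EF BB BF


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data


def files_with_bom(root: str) -> list:
    """List *.php files under root (a hypothetical path) that start with a BOM."""
    return [p for p in sorted(Path(root).rglob("*.php"))
            if p.read_bytes().startswith(UTF8_BOM)]
```

To clean a tree, one could loop over files_with_bom(root) and write strip_bom(path.read_bytes()) back to each path.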

\R can be used to match DOS/Windows, UNIX or Mac line terminations. In other words, for these line endings \R is equivalent to (?:\r\n|\n|\r) or, shorter, (?:\r?\n|\r).
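Not every regex engine knows \R; as far as I know Python's re module rejects it, so the spelled-out alternation is the more portable form. A minimal sketch (the names are made up for illustration) showing that both alternations split mixed line endings the same way:

```python
import re

# Long and short spellings of "any DOS/Windows, UNIX or old Mac line ending".
LONG = re.compile(r"(?:\r\n|\n|\r)")
SHORT = re.compile(r"(?:\r?\n|\r)")


def split_lines(text: str, pattern=LONG) -> list:
    """Split text on any of the three line-ending styles."""
    return pattern.split(text)
```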

However, because I suspect a byte order mark, I suggest using the following as the search string:

(?:\xEF\xBB\xBF\s*|\s+)(?=<\?php)

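Explanation:

(?: ... ) ... a non-marking (non-capturing) group for the OR expression.

\xEF\xBB\xBF\s* ... the UTF-8 BOM followed by zero or more whitespace characters.

| ... means OR.

\s+ ... one or more whitespace characters.

(?=<\?php) ... a positive lookahead that checks that the next characters are <?php without actually matching them.

This search string is not restricted to the beginning of a file. Still, it may be enough for your purpose of finding files with a UTF-8 BOM or blank lines at the beginning of PHP files.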
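For illustration only, here is the same pattern as a Python bytes regex, next to a variant anchored with \A so that it can only match at the very start of a file. The anchored variant and the helper name are my assumptions, not something the answer prescribes.

```python
import re

# The answer's pattern as a bytes regex, so \xEF\xBB\xBF matches the raw BOM bytes.
UNANCHORED = re.compile(rb"(?:\xEF\xBB\xBF\s*|\s+)(?=<\?php)")
# Assumed variant: \A restricts the match to the beginning of the data.
ANCHORED = re.compile(rb"\A(?:\xEF\xBB\xBF\s*|\s+)(?=<\?php)")


def leading_junk(data: bytes) -> bytes:
    """Return the BOM and/or whitespace found before a leading <?php, else b""."""
    match = ANCHORED.match(data)
    return match.group(0) if match else b""
```

Reading each file in binary mode and passing the bytes to leading_junk avoids any dependence on the code page.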

Answer 1: (score: 1)

Generally this issue is seen in WordPress-generated XML documents such as RSS and Atom feeds as well as XML sitemaps. In such cases the bug is not an anomalous BOM in the UTF-8 document, but rather a consequence of PHP treating everything after its closing '?>' as data to be sent to the output. A blank line following the closing '?>' tag is interpreted as an instruction to send an LF to the output document. If this happens before the document itself is buffered, the result is an XML document with an LF (blank line) before the XML declaration, which makes it invalid XML. You will then see something like this when you examine the XML output in a browser:

This page contains the following errors:

error on line 2 at column 6: XML declaration allowed only at the start of the document

The recommended solution is to look through all of the PHP files in the WordPress theme, check whether any closing '?>' PHP tags are followed by line feeds or carriage returns, and remove those characters. Unfortunately this is easier said than done, considering the number of files in the theme as well as in the core WordPress install, any one of which could host the bug.
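The manual check can be partly automated. The following is a minimal Python sketch, not the Perl script mentioned below; the pattern and the function name are assumptions for illustration. It reports whatever whitespace sits between a final '?>' and the end of the file.

```python
import re

# A closing tag followed only by whitespace up to the end of the file.
TRAILING = re.compile(rb"\?>(\s+)\Z")


def trailing_after_close(data: bytes) -> bytes:
    """Return the whitespace after a final '?>' tag, or b"" if there is none."""
    match = TRAILING.search(data)
    return match.group(1) if match else b""
```

Running this over the bytes of every *.php file in the theme, the plugins and the core install narrows the search to the files worth opening by hand.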

My original solution was a small Perl script that checked every PHP file under /usr/share/wordpress for this issue. However I later found a very elegant PHP-only solution by Michal "Wejn" Jirků at http://wejn.org/stuff/wejnswpwhitespacefix.php.html, with additional debugging info contributed by Eric Auer. The authors provide a small script (wejnswpwhitespacefix.php) with a function that inserts itself into the output chain when called, and parses all content delivered to it for valid headers. If valid content is found, the script creates a new PHP output buffer by calling ob_start() and buffers this content for eventual output. The crux of this solution is the PHP ob_start function, which creates a new output buffer when called. PHP output buffers are stackable and are nested, so that actual output happens in the order of creation of the buffers. If the content is invalid, such as a single linefeed, it is rejected.

As the actual extra LF bug can happen anywhere in the output chain, from the theme's own PHP files (typically functions.php) through index.php and up the chain to the core WP files such as wp-settings.php, wp-config.php, wp-load.php etc., the recommendation is to insert the file at each stage to see if it solves the issue. If it does, the error lies in that stage, so it becomes much simpler to locate the offending whitespace and fix it. This is in general a much better way to resolve the issue than to just insert the file somewhere where it works and leave it there, as in that case the issue is not being fixed but rather worked around.

Answer 2: (score: 0)

I used "\?>\s*\Z" (without the quotes) in NetBeans to find extra lines at the end of files.

Noel