
Time: 2012-06-16 20:38:20

Tags: php newline platform


Note: This is a self-answered question. It is meant to share knowledge, Q&A style.

How can I detect the type of end-of-line character in PHP?


6 answers:

Answer 0 (score: 8)

/**
 * Detects the end-of-line character of a string.
 * @param string $str The string to check.
 * @param string $default Default EOL (if not detected).
 * @return string The detected EOL, or default one.
 */
function detectEol($str, $default=''){
    // PHP has no "\0x.." escape ("\0" is a NUL byte followed by literal text),
    // so the sequences use "\x.." byte escapes; non-ASCII code points are UTF-8 encoded.
    // Two-character sequences come first so that they win ties against their parts.
    static $eols = array(
        "\x0D\x0A",     // [ASCII/UNICODE] CR+LF: Windows, TOPS-10, RT-11, CP/M, MP/M, DOS, Atari TOS, OS/2, Symbian OS, Palm OS
        "\x0A\x0D",     // [ASCII] LF+CR: BBC Acorn, RISC OS spooled text output.
        "\x0A",         // [ASCII/UNICODE] LF: Multics, Unix, Unix-like, BeOS, Amiga, RISC OS
        "\x0B",         // [UNICODE] VT: Vertical Tab, U+000B
        "\x0C",         // [UNICODE] FF: Form Feed, U+000C
        "\x0D",         // [ASCII/UNICODE] CR: Commodore 8-bit, BBC Acorn, TRS-80, Apple II, Mac OS <=v9, OS-9
        "\xC2\x85",     // [UNICODE] NEL: Next Line, U+0085
        "\xE2\x80\xA8", // [UNICODE] LS: Line Separator, U+2028
        "\xE2\x80\xA9", // [UNICODE] PS: Paragraph Separator, U+2029
        "\x1E",         // [ASCII] RS: QNX (pre-POSIX)
        //"\x76",       // [?????] NEWLINE: ZX80, ZX81 [DEPRECATED]
        "\x15",         // [EBCDIC] NEL: OS/390, OS/400
    );
    $cur_cnt = 0;
    $cur_eol = $default;
    foreach($eols as $eol){
        // Keep the candidate that occurs most often in the string.
        if(($count = substr_count($str, $eol)) > $cur_cnt){
            $cur_cnt = $count;
            $cur_eol = $eol;
        }
    }
    return $cur_eol;
}


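  • The encoding type still needs to be checked.
  • We need some way of knowing that we may be on an exotic system such as the ZX8x (since ASCII x76 is a regular letter). @radu raised a good point: in my case, it is not worth the effort to handle ZX8x systems properly.
  • Should I split the function in two? mb_detect_eol() (multibyte) and detect_eol().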
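
One possible direction for the multibyte variant raised in the last point, as a minimal sketch rather than a finished implementation: let PCRE find the newline sequences. It assumes the input is valid UTF-8, and the name detectEolUnicode() is hypothetical. `\R` is PCRE's escape for a generic newline sequence and treats CR+LF as a single unit.

```php
<?php
// Sketch only: detectEolUnicode() is a hypothetical name, not part of the answer above.
// Assumes the input is valid UTF-8 (the /u modifier makes PCRE reject anything else).
function detectEolUnicode(string $str, string $default = ''): string
{
    // \R matches a generic newline sequence and treats CR+LF as a single unit.
    if (!preg_match_all('/\R/u', $str, $matches) || empty($matches[0])) {
        return $default; // no newline found, or the input was not valid UTF-8
    }
    // Count each distinct sequence and keep the most frequent one.
    $counts = array_count_values($matches[0]);
    arsort($counts);
    return (string) key($counts);
}
```

Because the counting happens on whole matches, a CR+LF pair is never counted a second time as a lone CR or LF, so the candidate order no longer matters.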

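Answer 1 (score: 6)

Wouldn't it be easier to just replace everything except new lines using regex?


The dot matches a single character, without caring what that character is. The only exception is newline characters.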


$string = 'some string with new lines';
$newlines = preg_replace('/.*/', '', $string);
// $newlines is now filled with new lines, we only need one
$newline = substr($newlines, 0, 1);
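
Two caveats apply to the snippet above: under PCRE's usual default newline convention the dot excludes only "\n", so a "\r" is stripped together with the other characters, and `substr(..., 0, 1)` can only ever return a single character. A minimal sketch of a variant that matches the first newline sequence directly instead of stripping everything else (firstNewline() is a hypothetical name, and only CR+LF, CR and LF are considered):

```php
<?php
// Sketch only: firstNewline() is a hypothetical helper, not part of the answer above.
function firstNewline(string $string): string
{
    // Try the two-character sequence first so CR+LF is not reported as a lone CR.
    if (preg_match('/\r\n|\r|\n/', $string, $m)) {
        return $m[0];
    }
    return ''; // the string contains no CR or LF
}
```

This looks only at the first line break, so a file with mixed line endings is reported according to its first line.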



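Answer 2 (score: 3)

The answers already given here provide enough information. The following code (based on those answers) may help further:

  • It provides a reference to the EOL that was found.
  • The detection also sets a key that an application can use to refer to that EOL.
  • It shows how to use the reference in a utility class.
  • It shows how to use it to detect the EOL of a file, returning the key name of the EOL found.
  • I hope this is useful to all of you.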

    /**
    Newline characters in different Operating Systems
    The names given to the different sequences are:
    NewL  Chars       Name     Description
    ----- ----------- -------- ------------------------------------------------------------------
    LF    0x0A        UNIX     Apple OSX, UNIX, Linux
    CR    0x0D        TRS80    Commodore, Acorn BBC, ZX Spectrum, TRS-80, Apple II family, etc
    LFCR  0x0A 0x0D   ACORN    Acorn BBC and RISC OS spooled text output.
    CRLF  0x0D 0x0A   WINDOWS  Microsoft Windows, DEC TOPS-10, RT-11 and most other early non-Unix
                               and non-IBM OSes, CP/M, MP/M, DOS (MS-DOS, PC DOS, etc.), OS/2,
    ----- ----------- -------- ------------------------------------------------------------------
    */
    const EOL_UNIX    = 'lf';        // Code: \n
    const EOL_TRS80   = 'cr';        // Code: \r
    const EOL_ACORN   = 'lfcr';      // Code: \n \r
    const EOL_WINDOWS = 'crlf';      // Code: \r \n


    /**
    Detects the end-of-line character of a string.
    @param string $str      The string to check.
    @param string $key      [io] Name of the detected eol key.
    @return string The detected EOL, or an empty string if none was found.
    */
    public static function detectEOL($str, &$key) {
       static $eols = array(
         Util::EOL_ACORN   => "\n\r",  // 0x0A - 0x0D - acorn BBC
         Util::EOL_WINDOWS => "\r\n",  // 0x0D - 0x0A - Windows, DOS OS/2
         Util::EOL_UNIX    => "\n",    // 0x0A -      - Unix, OSX
         Util::EOL_TRS80   => "\r",    // 0x0D -      - Apple ][, TRS80
       );
       $key = "";
       $curCount = 0;
       $curEol = '';
       // Keep the sequence (and its key) that occurs most often.
       foreach($eols as $k => $eol) {
          if( ($count = substr_count($str, $eol)) > $curCount) {
             $curCount = $count;
             $curEol = $eol;
             $key = $k;
          }
       }
       return $curEol;
    }  // detectEOL


    /**
    Detects the EOL of a file by checking the first line.
    @param string  $fileName    File to be tested (full pathname).
    @return boolean false | Used key = enum('cr', 'lf', 'crlf', 'lfcr').
    @uses detectEOL
    */
    public static function detectFileEOL($fileName) {
       if (!file_exists($fileName)) {
         return false;
       }
       // Read the first line, including its line ending
       $handle = @fopen($fileName, "r");
       if ($handle === false) {
          return false;
       }
       $line = fgets($handle);
       fclose($handle);
       if ($line === false) {
          return false;  // empty or unreadable file
       }
       $key = "";
       // self:: refers to the class that holds both methods (<Your-Class-Name>)
       self::detectEOL($line, $key);
       return $key;
    }  // detectFileEOL
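
One thing to keep in mind with the first-line approach: `fgets()` splits on "\n", so for a CR-only file the "first line" may turn out to be the whole file. A minimal sketch of an alternative that reads a fixed-size chunk in binary mode and counts the sequences in it; the function name detectFileEolFromChunk() and the 4096-byte chunk size are assumptions, and the rarely seen LF+CR case is left out for brevity:

```php
<?php
// Sketch only: detectFileEolFromChunk() and the 4096-byte default are assumptions.
// Returns 'crlf', 'lf' or 'cr', or false if the file is unreadable or has no line break.
function detectFileEolFromChunk(string $fileName, int $chunkSize = 4096)
{
    if (!is_file($fileName) || !is_readable($fileName)) {
        return false;
    }
    $handle = fopen($fileName, 'rb'); // binary mode: bytes are read untouched
    if ($handle === false) {
        return false;
    }
    $chunk = fread($handle, $chunkSize);
    fclose($handle);
    if ($chunk === false || $chunk === '') {
        return false;
    }
    // A CR at the end of a full chunk may be the first half of a CR+LF pair: drop it.
    if (strlen($chunk) === $chunkSize && substr($chunk, -1) === "\r") {
        $chunk = substr($chunk, 0, -1);
    }
    $crlf = substr_count($chunk, "\r\n");
    $counts = array(
        'crlf' => $crlf,
        'lf'   => substr_count($chunk, "\n") - $crlf, // LF not preceded by CR
        'cr'   => substr_count($chunk, "\r") - $crlf, // CR not followed by LF
    );
    arsort($counts);
    $key = key($counts);
    return $counts[$key] > 0 ? $key : false;
}
```

The returned keys deliberately match the 'crlf', 'lf' and 'cr' values of the constants above, so the result can be used in the same way as the key set by detectEOL().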


Answer 3 (score: 2)


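    function detect_newline_type($content) {
        // Replace each line with its newline sequence plus a space, then count the pieces.
        $arr = array_count_values(
                   explode(
                       ' ',
                       preg_replace('/[^\r\n]*(\r\n|\n|\r)/', '\1 ', $content)));
        arsort($arr);  // most frequent newline sequence first
        return key($arr);
    }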







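    1. Replacing /[^\r\n]*/ with nothing will "work" to make the text disappear, but it causes problems as soon as we want a separator (since we remove every character except the newlines, any character that is not a newline would be a valid separator). So the idea is to create a match that includes the newline, and to use a backreference to that match in the replacement.
    2. In the content, several newlines may occur in a row. We do not want to group them in that case, because the rest of the code would then treat the group as a different type of newline. That is why the list of newlines is spelled out explicitly in the group used for the backreference.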
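
To make the two points concrete, here is a small worked example. It assumes the pattern is the one the notes describe (any run of non-newline characters followed by an explicitly listed newline sequence), and the sample text is made up:

```php
<?php
// Illustration only: the sample text is invented, and the pattern is inferred
// from the two notes above.
$content = "one\r\ntwo\r\n\r\nthree\n";

// Each line is replaced by its own newline sequence followed by a space separator.
// The empty line keeps its own "\r\n" instead of being merged with the previous one.
$marked = preg_replace('/[^\r\n]*(\r\n|\n|\r)/', '\1 ', $content);

// Split on the separator and count how often each newline sequence occurs.
$counts = array_count_values(explode(' ', $marked));
arsort($counts);       // most frequent sequence first
$type = key($counts);  // the dominant newline type
```

If the group were written as `[\r\n]+` instead, the blank line would produce a four-character "\r\n\r\n" piece that is counted as a separate type, which is the grouping problem point 2 warns about.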

Answer 4 (score: 1)


This can return one or two characters for the EOL, such as LF or CR+LF:

      $eols = array_count_values(str_split(preg_replace("/[^\r\n]/", "", $string)));
      $eola = array_keys($eols, max($eols));
      $eol = implode("", $eola);
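
A hedged sketch of the same three lines wrapped in a function, with a guard for input that contains no line break at all (on recent PHP versions `max()` on an empty array throws an error). The name dominantEol() and the `$default` parameter are assumptions, not part of the answer:

```php
<?php
// Sketch only: dominantEol() is a hypothetical wrapper around the three lines above.
function dominantEol(string $string, string $default = ''): string
{
    // Keep only the CR and LF characters.
    $stripped = preg_replace("/[^\r\n]/", "", $string);
    if ($stripped === null || $stripped === '') {
        return $default; // nothing to count
    }
    // Count CR and LF separately, then join every character that shares the top count.
    $eols = array_count_values(str_split($stripped));
    $eola = array_keys($eols, max($eols));
    return implode("", $eola);
}
```

Because CR and LF are counted as individual characters, the result is two characters long only when both occur equally often; a CR+LF file that also contains one stray LF would be reported as LF.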

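Answer 5 (score: 0)

If you only care about LF / CR, here is a method I wrote. There is no need to handle every possible kind of file that you will never encounter.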

    /**
     * @param  string  $path
     * @param  string  $format  real or human_readable
     * @return false|string
     * @author Sorin-Iulian Trimbitas
     */
    public static function getLineBreak(string $path, $format = 'real')
    {
        // Hopefully my idea is ok, the rest of the stuff from the internet doesn't seem to work ok in some cases
        // 1. Take the first line of the CSV
        $file = new \SplFileObject($path);
        $line = $file->getCurrentLine();
        // Do we have an empty line?
        if (mb_strlen($line) == 1) {
            // Try the next line
            $line = $file->getCurrentLine();
            if (mb_strlen($line) == 1) {
                // Give up
                return false;
            }
        }
        // What do we have at its end?
        $last_char = mb_substr($line, -1);
        $penultimate_char = mb_substr($line, -2, 1);
        if ($last_char == "\n" || $last_char == "\r") {
            $real_format = $last_char;
            if ($penultimate_char == "\n" || $penultimate_char == "\r") {
                $real_format = $penultimate_char.$real_format;
            }
            if ($format == 'real') {
                return $real_format;
            }
            return str_replace(["\n", "\r"], ['LF', 'CR'], $real_format);
        }
        return false;
    }