Fixing a PHP script that parses plain text?

Time: 2013-06-25 08:31:32

Tags: php regex parsing csv plaintext

I have a PHP script that I use to parse some plain text into CSV format.

<?php
  $text = "1. Bonus: Name some things about US history. For 10 points each:
[10] Name the first president of the United States of America.
ANSWER: George Washington
[10] How many original colonies were there?
ANSWER: 13
[10] How many states exist today?
ANSWER: 50";


function text_to_csv( $text = null ) {
  $lines  = explode( "\n", $text );
  $data   = array();
  $temp   = array();
  foreach( $lines as $line ) {
    $line = trim( $line );
    if ( empty( $line ) ) {
      continue;
    }    
    if ( preg_match( '/^\[10\](.+?)$/', $line, $quest ) ) {
        $temp[] =  trim( $quest[0] );
        continue;
    }
    if ( preg_match( '/^([0-9]+)\.(.+?)$/', $line, $quest ) ) {
      $temp[] = trim( $quest[1] );
      $temp[] = trim( $quest[2] );
      continue;
    }
    if ( preg_match( '/^ANSWER\:(.+?)$/', $line, $quest ) ) {
      $temp[] = trim( $quest[1] );
      $data[] = "|".implode( '|,|', $temp )."|";
      $temp = array();
    }

  }

  return implode( "\r\n", $data );
}

echo text_to_csv( $text );
?>

It returns:

|1|,|Bonus: Name some things about US history. For 10 points each:|,|[10] Name the first president of the United States of America.|,|George Washington|
|[10] How many original colonies were there?|,|13|
|[10] How many states exist today?|,|50|

The second and third [10] parts are split onto separate lines instead of being joined to the first. The output I want is:

|1|,|Bonus: Name some things about US history. For 10 points each:|,|[10] Name the first president of the United States of America.|,|George Washington|,|[10] How many original colonies were there?|,|13|,|[10] How many states exist today?|,|50|

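The whole string should be on a single line, separated by commas. I think what is happening is that the script treats the second and third [10] as new entries instead of appending them to the previous array. Can anyone help me fix this? Thanks a lot!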

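2 answers:

Answer 0: (score: 1)

Some text uses a bare carriage return \r as its line ending, some uses a line feed \n, and some uses both, \r\n. It depends on the editor used to create the text.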

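You need to cover all of these cases. One way is to normalize every line ending to \n before splitting the text into lines:

$text = str_replace(array("\r\n", "\r"), "\n", $text);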
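
As an alternative sketch, assuming the goal is simply to split the input into lines regardless of which line ending it uses, preg_split can match all three endings with one pattern. The helper name split_lines is made up for this illustration and does not appear in the original script.

```php
<?php
// Hypothetical helper (not from the original post): split text into lines,
// accepting \r\n, \r, or \n as the line ending.
function split_lines( $text ) {
    // \r\n is listed first so a Windows line ending is not split twice.
    return preg_split( '/\r\n|\r|\n/', $text );
}

// A string that mixes all three line endings.
$mixed = "first\r\nsecond\rthird\nfourth";
print_r( split_lines( $mixed ) );
```

For the mixed string above this yields four elements: first, second, third, fourth. In text_to_csv, the result could stand in for the explode( "\n", $text ) call.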

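Answer 1: (score: 0)

You can do this without implode or even a temporary array, just by using string concatenation. It is most likely faster, but the choice is up to you.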

<?php
$text = "1. Bonus: Name some things about US history. For 10 points each:
[10] Name the first president of the United States of America.
ANSWER: George Washington
[10] How many original colonies were there?
ANSWER: 13
[10] How many states exist today?
ANSWER: 50";

function text_to_csv( $text = null ){
    $data   = '';  // start from an empty string so .= and rtrim() always receive a string
    $lines  = explode("\n",trim($text));

    foreach($lines as $line)
    {
        $line = trim($line);
        if(empty($line))
        {
            continue;
        }
        if(preg_match('/^\[10\](.+?)$/', $line, $quest))
        {
            $data .= "|".trim( $quest[0] )."|,";
        }
        if(preg_match('/^([0-9]+)\.(.+?)$/', $line, $quest))
        {
            $data .= "|".trim( $quest[1] )."|,";
            $data .= "|".trim( $quest[2] )."|,";
        }
        if(preg_match('/^ANSWER\:(.+?)$/', $line, $quest))
        {
            $data .= "|".trim( $quest[1] )."|,";
        }
    }
    return rtrim($data, ',');
}

echo text_to_csv($text);

/*
|1|,|Bonus: Name some things about US history. For 10 points each:|,|[10] Name the first president of the United States of America.|,|George Washington|,|[10] How many original colonies were there?|,|13|,|[10] How many states exist today?|,|50|
*/
?>
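
If the input can contain more than one numbered bonus, the concatenation approach puts all of them on a single line. The following is a minimal sketch of a variant that keeps the $temp array from the question but only emits a row when the next numbered line appears or the input ends. The function name text_to_csv_rows and the sample input are assumptions made for this illustration.

```php
<?php
// Sketch only: keeps the $temp array from the question, but emits a row
// when the next numbered bonus begins (or the input ends) rather than
// after every ANSWER line.
function text_to_csv_rows( $text ) {
    $rows = array();
    $temp = array();
    // Accept \r\n, \r, or \n as the line ending.
    foreach ( preg_split( '/\r\n|\r|\n/', $text ) as $line ) {
        $line = trim( $line );
        if ( $line === '' ) {
            continue;
        }
        if ( preg_match( '/^([0-9]+)\.(.+)$/', $line, $m ) ) {
            // A new numbered bonus: flush the previous one, if any.
            if ( $temp ) {
                $rows[] = '|' . implode( '|,|', $temp ) . '|';
            }
            $temp = array( trim( $m[1] ), trim( $m[2] ) );
        } elseif ( preg_match( '/^\[10\].+$/', $line ) ) {
            // Keep the "[10]" prefix, as in the desired output.
            $temp[] = $line;
        } elseif ( preg_match( '/^ANSWER:(.+)$/', $line, $m ) ) {
            $temp[] = trim( $m[1] );
        }
    }
    // Flush the last bonus.
    if ( $temp ) {
        $rows[] = '|' . implode( '|,|', $temp ) . '|';
    }
    return implode( "\r\n", $rows );
}

// Made-up sample with two bonuses.
$sample = "1. Bonus: A\n[10] Q1\nANSWER: A1\n[10] Q2\nANSWER: A2\n2. Bonus: B\n[10] Q3\nANSWER: A3";
echo text_to_csv_rows( $sample );
```

With the sample above, each numbered bonus ends up on its own row, with all of its [10] parts and answers joined on that row.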