Remove duplicate lines from a .txt file using PHP

Date: 2017-07-24 09:49:58

Tags: php unix for-loop fopen

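I have multiple txt files that contain a directory. The text files all contain the same header. I am reading all of the txt files and writing their contents to a single file.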

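Because each individual file contains the same header, the header ends up in the new merged file once per input file. How can I remove all of the headers from the merged file and keep just one at the top?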

I have been looking at the sort command in unix.

sort filename | uniq

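This command works, but it also removes all other duplicate data. Is there any way to remove only a specific string, "This is a header", while leaving one copy at the top?

Current code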

$header = array( "XX-XXXXXXXXX-XXXXXXX-X        XXXXXXXXXXXX" );

$files = glob( "/path/to/folder/*.txt" );

$output_file = "newfile_".date( "YmdHis" ).".txt";

$out = fopen( $output_file, "w" );

foreach( $header as $inputHeader ) {
    fwrite( $out, $inputHeader );
}

foreach( $files as $file ) {
    $in = fopen( $file, "r" );

    while ( $line = fgets( $in ) ) {
        if( $header !== $line ) {
            fwrite( $out, $line );
        }
    }

    fclose( $in );
}

fclose( $out );

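The line that is repeated multiple times: This is the duplicate

4 Answers:

Answer 0: (score: 1)

Try writing the header at the start of the output file, then check for it while reading the lines.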

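// Cache the header line
$header = "Header line";

$files = glob( "/path/to/files*.txt" );

//print_r($files);

$output_file = "newfile".date( "YmdHis" ).".txt";

$out = fopen( $output_file, "w" );

// Write the header once, on its own line, at the top of the new file
fwrite( $out, $header . "\n" );

// Strip whitespace from the header too, so both sides of the comparison match
$header_key = preg_replace('/\s+/', '', $header);

foreach( $files as $file ) {
    $in = fopen( $file, "r" );

    // Compare against false so that a line containing only "0" does not end the loop
    while ( ( $line = fgets( $in ) ) !== false ) {
        // Header check: do not copy header lines to the new file
        if ( $header_key !== preg_replace('/\s+/', '', $line) ) {
            fwrite( $out, $line );
        }
    }

    fclose( $in );
}

fclose( $out );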
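
The same idea can be wrapped in a function so it is easier to try against a couple of small files. This is only a sketch: the function name `merge_files_with_single_header` and its parameters are made up for illustration, and it assumes that a line counts as the header when it matches after all whitespace is removed. It also rewrites each line ending as "\n", so a file whose last line has no newline does not run into the next file.

```php
<?php
// Hypothetical helper (not from the original answer): merge $files into
// $outputFile, writing $header once at the top and skipping it elsewhere.
function merge_files_with_single_header(array $files, string $header, string $outputFile): void
{
    // Remove all whitespace so spacing differences do not affect the match
    $normalize = function (string $s): string {
        return preg_replace('/\s+/', '', $s);
    };
    $headerKey = $normalize($header);

    $out = fopen($outputFile, 'w');
    fwrite($out, rtrim($header, "\r\n") . "\n");

    foreach ($files as $file) {
        $in = fopen($file, 'r');
        while (($line = fgets($in)) !== false) {
            if ($normalize($line) !== $headerKey) {
                // Normalize the line ending so consecutive files never share a line
                fwrite($out, rtrim($line, "\r\n") . "\n");
            }
        }
        fclose($in);
    }

    fclose($out);
}
```

It could be called as `merge_files_with_single_header(glob("/path/to/folder/*.txt"), "Header line", $output_file);`. Keep the output file outside the globbed folder, or give it a different extension, so that it is not picked up as an input on a later run.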

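Answer 1: (score: 1)

After creating the new file, adding this line removes the duplicate lines. Note that it drops every repeated line, not only the header, and the result in $lines still has to be written back to the file.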

$lines = array_unique(file("your_file.txt"));
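
For completeness, here is a rough sketch of how the `array_unique` result might be written back. The function name `dedupe_file_lines` is hypothetical. Like `sort filename | uniq` in the question, it removes every repeated line, so it only fits when the data rows themselves are never legitimately duplicated.

```php
<?php
// Hypothetical helper: rewrite $path with every repeated line removed,
// keeping the first occurrence of each line in its original position.
// Returns the number of lines that were dropped.
function dedupe_file_lines(string $path): int
{
    // FILE_IGNORE_NEW_LINES strips the line endings, so "abc\n" and a final
    // "abc" without a newline compare as equal
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $path");
    }

    $unique = array_unique($lines);

    // Write the surviving lines back, one per line
    $contents = $unique ? implode("\n", $unique) . "\n" : '';
    file_put_contents($path, $contents);

    return count($lines) - count($unique);
}
```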

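Answer 2: (score: 1)

So I managed to fix this with the help of @WillParky93. The files contained 4 different headers, each of them duplicated, and I got it working after playing around with the logical operators.

Final code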

//the headers that were in the file with duplicates
$header1 = "DD-LLDRHD045-UHSTAYL-MR        LOCKFMDLA111";
$header2 = "DD-LLDRHD045-UHSTAYL-MR        LOCKFMDLA222";
$header3 = "DD-LLDRHD045-UHSTAYL-MR        LOCKFMDLA333";
$header4 = "DD-LLDRHD045-UHSTAYL-MR        LOCKFMDLA444";

//get all the files to be merged
$files = glob( "/PATH/TO/FILES/*.txt" );

//set the output filename
$output_file = "NewFile".date( "YmdHis" ).".txt";

//open the output file
$out = fopen( $output_file, "w" );

    //loop through the files to be merged
    foreach( $files as $file ) {

        //open each file
        $in = fopen( $file, "r" );

            //while each line in each file
            while ( $line = fgets( $in ) ) {

                //if the current line is not equal to header1, header2, header3 or header4
                if( preg_replace('/\s+/', '', $line ) !=
                    preg_replace('/\s+/', '', $header1 )&&
                    preg_replace('/\s+/', '', $line ) !=
                    preg_replace('/\s+/', '', $header2 )&&
                    preg_replace('/\s+/', '', $line ) !=
                    preg_replace('/\s+/', '', $header3 )&&
                    preg_replace('/\s+/', '', $line ) !=
                    preg_replace('/\s+/', '', $header4 ) ) {

                       //write that line to the output file
                       fwrite( $out, $line );

                       //echo $line."\n";

                }else{
                       //write blank line to the file
                       fwrite( $out, "\n" );

                  }

            }
        //close the file
        fclose( $in );

     }
//close the output file
fclose( $out );

//get the contents of the output file
$header1 .= file_get_contents( $output_file );

//add the header to the top of the output file
file_put_contents( $output_file, $header1 );
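
The chain of four comparisons could also be expressed as a lookup against a list of headers. The sketch below is one possible way to do that, not the code the answer actually used: `is_header_line` is a hypothetical helper, and it keeps the answer's rule that two lines match when they are equal after all whitespace is removed.

```php
<?php
// Hypothetical helper: true when $line equals any entry of $headers
// once all whitespace has been removed from both.
function is_header_line(string $line, array $headers): bool
{
    $strip = function (string $s): string {
        return preg_replace('/\s+/', '', $s);
    };

    // Strict comparison, so only exact string matches count
    return in_array($strip($line), array_map($strip, $headers), true);
}

// The four header strings from the answer above
$headers = [
    "DD-LLDRHD045-UHSTAYL-MR        LOCKFMDLA111",
    "DD-LLDRHD045-UHSTAYL-MR        LOCKFMDLA222",
    "DD-LLDRHD045-UHSTAYL-MR        LOCKFMDLA333",
    "DD-LLDRHD045-UHSTAYL-MR        LOCKFMDLA444",
];
```

Inside the read loop the condition would then become `if ( !is_header_line( $line, $headers ) )`. For large inputs it would be cheaper to strip the headers once before the loop rather than on every call.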

Answer 3: (score: 0)

If the files have only one header:

$header_exist = false;

foreach($files as $file) {

  $in = fopen($file, "r");

  while($line = fgets($in)) {
    if(strpos($line, "This is a header") === false) {
      fwrite($out, $line);
    }
    else {
      if($header_exist === false) {
        $header_exist = true;
        fwrite($out, $line);
      }
    }
  }
  fclose($in);
}
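
When the header text is not known in advance, a different technique is to rely on position instead of content. The sketch below assumes, and this is an assumption not stated in the question, that the first line of every input file is its header and that the first file is not empty. The function name `merge_keeping_first_header` is made up for illustration.

```php
<?php
// Hypothetical helper: copy every line of $files to the open stream $out,
// but skip line 1 of every file except the first one.
// Assumes each file starts with its header line.
function merge_keeping_first_header(array $files, $out): void
{
    $isFirstFile = true;

    foreach ($files as $file) {
        $in = fopen($file, 'r');
        $lineNumber = 0;

        while (($line = fgets($in)) !== false) {
            $lineNumber++;

            // The header of the first file is kept; later headers are dropped
            if ($lineNumber === 1 && !$isFirstFile) {
                continue;
            }

            // Normalize the line ending so files never run together
            fwrite($out, rtrim($line, "\r\n") . "\n");
        }

        fclose($in);
        $isFirstFile = false;
    }
}
```

Unlike the `strpos` check, this never inspects the content of a line, so a data row that happens to contain the header text is still copied.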