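Remove the third and later occurrences of a pattern from a file

Time: 2014-11-07 04:34:22

Tags: regex

I need to delete every match of a pattern beyond its second occurrence.

Sample input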

5006719,9845861877,"2014-10-01 07:53:26","2014-10-01 11:52:15",Expired
5006720,9845885761,"2014-10-01 07:53:11","2014-10-01 11:52:00",Recieved
5006720,9845885761,"2014-10-01 07:53:26","2014-10-01 11:52:15",Expired
5006720,9845885761,"2014-10-01 07:53:27","2014-10-01 11:52:16",Expired
5006720,9845885761,"2014-10-01 10:36:24","2014-10-01 12:35:13",Expired
5006721,9845888313,"2014-10-01 07:53:11","2014-10-01 11:52:01",Expired
5006721,9845888313,"2014-10-01 07:53:27","2014-10-01 11:52:16",Expired
5006722,9848157771,"2014-10-01 07:53:13","2014-10-01 11:52:02",Expired
5006722,9848157771,"2014-10-01 07:53:28","2014-10-01 11:52:17",Expired
5006722,9848157771,"2014-10-01 07:53:29","2014-10-01 11:52:18",Expired
5006723,9848497273,"2014-10-01 07:53:13","2014-10-01 11:52:03",Expired
5006723,9848497273,"2014-10-01 07:53:29","2014-10-01 11:52:18",Expired
5006723,9848497273,"2014-10-01 07:53:30","2014-10-01 11:52:19",Expired
5006723,9848497273,"2014-10-01 10:36:25","2014-10-01 12:35:14",Expired
5006724,9848788789,"2014-10-01 07:53:14","2014-10-01 11:52:04",Expired
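
The pattern to match is the first column, e.g. 5006719. Any record beyond the second one with the same first-column value should be deleted. The result set should be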

5006719,9845861877,"2014-10-01 07:53:26","2014-10-01 11:52:15",Expired
5006720,9845885761,"2014-10-01 07:53:11","2014-10-01 11:52:00",Recieved
5006720,9845885761,"2014-10-01 07:53:26","2014-10-01 11:52:15",Expired
5006721,9845888313,"2014-10-01 07:53:11","2014-10-01 11:52:01",Expired
5006721,9845888313,"2014-10-01 07:53:27","2014-10-01 11:52:16",Expired
5006722,9848157771,"2014-10-01 07:53:13","2014-10-01 11:52:02",Expired
5006722,9848157771,"2014-10-01 07:53:28","2014-10-01 11:52:17",Expired
5006723,9848497273,"2014-10-01 07:53:13","2014-10-01 11:52:03",Expired
5006723,9848497273,"2014-10-01 07:53:29","2014-10-01 11:52:18",Expired
5006724,9848788789,"2014-10-01 07:53:14","2014-10-01 11:52:04",Expired

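Single entries should stay single, double entries should stay double, and entries that occur three or more times should be trimmed to two. Note: we cannot match on the whole line here, only on the first column.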

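1 answer:

Answer 0 (score: 1)

I'm not familiar with shell scripting, so I solved it in PHP. The script assumes that rows with the same first column are adjacent, as in the sample input: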

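<?php
// Assumes sort.csv is grouped by the first column (equal keys are adjacent).
$in = fopen('sort.csv', 'r');  // file containing the data
$out = fopen('f1.csv', 'w');   // output file, truncated on each run
if ($in === false || $out === false) {
    exit("Could not open input or output file\n");
}
$last = null;  // key (first column) of the previous row
$count = 0;    // rows already written for the current key

while (($row = fgets($in)) !== false) {  // loop through each record
    $key = explode(',', $row)[0];
    if ($key !== $last) {
        $count = 0;  // new key: reset the counter
        $last = $key;
    }
    if ($count < 2) {  // keep only the first two rows per key
        fwrite($out, $row);
        $count++;
    }
}
fclose($in);
fclose($out);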
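
If the rows for one key are not guaranteed to be adjacent, a per-key counter is one way around that. The following is only a minimal sketch, not part of the original answer: `keepFirstPerKey` and its `$limit` parameter are hypothetical names, and it assumes the key is everything before the first comma.

```php
<?php
declare(strict_types=1);

/**
 * Return the rows of $rows, keeping at most $limit rows per key.
 * The key is the text before the first comma. Hypothetical helper,
 * written for illustration only.
 */
function keepFirstPerKey(iterable $rows, int $limit = 2): array
{
    $seen = [];  // key => number of rows seen so far for that key
    $kept = [];  // rows that survive the filter, in input order
    foreach ($rows as $row) {
        $key = explode(',', $row, 2)[0];
        $seen[$key] = ($seen[$key] ?? 0) + 1;
        if ($seen[$key] <= $limit) {
            $kept[] = $row;
        }
    }
    return $kept;
}

// Three rows taken from the sample input, plus one row with another key.
$sample = [
    '5006720,9845885761,"2014-10-01 07:53:11","2014-10-01 11:52:00",Recieved',
    '5006720,9845885761,"2014-10-01 07:53:26","2014-10-01 11:52:15",Expired',
    '5006720,9845885761,"2014-10-01 07:53:27","2014-10-01 11:52:16",Expired',
    '5006721,9845888313,"2014-10-01 07:53:11","2014-10-01 11:52:01",Expired',
];
foreach (keepFirstPerKey($sample) as $row) {
    echo $row, "\n";
}
```

With this sample, the third 5006720 row is dropped and the other three rows are printed in their original order. The trade-off is memory: the counter array holds one entry per distinct key, whereas comparing each row against the previous one only needs the last key.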