Question

I'm having trouble deleting lines that contain the same word. I've tried many regular expressions, but none of them worked. For example:

B005XJ4PXG  667
B00008W5TT  1111
B005XIF874  919
B00008W5TT  1305
B00008W5TT  1350
B0000B31MK  918
B0000B31MK  1340

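My text file is very large, so it contains many different duplicated words. What I need is to remove the duplicates and keep only the line with the highest value on the right.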

Example:

B0000B31MK  918
B0000B31MK  1340

Delete the line B0000B31MK 918.

Answer 1

Here is a small Perl script that does the job:

#!/usr/bin/perl
use strict;
use warnings;

my %uniq;
# open input file
open my $fh_in, '<', 'input_file.txt' or die "Cannot open input_file.txt: $!";
# read the file line by line until the end of file
while (<$fh_in>) {
    # remove the line break
    chomp;
    # split on whitespace
    my ($word, $val) = split;
    # skip blank or malformed lines
    next unless defined $val;
    # populate the hash: the key is the word, and only the biggest value is kept
    $uniq{$word} = $val if !exists $uniq{$word} or $uniq{$word} < $val;
}
close $fh_in;
# open output file
open my $fh_out, '>', 'output_file.txt' or die "Cannot open output_file.txt: $!";
# print each word/value pair sorted by word
# (hash order is random, so the keys are sorted to get a stable output)
for my $w (sort keys %uniq) {
    print $fh_out "$w\t$uniq{$w}\n";
}
close $fh_out or die $!;

Usage:

input_file.txt:

B005XJ4PXG  667
B00008W5TT  1111
B005XIF874  919
B00008W5TT  1305
B00008W5TT  1350
B0000B31MK  918
B0000B31MK  1340

Run the script (saved as test.pl):

$ perl test.pl

output_file.txt:

B00008W5TT  1350
B0000B31MK  1340
B005XIF874  919
B005XJ4PXG  667

Answer 2

If I'm not mistaken, you are on Windows. If you can install Pandas using this YouTube tutorial, you can do this in just a few lines:

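#include <stdio.h>
#include <string.h>

int main(void) {
    char str1[100], str2[100];
    int i;
    /* read one whitespace-delimited word; %99s prevents overflowing str1 */
    if (scanf("%99s", str1) != 1)
        return 1;
    /* copy str1 into str2 character by character */
    for (i = 0; str1[i] != '\0'; ++i) {
        str2[i] = str1[i];
    }
    str2[i] = '\0';
    printf("%s\n", str2);
    return 0;
}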

Notepad++: delete lines by duplicate word
