Question

I'm having some trouble matching strings against one another, and I was wondering whether anyone could lend a hand?

Say I have the following table:

broken
vector
unidentified
synthetic
artificial

And a second dataset that looks like this:

org1    Fish
org2    Amphibian
org3    vector
org4    synthetic species
org5    Mammal

Now I want to remove every row of the second table that matches a string in the first table, so that the output looks like this:

org1    Fish
org2    Amphibian
org5    Mammal

I was thinking of using grep -v in bash, but I'm not sure how to make it go through all the strings in table 1.
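For reference, grep can read its patterns from a file with -f, so no explicit loop is needed. The following is a minimal sketch, not a definitive solution: the file names strings.txt and dataset.txt are hypothetical, and the sample data from the question is written to a temporary directory so the snippet runs on its own.

```shell
# Hypothetical file names; the sample data is taken from the question.
workdir=$(mktemp -d)
printf '%s\n' broken vector unidentified synthetic artificial > "$workdir/strings.txt"
printf 'org1\tFish\norg2\tAmphibian\norg3\tvector\norg4\tsynthetic species\norg5\tMammal\n' > "$workdir/dataset.txt"

# -v  print only lines that do NOT match
# -F  treat each pattern as a fixed string, not a regular expression
# -f  read the patterns, one per line, from the given file
filtered=$(grep -v -F -f "$workdir/strings.txt" "$workdir/dataset.txt")
printf '%s\n' "$filtered"

rm -rf "$workdir"
```

On the sample data this keeps the org1, org2 and org5 rows. Note that -F matches substrings anywhere on the line, including the first column, so a string such as "broken" would also remove a row containing "unbroken"; adding -w restricts matches to whole words.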

I'm trying to solve it in Perl, but for some reason it returns all of my values instead of just the ones that match. Any ideas?

My script is as follows:

#!/bin/perl -w

($br_str, $dataset) = @ARGV;
open($fh, "<", $br_str) || die "Could not open file $br_str/n $!";

while (<$fh>) {
    $str = $_;
    push @strings, $str;
    next;
}

open($fh2, "<", $dataset) || die "Could not open file $dataset $!/n";

while (<$fh2>) {
    chomp;
    @tmp = split /\t/, $_;
    $groups = $tmp[1];
    foreach $str(@strings){
        if ($str ne $groups){
            @working_lines = @tmp;
            next;
        }
    }
    print "@working_lines\n";
}

Answer 1

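chomp your input, and use a hash for the first table. Without chomp, each string read from the first file keeps its trailing newline, so it never equals the chomped field from the second file: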

use warnings;
use strict;

my ( $br_str, $dataset ) = @ARGV;
open( my $fh, "<", $br_str ) || die "Could not open file $br_str: $!\n";

# Load the strings to exclude as hash keys for fast lookup
my %strings;
while (<$fh>) {
    chomp;
    $strings{$_}++;
}

open( my $fh2, "<", $dataset ) || die "Could not open file $dataset: $!\n";
while (<$fh2>) {
    chomp;
    my @tmp = split /\s+/, $_;
    my $groups = $tmp[1];
    # Print the line only if its second field is not in the first table
    print "$_\n" unless exists $strings{$groups};
}

Note that I used \s+ instead of \t, just to make copy/paste easier.
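The same hash-lookup idea can also be sketched as a one-liner in awk, for anyone who prefers to stay in the shell. This is only an illustration under the same assumptions as the Perl version (whitespace-separated fields, the second field is the one compared); the file names strings.txt and dataset.txt are hypothetical.

```shell
# Hypothetical file names; the sample data is taken from the question.
workdir=$(mktemp -d)
printf '%s\n' broken vector unidentified synthetic artificial > "$workdir/strings.txt"
printf 'org1\tFish\norg2\tAmphibian\norg3\tvector\norg4\tsynthetic species\norg5\tMammal\n' > "$workdir/dataset.txt"

# NR == FNR is true only while awk reads the first file:
# record each string as an array key, then skip to the next line.
# For the second file, print a line only if its second field is not a key.
kept=$(awk 'NR == FNR { seen[$1]; next } !($2 in seen)' \
    "$workdir/strings.txt" "$workdir/dataset.txt")
printf '%s\n' "$kept"

rm -rf "$workdir"
```

Because fields are split on whitespace, the second field of the org4 row is "synthetic", so that row is dropped along with org3, just as with the \s+ split in the Perl code.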

Remove rows from a dataset that match values in a separate dataset
