Compare two lists and find the matching entries

Asked: 2011-08-24 19:03:52

Tags: python perl awk

I have a list like this:

C
E

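I want to find these entries in the table below (Table 1) and write the matching rows to a second table (Table 2).

Does anyone have a Python or Perl script that does this?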

Table 1:

A   MU_ADO_2    1099    MU_ADO_2.1099   o   o   o   o   o   o   o   o   o   o   7.82436 s_3_merged  Suseptible  A   AG  2   4   0   2   0                                                                               
A   MU_ADO_2    1105    MU_ADO_2.1105   327.008 s_2_merged  Resistance  G   GT  81  0   2   132 79  31.5281 s_6_merged  Resistance  G   GT  8   0   1   8   7   34.9813 s_3_merged  Suseptible  G   GT  7   0   0   3   7   7.82436 s_7_merged  Suseptible  G   GT  2   0   0   4   2
A   MU_ADO_2    1110    MU_ADO_2.1110   515.963 s_2_merged  Resistance  A   AT  113 96  1   2   110 31.5281 s_6_merged  Resistance  A   AT  7   8   0   0   7   16.3388 s_3_merged  Suseptible  A   AT  4   7   0   0   4   13.808  s_7_merged  Suseptible  A   AT  3   3   0   0   3
A   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
B   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
B   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
B   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
E   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
E   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
E   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
E   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
F   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
F   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
F   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
F   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
F   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0

Table 2:

C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
E   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
E   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
E   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
E   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0

5 answers:

Answer 0 (score: 3)

Given the tags you added, I assume you are open to other *nix utilities; here is a sed solution:

sed '/^[^CE]/d' table1.txt > table2.txt

This deletes every line of table1.txt that does not start with C or E and writes the remaining lines to table2.txt.

Answer 1 (score: 3)

How about grep?

grep -e '^[CE]' source.file

You can also redirect the output to a new file:

grep -e '^[CE]' source.file > dest.file
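
One subtle difference from the sed command in the previous answer is how empty lines are handled: `sed '/^[^CE]/d'` only deletes lines whose first character is something other than C or E, so an empty line survives, while `grep '^[CE]'` only keeps lines that actually start with C or E. The following is a minimal Python sketch of both behaviours; the function names and the sample rows are made up for illustration and are not part of either answer.

```python
import re


def sed_delete_non_ce(lines):
    """Mimic sed '/^[^CE]/d': drop lines whose first character is not C or E."""
    return [ln for ln in lines if not re.match(r"[^CE]", ln)]


def grep_keep_ce(lines):
    """Mimic grep '^[CE]': keep lines whose first character is C or E."""
    return [ln for ln in lines if re.match(r"[CE]", ln)]


# Hypothetical sample rows (newlines already stripped), including one empty line.
sample = ["A\t1", "C\t2", "", "E\t3", "D\t4"]

print(sed_delete_non_ce(sample))  # the empty line is kept
print(grep_keep_ce(sample))       # the empty line is dropped
```

For a table without blank lines, such as Table 1 above, both commands should select the same rows.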

Answer 2 (score: 1)

If your question is: "How do I filter this file so that I only see the entries whose first field equals C or E?"

then the following should work:

awk '$1 ~ /^[CE]$/ { print $0 }' yourfile > outfile

If you want to save a few keystrokes at the expense of clarity, the following also works:

awk '$1 ~ /^[CE]$/' yourfile > outfile
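
In awk, `$1 ~ /[CE]/` is true when the pattern matches anywhere in the first field, whereas the anchored form `/^[CE]$/` is true only when the field is exactly C or E. With single-letter labels such as those in Table 1 both give the same result, but they diverge for longer labels. A small Python sketch of the difference, using made-up rows (the label `ACE` is hypothetical and does not occur in the question's data):

```python
import re

# Hypothetical rows; only the first whitespace-separated field matters here.
rows = ["C\tx", "ACE\ty", "E\tz", "D\tw"]
first_fields = [row.split()[0] for row in rows]

# Unanchored: matches any field that contains a C or an E somewhere.
unanchored = [f for f in first_fields if re.search(r"[CE]", f)]

# Anchored: matches only fields that are exactly "C" or exactly "E".
anchored = [f for f in first_fields if re.fullmatch(r"[CE]", f)]

print(unanchored)  # ['C', 'ACE', 'E']
print(anchored)    # ['C', 'E']
```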

Answer 3 (score: 1)

An alternative, in Python:

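keys = ('C', 'E')  # row labels to keep

# Open out.txt with 'w' so that rerunning the script does not append duplicate rows.
with open('test.txt') as f, open('out.txt', 'w') as out:
    for line in f:
        # str.startswith accepts a tuple and is true if any prefix matches.
        if line.startswith(keys):
            out.write(line)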

test.txt is a file containing Table 1 (copied and pasted from the question). out.txt is the file in which you get Table 2.

Answer 4 (score: 0)

Assuming the "C E" list comes from a file:

awk '
    FILENAME == ARGV[1] {list[$1]; next}
    $1 in list {print}
' list.txt table1 > table2
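
Since the question asks for Python, the same two-file idea can be sketched there as well: read the wanted labels from the list file into a set, then copy every table row whose first field is in that set. This is a rough equivalent rather than a line-for-line port of the awk program; the function name `filter_by_first_field` and the demo file contents are assumptions made for illustration.

```python
import os
import tempfile


def filter_by_first_field(list_path, table_path, out_path):
    """Copy rows of table_path whose first whitespace-separated field
    also appears as the first field of a line in list_path.
    Returns the number of rows written."""
    with open(list_path) as f:
        # Skip blank lines so they do not add an empty key.
        wanted = {line.split()[0] for line in f if line.strip()}

    kept = 0
    with open(table_path) as table, open(out_path, "w") as out:
        for line in table:
            fields = line.split()
            if fields and fields[0] in wanted:
                out.write(line)
                kept += 1
    return kept


# Small demo with hypothetical data written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "list.txt")
    table_path = os.path.join(tmp, "table1")
    out_path = os.path.join(tmp, "table2")

    with open(list_path, "w") as f:
        f.write("C\n\nE\n")
    with open(table_path, "w") as f:
        f.write("A\t1\nC\t2\nD\t3\nE\t4\n")

    demo_count = filter_by_first_field(list_path, table_path, out_path)
    with open(out_path) as f:
        demo_rows = f.read().splitlines()

print(demo_count, demo_rows)
```

Using a set keeps each membership test cheap even when the list of labels is long, and the table is streamed line by line, so it does not need to fit in memory.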