Extracting columns from a text file

Time: 2013-02-07 15:32:42

Tags: python awk

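I want to extract data from a file named data.txt based on the contents of another file named list.txt. I need to extract $11 from data.txt whenever $1 and $2 of list.txt are present in data.txt. $2 of list.txt corresponds to $4 of data.txt.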

contents of list.txt

2aas   p0877
asds   k9876
651a   kl098

contents of data.txt

2aas    F   DNK_ECTHA   Q9XT6   12-208  192.0   250.0   198.0   104.00  78.80   99.0    108.0   97  5
asds    G   DNK_DROME   k9876   12-209  192.0   250.0   197.0   100.00  78.80   87.0    100.0   97  6
1ot3    H   DNK_DROME   Q9bt6   11-208  142.0   256.0   194.0   106.00  78.80   97.0    100.0   97  5
651a    H   DNK_ECTHA   kl098   10-208  192.0   259.0   197.0   100.00  78.80   98.0    100.0   99  5
2aas    H   pyp_DROME   p0877   12-208  192.0   250.0   130.0   102.00  78.80   67.0    103.0   97  9

desired output

2aas   p0877   67.0
asds   k9876   87.0
651a   kl098   98.0

1 answer:

Answer 0 (score: 1)

I'm assuming data.txt contains a list of data that you wish to "query" using the entries in list.txt.

Here's a quick and dirty approach using python:

# Create a data dict using data.txt
with open("data.txt") as f:
    # create generator of entries using non-empty lines in file
    entries = (line.split() for line in f if line.strip())
    # create dict using ($1,$4) as key and $11 as value
    data = dict(((d[0], d[3]), d[10]) for d in entries)

# for each entry in list.txt, print out matching data
with open("list.txt") as f:
    entries = (tuple(line.split()) for line in f if line.strip())
    for e in entries:
        if e in data:
            print(e[0], e[1], data[e])

Run it in the same directory as the files:

[me@home]$ python extract.py 
2aas p0877 67.0
asds k9876 87.0
651a kl098 98.0
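
The same lookup can also be sketched as a small function that works on in-memory lines, which makes it easy to try against the sample data from the question without touching the file system. The names `extract`, `key_cols`, and `value_col` are illustrative and not part of the original script. The sketch assumes whitespace-separated fields and, like the script, lets a later data line overwrite an earlier one with the same key.

```python
def extract(data_lines, list_lines, key_cols=(0, 3), value_col=10):
    """Return (col1, col2, value) tuples for list entries found in the data.

    key_cols and value_col are zero-based, so (0, 3) and 10 correspond to
    awk's $1, $4 and $11. These parameter names are hypothetical.
    """
    lookup = {}
    for line in data_lines:
        fields = line.split()
        if len(fields) <= max(value_col, *key_cols):
            continue  # skip blank lines and lines with too few fields
        key = tuple(fields[i] for i in key_cols)
        lookup[key] = fields[value_col]  # a later duplicate key overwrites

    rows = []
    for line in list_lines:
        key = tuple(line.split()[:len(key_cols)])
        if key in lookup:
            rows.append(key + (lookup[key],))
    return rows


# Sample data copied from the question.
DATA = """\
2aas    F   DNK_ECTHA   Q9XT6   12-208  192.0   250.0   198.0   104.00  78.80   99.0    108.0   97  5
asds    G   DNK_DROME   k9876   12-209  192.0   250.0   197.0   100.00  78.80   87.0    100.0   97  6
1ot3    H   DNK_DROME   Q9bt6   11-208  142.0   256.0   194.0   106.00  78.80   97.0    100.0   97  5
651a    H   DNK_ECTHA   kl098   10-208  192.0   259.0   197.0   100.00  78.80   98.0    100.0   99  5
2aas    H   pyp_DROME   p0877   12-208  192.0   250.0   130.0   102.00  78.80   67.0    103.0   97  9
""".splitlines()

LIST = """\
2aas   p0877
asds   k9876
651a   kl098
""".splitlines()

for row in extract(DATA, LIST):
    print(" ".join(row))
```

Because `extract` accepts any iterable of lines, open file objects for data.txt and list.txt can be passed in place of the lists used here.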

Alternatively, for an awk solution:

[me@home]$ awk 'FILENAME==ARGV[1] {pair[$1" "$4] = $11; next} ($1" "$2 in pair) {printf("%s\t%s\t%s\n", $1, $2, pair[$1" "$2])}' data.txt list.txt
2aas    p0877   67.0
asds    k9876   87.0
651a    kl098   98.0