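I want to extract data from a file named data.txt based on the contents of another file named list.txt. Specifically, I need $11 from data.txt for every row whose fields match $1 and $2 of list.txt. $1 of list.txt corresponds to $1 of data.txt, and $2 of list.txt corresponds to $4 of data.txt.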
contents of list.txt
2aas p0877
asds k9876
651a kl098
contents of data.txt
2aas F DNK_ECTHA Q9XT6 12-208 192.0 250.0 198.0 104.00 78.80 99.0 108.0 97 5
asds G DNK_DROME k9876 12-209 192.0 250.0 197.0 100.00 78.80 87.0 100.0 97 6
1ot3 H DNK_DROME Q9bt6 11-208 142.0 256.0 194.0 106.00 78.80 97.0 100.0 97 5
651a H DNK_ECTHA kl098 10-208 192.0 259.0 197.0 100.00 78.80 98.0 100.0 99 5
2aas H pyp_DROME p0877 12-208 192.0 250.0 130.0 102.00 78.80 67.0 103.0 97 9
desired output
2aas p0877 67.0
asds k9876 87.0
651a kl098 98.0
Answer 0 (score: 1)
I'm assuming data.txt contains the rows you want to look up using list.txt.
Here's a quick and dirty way to do it in Python:
# Create a data dict using data.txt
with open("data.txt") as f:
    # create generator of entries using non-empty lines in file
    entries = (line.split() for line in f if line.strip())
    # create dict using ($1,$4) as key and $11 as value
    data = dict(((d[0], d[3]), d[10]) for d in entries)

# for each entry in list.txt, print out matching data
with open("list.txt") as f:
    entries = (tuple(line.split()) for line in f if line.strip())
    for e in entries:
        if e in data:
            # print() call works on Python 3 (the original used the Python 2 print statement)
            print(e[0], e[1], data[e])
Run it in the same directory as the files:
[me@home]$ python extract.py
2aas p0877 67.0
asds k9876 87.0
651a kl098 98.0
Alternatively, an awk solution:
[me@home]$ awk 'FILENAME==ARGV[1] {pair[$1" "$4] = $11; next} ($1" "$2 in pair) {printf("%s\t%s\t%s\n", $1, $2, pair[$1" "$2])}' data.txt list.txt
2aas p0877 67.0
asds k9876 87.0
651a kl098 98.0
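Both approaches above do the same thing: build a lookup keyed on ($1, $4) of data.txt, then probe it with ($1, $2) of each list.txt row. As a rough sketch, the Python version could also be split into two small functions so it can be reused or tested without touching the filesystem. The names build_index and extract are made up for this illustration, and skipping rows with fewer than 11 columns is an assumption rather than something the question specifies.

```python
def build_index(data_lines):
    """Map ($1, $4) -> $11 for every data row that has at least 11 columns."""
    index = {}
    for line in data_lines:
        fields = line.split()
        # Assumption: blank or short rows are silently skipped.
        if len(fields) >= 11:
            index[(fields[0], fields[3])] = fields[10]
    return index


def extract(list_lines, index):
    """Return ($1, $2, value) for each list row whose pair is in the index."""
    results = []
    for line in list_lines:
        key = tuple(line.split())
        if key in index:
            results.append((key[0], key[1], index[key]))
    return results


# Sample rows copied from the question; an open file object works the same way.
data_lines = [
    "2aas F DNK_ECTHA Q9XT6 12-208 192.0 250.0 198.0 104.00 78.80 99.0 108.0 97 5",
    "asds G DNK_DROME k9876 12-209 192.0 250.0 197.0 100.00 78.80 87.0 100.0 97 6",
    "1ot3 H DNK_DROME Q9bt6 11-208 142.0 256.0 194.0 106.00 78.80 97.0 100.0 97 5",
    "651a H DNK_ECTHA kl098 10-208 192.0 259.0 197.0 100.00 78.80 98.0 100.0 99 5",
    "2aas H pyp_DROME p0877 12-208 192.0 250.0 130.0 102.00 78.80 67.0 103.0 97 9",
]
list_lines = ["2aas p0877", "asds k9876", "651a kl098"]

for row in extract(list_lines, build_index(data_lines)):
    print(*row)
```

With real files, pass the open file objects in place of the sample lists. Note that if the same ($1, $4) pair occurs more than once in data.txt, the last row wins, which is also how the dict and awk versions above behave.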