I have a tab-delimited text file A (representing BLAST output):
Name1 BBBBBBBBBBBB 99.40 166 1 0 1 166 334 499 3e-82 302
Name2 DDDDDDDDDDDD 98.80 167 2 0 1 167 346 512 4e-81 298
and a text file B (representing a phylogenetic dendrogram) that looks like this:
"Cluster A": {
    "member": {
        "Cluster A": "BBBBBBBBBBBB This is Animal A",
    },
    "name": "Cluster A"
},
"Cluster B: {
    "member": {
        "Cluster B": "DDDDDDDDDDDD This is Animal B"
    },
    "name": "cluster B"
}
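I want to take the string in the second column of text file A (e.g. DDDDDDDDDDDD) and look it up in text file B. The script should then append the information found in text file B to text file A as new tab-separated columns: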
Name1 BBBBBBBBBBBB 99.40 166 1 0 1 166 334 499 3e-82 302 Cluster A This is Animal A
Name2 DDDDDDDDDDDD 98.80 167 2 0 1 167 346 512 4e-81 298 Cluster B This is Animal B
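Thank you very much!
Answer 0 (score: 0)
Here is some sample code that reads the data from both files. Your example is missing the outer {}, which is why the parsing code adds them.
It then loops over the cluster members and builds the desired result.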
import json

with open("in1") as blast:
    blast_data = blast.readlines()

# The sample lacks the outer {}, so wrap the text before parsing.
with open("in2") as jsonfile:
    json_data = json.loads("{%s}" % jsonfile.read())

for bdata in blast_data:
    seq_id = bdata.split()[1]  # second column of the BLAST line
    for cluster in json_data:
        for member, text in json_data[cluster]["member"].items():
            if seq_id in text:
                # Drop the ID from the member text and keep the description.
                desc = text.replace(seq_id, "", 1).strip()
                print("\t".join([bdata.strip(), member, desc]))
                break
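As a variation on the nested loops above, the cluster data can be indexed once by sequence ID so that each BLAST line needs only a dictionary lookup. This is a minimal sketch and not part of the original answer: the function names `build_index` and `annotate` are hypothetical, and it assumes every member string has the form "ID, space, description".

```python
def build_index(clusters):
    """Map sequence ID -> (member key, description) for every cluster member.

    Assumption: each member string looks like "<ID> <description>".
    """
    index = {}
    for cluster in clusters.values():
        for member, text in cluster["member"].items():
            seq_id, _, desc = text.partition(" ")
            index[seq_id] = (member, desc.strip())
    return index


def annotate(blast_lines, index):
    """Return the BLAST lines whose second column is a known ID,
    each with the cluster name and description appended (tab-separated)."""
    out = []
    for line in blast_lines:
        fields = line.split()
        if len(fields) > 1 and fields[1] in index:
            out.append("\t".join([line.rstrip("\n")] + list(index[fields[1]])))
    return out
```

With `clusters` parsed as in the code above (`json.loads("{%s}" % ...)`), `annotate(open("in1"), build_index(clusters))` would return the annotated lines. Lines whose ID is not found are dropped in this sketch.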
Answer 1 (score: 0)
Fix the JSON file:
$ cat B
[
{ "Cluster A": { "member": { "Cluster A": "BBBBBBBBBBBB This is Animal A" }, "name": "Cluster A" } },
{ "Cluster B": { "member": { "Cluster B": "DDDDDDDDDDDD This is Animal B" }, "name": "cluster B" } }
]
Then, a Perl solution:
perl -MJSON -MPath::Class -E '
    my $data = decode_json file("B")->slurp;
    $, = "\t";
    for my $line (file("A")->slurp(chomp => 1)) {
        my @F = split /\t/, $line;
        for my $item (@$data) {
            for my $cluster (keys %$item) {
                while (my ($key, $value) = each %{$item->{$cluster}{member}} ) {
                    if ($value =~ /$F[1]\s+(.*)/) {
                        say $line, $cluster, $1;
                    }
                }
            }
        }
    }
'
Output:
Name1 BBBBBBBBBBBB 99.40 166 1 0 1 166 334 499 3e-82 302 Cluster A This is Animal A
Name2 DDDDDDDDDDDD 98.80 167 2 0 1 167 346 512 4e-81 298 Cluster B This is Animal B
For kicks, the equivalent Ruby:
ruby -rjson -e '
    data = JSON.load File.new("B")
    File.readlines("A").each {|line|
        line.chomp!
        f = line.split("\t")
        data.each {|obj|
            obj.each_key {|cluster|
                obj[cluster]["member"].each_pair {|key, value|
                    # "\\s" keeps the backslash so the regex sees \s (any whitespace);
                    # a bare "\s" in a double-quoted Ruby string is just a space.
                    if m = value.match(f[1] + "\\s+(.*)")
                        puts [line, cluster, m[1]].join("\t")
                    end
                }
            }
        }
    }
'
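If editing B by hand is impractical, part of the repair can be approximated in code. The sketch below is only an illustration: `load_fragment` is a hypothetical helper, it wraps the text in braces (yielding a single object rather than the array layout used in this answer), and it strips trailing commas with a regular expression that does not understand string contents. It cannot repair other damage such as a missing quotation mark.

```python
import json
import re


def load_fragment(text):
    """Parse a brace-less, comma-sloppy fragment into a dict.

    Assumption: no string value contains a comma directly followed by a
    closing brace or bracket, because the regex does not parse strings.
    """
    # Drop trailing commas (a comma followed only by whitespace and } or ]),
    # which strict JSON parsers reject.
    cleaned = re.sub(r",(\s*[}\]])", r"\1", text)
    # Supply the outer braces that the fragment lacks.
    return json.loads("{%s}" % cleaned)
```

Input that is still malformed after this cleanup makes `json.loads` raise `json.JSONDecodeError`, which is a reasonable point to fall back to fixing the file manually.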
Answer 2 (score: 0)
Shell script code:
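#!/usr/bin/ksh
# Collect the second column of file1 (the IDs to look up).
awk '{print $2}' file1 > tmpfile
for i in `cat tmpfile`
do
    # Whole-word search for the ID in file2; keep the matching line.
    aa=`grep -w $i file2`
    # Print the file1 line for this ID with the file2 line appended.
    awk -v out="$aa" -v pattern="$i" ' $2 ~ pattern { print $0" "out}' file1
done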
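awk '{print $2}' file1 > tmpfile
- Takes the patterns (the second column) from the first file and stores them in tmpfile.
aa=`grep -w $i file2`
- Matches the pattern in file2 and stores the whole matching line in the variable aa.
awk -v out="$aa" -v pattern="$i" ' $2 ~ pattern { print $0" "out}' file1
- Appends the string found in file2 to its corresponding line in file1.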
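The three steps described above can also be sketched in Python as plain text matching, without parsing file2 as JSON. This is a one-pass approximation rather than a faithful port of the script: `grep_w` and `append_matches` are hypothetical names, the ID is treated as a literal string instead of a regular expression, and a file1 line with no match is emitted unchanged.

```python
import re


def grep_w(pattern, lines):
    """Return the lines containing `pattern` as a whole word, similar to
    `grep -w` (the pattern is matched literally here, not as a regex)."""
    # The lookarounds require non-word characters (or line edges) on both sides.
    word = re.compile(r"(?<!\w)%s(?!\w)" % re.escape(pattern))
    return [line for line in lines if word.search(line)]


def append_matches(file1_lines, file2_lines):
    """For each file1 line, append the file2 lines that mention its 2nd field."""
    result = []
    for line in file1_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        hits = grep_w(fields[1], file2_lines)
        result.append(" ".join([line.rstrip("\n")] + [h.strip() for h in hits]))
    return result
```

As with the shell version, what gets appended is the entire matching line from file2, including its quotes and key, so further trimming would be needed to obtain exactly the columns asked for in the question.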