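I have a properties file with two columns. The strings in column 1 match strings in the files that need to be changed. Each of those strings in File 2 must be replaced with the corresponding string from column 2 of File 1.
I'm not sure of the best way to approach this: sed? AWK? There is only one File 1, which holds every key and value pair, and all of them are unique. There are more than 10,000 File 2s, each different but in the same format, in which I need to change the numbers to names. Every number in any File 2 is present in File 1.
File 1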
1000079541 ALBlai_CCA27168
1000079542 ALBlai_CCA27169
1000082614 PHYsoj_128987
1000082623 PHYsoj_128997
1000112581 PHYcap_Phyca_508162
1000112588 PHYcap_Phyca_508166
1000112589 PHYcap_Phyca_508170
1000112592 PHYcap_Phyca_549547
1000120087 HYAara_HpaP801280
1000134210 PHYinf_PITG_01218T0
1000134213 PHYinf_PITG_01223T0
1000134221 PHYinf_PITG_01231T0
1000144497 PHYinf_PITG_13921T0
1000153541 PYTultPYU1_T002777
1000162512 PYTultPYU1_T013706
1000163504 PYTultPYU1_T014907
1000168326 PHYram_79731
1000168327 PHYram_79730
1000168332 PHYram_79725
1000168335 PHYram_79722
...
File 2
(1000079542:0.60919245567850022205,((1000162512:0.41491233674846345059,(1000153541:0.39076742568979516701,1000163504:0.52813999143574519302):0.14562273102476630537):0.28880212838980307000,(((1000144497:0.20364901110426453235,1000168327:0.22130795712572320921):0.35964649479701132906,((1000120087:0.34990382691181332042,(1000112588:0.08084123331549526725,(1000168332:0.12176200773214326811,1000134213:0.09481932223544080329):0.00945982345360765406):0.01846847662360769429):0.19758412044470402558,((1000168326:0.06182031367986642878,1000112589:0.07837371928562210377):0.03460740736793390532,(1000134210:0.13512192366876615846,(1000082623:0.13344777464787777044,1000112592:0.14943677128375676411):0.03425386814075986885):0.05235436818005634318):0.44112430521695145114):0.21763784827666701749):0.22507080810857052477,(1000112581:0.02102132893524749635,(1000134221:0.10938436290969000275,(1000082614:0.05263067805665807425,1000168335:0.07681947209386902342):0.03562545894572662769):0.02623229853693959113):0.49114147006852687527):0.23017851954961116023):0.64646763541457552549,1000079541:0.90035900920746847476):0.0;
Desired result
(ALBlai_CCA27169:0.60919245567850022205,((PYTultPYU1_T013706:0.41491233674846345059, ...
Answer 0 (score: 2)
In Python:
import re

# Build a dictionary of replacements (column 1 -> column 2):
with open('File 1') as f:
    repl = dict(line.split() for line in f)

# Read in the file and make the replacements.
# Each ID is a run of digits immediately followed by ':'.
with open('File 2') as f:
    data = f.read()
data = re.sub(r'(\d+):', lambda m: repl[m.group(1)] + ':', data)

# Write it back out (this overwrites File 2 in place):
with open('File 2', 'w') as f:
    f.write(data)
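Since the question mentions more than 10,000 files in the same format, the same substitution can be wrapped in functions and applied in a loop. What follows is only a minimal sketch under some assumptions: the function names, the `*.nwk` glob pattern, and the separate input and output directories are all hypothetical and not part of the question. Unlike the snippet above, it leaves an ID unchanged if it is missing from the mapping instead of raising `KeyError`, and it writes results to a new directory rather than overwriting the originals.

```python
import re
from pathlib import Path


def load_mapping(path):
    """Read 'ID name' pairs, one per line, skipping lines that are not pairs."""
    mapping = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                mapping[parts[0]] = parts[1]
    return mapping


def rename_ids(text, mapping):
    """Replace each run of digits directly before ':' with its mapped name."""
    # IDs that are not in the mapping are left as they are.
    return re.sub(
        r'(\d+):',
        lambda m: mapping.get(m.group(1), m.group(1)) + ':',
        text,
    )


def rename_all(mapping_path, in_dir, out_dir, pattern='*.nwk'):
    """Rewrite every matching file in in_dir into out_dir (hypothetical layout)."""
    mapping = load_mapping(mapping_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob(pattern)):
        (out / src.name).write_text(rename_ids(src.read_text(), mapping))
```

It could then be called as, for example, `rename_all('File 1', 'trees', 'renamed')`, where `trees` and `renamed` are placeholder directory names. The mapping is loaded once and reused for every file, which matters when there are thousands of them.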
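Answer 1 (score: 0)
A complete awk solution. Hope it helps.
awk 'BEGIN {
    # Load the ID-to-name pairs from file1
    while ((getline line < "file1") > 0) {
        split(line, dat, " ")
        a[dat[1]] = dat[2]
    }
}
{
    rest = $0; out = ""
    # An ID is a run of digits immediately followed by ":"
    while (match(rest, /[0-9]+:/)) {
        id = substr(rest, RSTART, RLENGTH - 1)
        out = out substr(rest, 1, RSTART - 1) ((id in a) ? a[id] : id) ":"
        rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
}' file2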
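Answer 2 (score: -1)
I would do something like this in bash:
while read -r key value
do
    # Match the ID only when it follows "(" or "," and is followed by ":"
    printf 's/\\([(,]\\)%s:/\\1%s:/g\n' "$key" "$value"
done < file1 > sedtmpfile
sed -f sedtmpfile file2 > result
rm sedtmpfile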