Question

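I have a properties file with two columns. The strings in column 1 match strings in a second file that need to be changed. Each matching string in file 2 must be replaced with the corresponding string from column 2 of file 1.

I'm not sure what the best way to approach this is: sed? awk? There is only one file 1, which holds every key/value pair, and all of them are unique. There are more than 10,000 file 2s, each different but in the same format, in which I need to change the numbers to names. Every number in any file 2 is present in file 1.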

File 1

1000079541  ALBlai_CCA27168
1000079542  ALBlai_CCA27169
1000082614  PHYsoj_128987
1000082623  PHYsoj_128997
1000112581  PHYcap_Phyca_508162
1000112588  PHYcap_Phyca_508166
1000112589  PHYcap_Phyca_508170
1000112592  PHYcap_Phyca_549547
1000120087  HYAara_HpaP801280
1000134210  PHYinf_PITG_01218T0
1000134213  PHYinf_PITG_01223T0
1000134221  PHYinf_PITG_01231T0
1000144497  PHYinf_PITG_13921T0
1000153541  PYTultPYU1_T002777
1000162512  PYTultPYU1_T013706
1000163504  PYTultPYU1_T014907
1000168326  PHYram_79731
1000168327  PHYram_79730
1000168332  PHYram_79725
1000168335  PHYram_79722
...

File 2

(1000079542:0.60919245567850022205,((1000162512:0.41491233674846345059,(1000153541:0.39076742568979516701,1000163504:0.52813999143574519302):0.14562273102476630537):0.28880212838980307000,(((1000144497:0.20364901110426453235,1000168327:0.22130795712572320921):0.35964649479701132906,((1000120087:0.34990382691181332042,(1000112588:0.08084123331549526725,(1000168332:0.12176200773214326811,1000134213:0.09481932223544080329):0.00945982345360765406):0.01846847662360769429):0.19758412044470402558,((1000168326:0.06182031367986642878,1000112589:0.07837371928562210377):0.03460740736793390532,(1000134210:0.13512192366876615846,(1000082623:0.13344777464787777044,1000112592:0.14943677128375676411):0.03425386814075986885):0.05235436818005634318):0.44112430521695145114):0.21763784827666701749):0.22507080810857052477,(1000112581:0.02102132893524749635,(1000134221:0.10938436290969000275,(1000082614:0.05263067805665807425,1000168335:0.07681947209386902342):0.03562545894572662769):0.02623229853693959113):0.49114147006852687527):0.23017851954961116023):0.64646763541457552549,1000079541:0.90035900920746847476):0.0;

Desired result

(ALBlai_CCA27169:0.60919245567850022205,((PYTultPYU1_T013706:0.41491233674846345059, ...

Answer 1

In Python:

import re

# Build a dictionary of replacements (blank lines are skipped):
with open('File 1') as f:
    repl = dict(line.split() for line in f if line.strip())

# Read in the file and replace every run of digits that is followed by ':':
with open('File 2') as f:
    data = f.read()
data = re.sub(r'(\d+):', lambda m: repl[m.group(1)] + ':', data)

# Write it back out:
with open('File 2','w') as f:
    f.write(data)
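
Since the question mentions more than 10,000 files in the same format, the same idea can be wrapped in functions and applied to a whole directory. This is only a sketch under assumptions: the function names `load_mapping`, `rename_ids` and `rename_all`, the directory name `trees` and the `*.nwk` file pattern are all hypothetical and not from the question. Unlike the snippet above, it leaves an unknown number unchanged instead of raising `KeyError`.

```python
import re
from pathlib import Path

# An ID is a run of digits directly followed by ":".
ID_PATTERN = re.compile(r"(\d+):")


def load_mapping(path):
    """Read two-column key/value lines into a dict, skipping malformed lines."""
    mapping = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                mapping[parts[0]] = parts[1]
    return mapping


def rename_ids(text, mapping):
    """Replace every known numeric ID; unknown IDs are left unchanged."""
    return ID_PATTERN.sub(
        lambda m: mapping.get(m.group(1), m.group(1)) + ":", text
    )


def rename_all(mapping_path, tree_dir, pattern="*.nwk"):
    """Rewrite each matching file in tree_dir in place (hypothetical layout)."""
    mapping = load_mapping(mapping_path)
    for path in sorted(Path(tree_dir).glob(pattern)):
        path.write_text(rename_ids(path.read_text(), mapping))
```

A call such as `rename_all("file1", "trees")` would then load the mapping once and reuse it for every file, which avoids re-reading file 1 for each of the 10,000 inputs.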

Answer 2

A fully working awk solution. Hope it helps.

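awk 'NR == FNR { a[$1] = $2; next }   # first file (file1): build the key -> name table
{
  out = ""
  # Replace every run of digits that is directly followed by ":"
  while (match($0, /[0-9]+:/)) {
    key = substr($0, RSTART, RLENGTH - 1)
    out = out substr($0, 1, RSTART - 1) ((key in a) ? a[key] : key) ":"
    $0 = substr($0, RSTART + RLENGTH)
  }
  print out $0
}' file1 file2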

Answer 3

I would do something like this in bash:

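# Build a sed script with one substitution per key/value pair.
# An ID in the tree always follows "(" or ",", so capture that character and put it back.
while read -r key value
do
  printf 's/\\([(,]\\)%s:/\\1%s:/g\n' "$key" "$value"
done < file1 > sedtmpfile
sed -f sedtmpfile file2 > result
rm sedtmpfile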
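
Whichever approach is used, the question's assumption that every number in file 2 also appears in file 1 can be checked before anything is rewritten. A minimal sketch in Python, assuming the same "digits followed by a colon" pattern as in Answer 1; the function name `missing_ids` is hypothetical:

```python
import re


def missing_ids(tree_text, mapping):
    """Return the numeric IDs in tree_text that have no entry in mapping."""
    found = set(re.findall(r"(\d+):", tree_text))
    return sorted(found - set(mapping))
```

An empty list means every ID in that file has a replacement; anything else lists the IDs that would be left as numbers.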

I need to replace strings in one file using key-value pairs from another file

3 answers: