Question

I have a tab-delimited file, file A, like this:

establishment_of_protein_localization_to_endoplasmic_reticulum  GO:0072599
    lipid_oxidation GO:0034440
    endocytic_vesicle_lumen GO:0071682
    monocarboxylic_acid_metabolic_process   GO:0032787
    protein_transmembrane_transport GO:0071806
    cellular_response_to_topologically_incorrect_protein    GO:0035967
    preribosome GO:0030684
    negative_regulation_of_hematopoietic_progenitor_cell_differentiation    GO:1901533

and a second file structured like this:

font-family: Helvetica;
font-size: 10.86px;
font-weight: 700;
text-anchor: middle;
fill: #000000;
stroke: none;">
GO:0072599
</text>

<text x="509.10" y="-243.88"

style="
font-family: Helvetica;
font-size: 10.72px;
font-weight: 700;
text-anchor: middle;
fill: #000000;
stroke: none;">
GO:0034440
</text>

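I want to use awk or sed to match the second column of file A against the second file, and replace each matching string in the second file with the corresponding first-column value from file A. Basically, it should give this output: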

font-family: Helvetica;
font-size: 10.86px;
font-weight: 700;
text-anchor: middle;
fill: #000000;
stroke: none;">
 establishment_of_protein_localization_to_endoplasmic_reticulum 
</text>

<text x="509.10" y="-243.88"

style="
font-family: Helvetica;
font-size: 10.72px;
font-weight: 700;
text-anchor: middle;
fill: #000000;
stroke: none;">
lipid_oxidation
</text>

The GO:###### strings match the second column of the first file. I tried this command:

#!/bin/bash

    awk 'NR==FNR{a[$2]=$1;next}{$1=a[$1\2];}1' input.csv

However, it replaces more than just the strings found in column 2 of file A.

Answer 1

The solution you are looking for is shown below. Note, however, that your expected output does not match your input file.

awk 'FNR==NR{ hashKey[$2]=$1; next }$1 in hashKey{$1=hashKey[$1]}1' FS='\t' file1 file2

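The idea is to build a hash from the first file, which is tab-separated: each second-column value (the GO ID) is the key, and the first-column value (the name) is what gets stored. Then, for every line of the second file whose first field exists as a key in the hash, that field is replaced with the stored name. All other lines are printed unchanged.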
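If the indentation shown in your sample of file A is really present in the file as spaces, FS='\t' would keep those spaces as part of $1. Below is a minimal sketch of the same lookup idea that relies on awk's default whitespace splitting instead, assuming the names in file A never contain spaces and that each GO ID sits alone on its line in the second file. The function name replace_go_ids is just a hypothetical wrapper for illustration.

```shell
#!/bin/sh
# Sketch: replace GO IDs in the second file with names from the mapping file.
# replace_go_ids is a hypothetical helper name, not part of any tool.
replace_go_ids() {
    # $1 = mapping file (name, whitespace, GO ID); $2 = file to rewrite
    awk '
        # First file: default splitting ignores leading indentation,
        # so $1 is the name and $2 is the GO ID.
        FNR == NR { if (NF >= 2) name[$2] = $1; next }
        # Second file: a line consisting only of a known ID becomes its name.
        NF == 1 && ($1 in name) { print name[$1]; next }
        # Every other line is printed exactly as it was read.
        { print }
    ' "$1" "$2"
}
```

You would call it as replace_go_ids fileA file2 > output, where fileA, file2 and output are placeholder names. IDs that are missing from file A are left in place rather than blanked out.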

Using Awk to replace a string in one file with a string from another file
