Matching lines between two files and marking the matched strings

Time: 2014-09-21 13:10:13

Tags: python r pattern-matching

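Given two files A and B, is there a way to change the font, color, etc. of the strings in B wherever they overlap with strings in A when the two files are matched? Strings that do not match should be left as they are, so the output file should keep the same length as the input.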

Example:

File A

 NM_134083  mmu-miR-96-5p   NM_134083       0.96213 -0.054
 NM_177305  mmu-miR-96-5p   NM_177305       0.95707 -0.099
 NM_026184  mmu-miR-93-3p   NM_026184       0.9552  -0.01

File B

 NM_134083
 NM_177305
 NM_17343052324

Output

 **NM_134083**  mmu-miR-96-5p   **NM_134083**       0.96213 -0.054
 **NM_177305**  mmu-miR-96-5p   **NM_177305**       0.95707 -0.099

1 answer:

Answer 0: (score: 1)

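You provide the raw text but do not specify what kind of formatting you want to apply. Leaving the formatting details aside: yes, you can replace any text in FileA that also appears in FileB with a formatted version of it.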

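import re

# Read both files, stripping surrounding whitespace from every line
with open('fileA.txt') as A:
    A_content = [x.strip() for x in A]
with open('fileB.txt') as B:
    # Skip blank lines: an empty pattern would match at every position
    B_content = [x.strip() for x in B if x.strip()]

output = []
for line_A in A_content:
    for line_B in B_content:
        # Do whatever formatting you need on the text;
        # I am just surrounding it with *'s here
        replace = "**" + line_B + "**"

        # Use re.sub, details here:
        # https://docs.python.org/2/library/re.html#re.sub
        # re.escape makes line_B match literally rather than as a regex;
        # the function replacement keeps any backslashes in it literal too
        line_A = re.sub(re.escape(line_B), lambda m: replace, line_A)
    # I am adding everything to the output list, but you can check whether
    # the line differs from the initial content. I leave that for you to do
    output.append(line_A)

# Print the result, one line per line of fileA.txt
print("\n".join(output))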

Output

**NM_134083**  mmu-miR-96-5p   **NM_134083**       0.96213 -0.054
**NM_177305**  mmu-miR-96-5p   **NM_177305**       0.95707 -0.099
NM_026184  mmu-miR-93-3p   NM_026184       0.9552  -0.01
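
The nested loop above runs one substitution per ID per line, and a plain substring match would also mark a short ID that happens to sit inside a longer one. As a rough sketch of an alternative, not part of the original answer, all IDs can be combined into a single compiled pattern that only matches whole tokens. The helper names `build_pattern` and `mark_matches` and the `fmt` parameter are hypothetical, introduced only for this illustration, and the sample data is copied from the question.

```python
import re


def build_pattern(ids):
    """Compile one regex that matches any of the given IDs as a whole token."""
    # Drop blank entries: an empty alternative would match at every position.
    ids = [i for i in ids if i]
    if not ids:
        return None
    # Try longer IDs first so a longer ID is preferred over a shorter prefix of it.
    ordered = sorted(ids, key=len, reverse=True)
    alternatives = "|".join(re.escape(i) for i in ordered)
    # The lookarounds reject matches glued to other word characters,
    # so NM_1340 is not marked inside NM_134083.
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")


def mark_matches(lines, ids, fmt="**{}**"):
    """Return a copy of lines with every matched ID wrapped using fmt."""
    pattern = build_pattern(ids)
    if pattern is None:
        return list(lines)
    return [pattern.sub(lambda m: fmt.format(m.group(0)), line) for line in lines]


# Sample data from the question (assumed to be already read from the two files).
file_a = [
    "NM_134083  mmu-miR-96-5p   NM_134083       0.96213 -0.054",
    "NM_177305  mmu-miR-96-5p   NM_177305       0.95707 -0.099",
    "NM_026184  mmu-miR-93-3p   NM_026184       0.9552  -0.01",
]
file_b = ["NM_134083", "NM_177305", "NM_17343052324"]

for line in mark_matches(file_a, file_b):
    print(line)
```

Passing a different `fmt`, for example `"<b>{}</b>"`, would swap the asterisks for another kind of markup. Whether that actually changes the font or color depends on where the output is displayed.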