Python - Formatting output

Date: 2017-04-04 11:15:55

Tags: python regex format output

For the following binary file (which can be downloaded from here), I have this Python code:

import re

terms = {}
numbers = {}

meshFile = 'd2017.bin'
with open(meshFile, mode='rb') as file:
    mesh = file.readlines()

outputFile = open('mesh.txt', 'w')

for line in mesh:
    meshTerm = re.search(b'MH = (.+)$', line)
    if meshTerm:
        term = meshTerm.group(1)
    meshNumber = re.search(b'MN = (.+)$', line)
    if meshNumber:
        number = meshNumber.group(1)
        numbers[str(number)] = term
        if term in terms:
            terms[term] = terms[term] + ' ' + str(number)
        else:
            terms[term] = str(number)

cumlist = []
keylist = terms.keys()
for key in keylist:
    #print('THE ORIGIN FOR ', key, file=outputFile)

    item_list = terms[key].split(" ")
    for phrase in item_list:
        cumlist.append(phrase)

print(cumlist)

for item in cumlist:
    print(numbers[str(item)], '\n', item, file=outputFile)

When I run it, the output looks like this:

b'Calcimycin\r' 
 b'D03.633.100.221.173\r'
b'Temefos\r' 
 b'D02.705.400.625.800\r'
b'Temefos\r' 
 b'D02.705.539.345.800\r'
b'Temefos\r' 
 b'D02.886.300.692.800\r'

How can I reformat the output so that it looks like this instead?

Calcimycin
D03.633.100.221.173
Temefos
D02.705.400.625.800
D02.705.539.345.800
D02.886.300.692.800

Thanks.

1 answer:

Answer 0 (score: 0)

UPDATE: I simplified the source a bit

You can try this regular expression:

MH\s*=\s*(\w+)\s*|MN\s*= \s*([^\s]*)
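
To make it clearer what each alternative of the pattern captures, here is a minimal sketch that runs it against a few individual lines from the sample record. One limitation follows from `(\w+)`: it stops at the first non-word character, so a heading that contains spaces or commas would be cut short. The helper `capture`, the pattern named `ANCHORED`, and the multi-word sample line are hypothetical additions for illustration, not part of the original answer.

```python
import re

# Pattern from the answer: group 1 = heading (MH), group 2 = tree number (MN).
ORIGINAL = re.compile(r"MH\s*=\s*(\w+)\s*|MN\s*= \s*([^\s]*)")

# Hypothetical stricter variant (an assumption, not from the answer):
# anchored to the line start and capturing the rest of the line as the value.
ANCHORED = re.compile(r"^(MH|MN) = (.+?)\s*$")


def capture(pattern, line):
    """Return the non-None groups of the first match in line, or None."""
    m = pattern.search(line)
    if m is None:
        return None
    return tuple(g for g in m.groups() if g is not None)


samples = [
    "MH = Calcimycin",
    "MN = D03.633.100.221.173",
    "MH_TH = NLM (1975)",  # should not be mistaken for an MH line
    "MH = Two Words",      # hypothetical multi-word heading
]

for line in samples:
    print(repr(line), capture(ORIGINAL, line), capture(ANCHORED, line))
```

Both patterns skip the `MH_TH` line. They differ only on the multi-word heading, where the original pattern keeps the first word and the anchored variant keeps the whole value.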


Sample code:

import re

regex = r"MH\s*=\s*(\w+)\s*|MN\s*= \s*([^\s]*)"

test_str = ("*NEWRECORD\n"
    "RECTYPE = D\n"
    "MH = Calcimycin\n"
    "AQ = AA AD AE AG AI AN BI BL CF CH CL CS CT EC HI IM IP ME PD PK PO RE SD ST TO TU UR\n"
    "ENTRY = A-23187|T109|T195|LAB|NRW|NLM (1991)|900308|abbcdef\n"
    "ENTRY = A23187|T109|T195|LAB|NRW|UNK (19XX)|741111|abbcdef\n"
    "ENTRY = Antibiotic A23187|T109|T195|NON|NRW|NLM (1991)|900308|abbcdef\n"
    "ENTRY = A 23187\n"
    "ENTRY = A23187, Antibiotic\n"
    "MN = D03.633.100.221.173\n"
    "PA = Anti-Bacterial Agents\n"
    "PA = Calcium Ionophores\n"
    "MH_TH = FDA SRS (2014)\n"
    "MH_TH = NLM (1975)\n"
    "ST = T109\n"
    "ST = T195\n"
    "N1 = 4-Benzoxazolecarboxylic acid, 5-(methylamino)-2-((3,9,11-trimethyl-8-(1-methyl-2-oxo-2-(1H-pyrrol-2-yl)ethyl)-1,7-dioxaspiro(5.5)undec-2-yl)methyl)-, (6S-(6alpha(2S*,3S*),8beta(R*),9beta,11alpha))-\n"
    "RN = 37H9VM9WZL\n"
    "RR = 52665-69-7 (Calcimycin)\n"
    "PI = Antibiotics (1973-1974)\n"
    "PI = Carboxylic Acids (1973-1974)\n"
    "MS = An ionophorous, polyether antibiotic from Streptomyces chartreusensis. It binds and transports CALCIUM and other divalent cations across membranes and uncouples oxidative phosphorylation while inhibiting ATPase of rat liver mitochondria. The substance is used mostly as a biochemical tool to study the role of divalent cations in various biological systems.\n"
    "OL = use CALCIMYCIN to search A 23187 1975-90\n"
    "PM = 91; was A 23187 1975-90 (see under ANTIBIOTICS 1975-83)\n"
    "HN = 91(75); was A 23187 1975-90 (see under ANTIBIOTICS 1975-83)\n"
    "MR = 20160527\n"
    "DA = 19741119\n"
    "DC = 1\n"
    "DX = 19840101\n"
    "UI = D000001\n\n"
    "*NEWRECORD\n"
    "RECTYPE = D\n"
    "MH = Temefos\n"
    "AQ = AA AD AE AG AI AN BL CF CH CL CS CT EC HI IM IP ME PD PK RE SD ST TO TU UR\n"
    "ENTRY = Abate|T109|T131|TRD|NRW|NLM (1996)|941114|abbcdef\n"
    "ENTRY = Difos|T109|T131|TRD|NRW|UNK (19XX)|861007|abbcdef\n"
    "ENTRY = Temephos|T109|T131|TRD|EQV|NLM (1996)|941201|abbcdef\n"
    "MN = D02.705.400.625.800\n"
    "MN = D02.705.539.345.800\n"
    "MN = D02.886.300.692.800\n"
    "PA = Insecticides\n"
    "MH_TH = FDA SRS (2014)\n"
    "MH_TH = INN (19XX)\n"
    "MH_TH = USAN (1974)\n"
    "ST = T109\n"
    "ST = T131\n"
    "N1 = Phosphorothioic acid, O,O'-(thiodi-4,1-phenylene) O,O,O',O'-tetramethyl ester\n"
    "RN = ONP3ME32DL\n"
    "RR = 3383-96-8 (Temefos)\n"
    "AN = for use to kill or control insects, use no qualifiers on the insecticide or the insect; appropriate qualifiers may be used when other aspects of the insecticide are discussed such as the effect on a physiologic process or behavioral aspect of the insect; for poisoning, coordinate with ORGANOPHOSPHATE POISONING\n"
    "PI = Insecticides (1966-1971)\n"
    "MS = An organothiophosphate insecticide.\n"
    "PM = 96; was ABATE 1972-95 (see under INSECTICIDES, ORGANOTHIOPHOSPHATE 1972-90)\n"
    "HN = 96; was ABATE 1972-95 (see under INSECTICIDES, ORGANOTHIOPHOSPHATE 1972-90)\n"
    "MR = 20130708\n"
    "DA = 19990101\n"
    "DC = 1\n"
    "DX = 19910101\n"
    "UI = D000002\n\n\n\n\n\n\n"
    "Calcimycin \n"
    "D03.633.100.221.173\n"
    "Temefos \n"
    "D02.705.400.625.800\n"
    "D02.705.539.345.800\n"
    "D02.886.300.692.800")

# finditer yields one match per MH or MN line; only one of the two groups
# is set in each match, so print whichever group is not None.
for match in re.finditer(regex, test_str, re.MULTILINE | re.IGNORECASE):
    for value in match.groups():
        if value is not None:
            print(value)

Sample output:

Calcimycin
D03.633.100.221.173
Temefos
D02.705.400.625.800
D02.705.539.345.800
D02.886.300.692.800
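
Applied to the question's setup, the same idea could be sketched as follows. The `b'...\r'` wrappers in the question's output come from calling `str()` on `bytes` objects read in binary mode, so this sketch works on text lines and lets the pattern absorb the trailing carriage return. The function names (`group_tree_numbers`, `format_groups`, `convert`), the anchored pattern, and the UTF-8 encoding are assumptions for illustration; only the file names `d2017.bin` and `mesh.txt` come from the question.

```python
import re

# Assumed line format: "MH = <heading>" or "MN = <tree number>".
FIELD = re.compile(r"^(MH|MN) = (.+?)\s*$")


def group_tree_numbers(lines):
    """Map each heading to the list of tree numbers that follow it."""
    groups = {}
    heading = None
    for line in lines:
        m = FIELD.match(line)
        if m is None:
            continue  # not an MH or MN line
        key, value = m.groups()
        if key == "MH":
            heading = value
            groups.setdefault(heading, [])
        elif heading is not None:
            groups[heading].append(value)
    return groups


def format_groups(groups):
    """Render each heading once, followed by its tree numbers, one per line."""
    out = []
    for heading, numbers in groups.items():
        out.append(heading)
        out.extend(numbers)
    return "\n".join(out)


def convert(src="d2017.bin", dst="mesh.txt", encoding="utf-8"):
    """Read the source file as text and write the grouped listing."""
    # The encoding is an assumption; undecodable bytes are replaced.
    with open(src, encoding=encoding, errors="replace") as infile:
        groups = group_tree_numbers(infile)
    with open(dst, "w", encoding=encoding) as outfile:
        outfile.write(format_groups(groups) + "\n")


# In-memory demonstration with lines shaped like those in the sample record.
sample = [
    "*NEWRECORD\r\n",
    "MH = Calcimycin\r\n",
    "MN = D03.633.100.221.173\r\n",
    "MH_TH = NLM (1975)\r\n",
    "MH = Temefos\r\n",
    "MN = D02.705.400.625.800\r\n",
    "MN = D02.705.539.345.800\r\n",
    "MN = D02.886.300.692.800\r\n",
]
print(format_groups(group_tree_numbers(sample)))
```

Calling `convert()` with no arguments would read `d2017.bin` from the current directory and write the listing to `mesh.txt`. Keeping the grouping separate from the file handling makes the logic easy to check on a handful of lines before running it on the full file.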