Question

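I am writing a Python function that takes a large block of text, read from a text file with f.readlines, and splits it into a list. The text contains dividers, and I want to split the text specifically at those positions. Below is an example of the text file in question.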

@model:2.4.0=Skeleton "Skeleton"
@compartments
 Cell=1.0 "Cell"
@species
 Cell:[A]=100.0 "A"
 Cell:[B]=1.0 "B"
 Cell:[C]=0.0 "C"
 Cell:[D]=0.0 "D"
@parameters
kcat=4000
km = 146
v2_k = 88
@reactions
@r=v1 "v1"
 A -> C : B
 Cell * kcat * B * A / (km + A) 
@r=v2 "v2"
 C -> C+D
 Cell * v2_k * C

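The output I want is a Python dictionary whose keys are the divider names and whose values are everything between that divider and the next one. For example, the first element of the sections dictionary should be: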

sections['@model']=:2.4.0=Skeleton "Skeleton"

Current code

def split_sections(SBshorthand_file):
    '''
    Takes a SBshorthand file and returns a dictionary of each of the sections. 
    Keys of the dictionary are the dividers.
    Values of dictionary are the content between dividers. 
    '''
    SBfile=parse_SBshorthand_read(SBshorthand_file) #simple parsing function. uses f.read()
    dividers=["@model", "@units", "@compartments", "@species", "@parameters", "@rules", "@reactions", "@events"]
    sections={}
    for i in  dividers:
        pattern=re.compile(i)
        if re.findall(pattern,SBfile) == []:
            pass
#            print 'Section \'{}\' not present in {}'.format(i,SBshorthand_file)
        else:
            SBfile2=re.sub(pattern,'\n'+i,SBfile)
            print SBfile2

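However, this does not do what I want. Does anyone have an idea how to fix my code? Thanks.

-----------------Edit--------------------

Note that the "@reactions" section contains a number of "reactions", each of which starts with @r, but they all need to be grouped under the reactions key.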

Answer 1

import re

x="""@model:2.4.0=Skeleton "Skeleton"
@compartments
Cell=1.0 "Cell"
@species
Cell:[A]=100.0 "A"
Cell:[B]=1.0 "B"
Cell:[C]=0.0 "C"
Cell:[D]=0.0 "D"
@parameters
kcat=4000
km = 146
v2_k = 88
@reactions
@r=v1 "v1"
A -> C : B
Cell * kcat * B * A / (km + A)
@r=v2 "v2"
C -> C+D
Cell * v2_k * C"""


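# Each match is a (divider, content) tuple, so dict() builds the mapping directly.
# The (?!r\b) lookahead keeps the "@r" lines inside the "@reactions" section.
print(dict(re.findall(r"(?:^|(?<=\n))(@\w+)([\s\S]*?)(?=\n@(?!r\b)\w+|$)",x)))

You can use re.findall directly and get what you need.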
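To make the result easier to reuse, the same pattern can be wrapped in a function. This is only a minimal sketch: the name split_sections is borrowed from the question, taking the file contents as a string (rather than a file name) is an assumption, and the strip() call is an addition that drops the newline left after dividers such as "@species".

```python
import re

# Pattern from the answer above: a section starts at "@name" at the beginning
# of a line and runs until the next "@name" line that is not an "@r" line.
SECTION_RE = re.compile(r"(?:^|(?<=\n))(@\w+)([\s\S]*?)(?=\n@(?!r\b)\w+|$)")


def split_sections(text):
    """Return {divider: content} for the given text (illustrative helper)."""
    # strip() is an added convenience: it removes the surrounding whitespace,
    # including the newline that directly follows most dividers.
    return {key: value.strip() for key, value in SECTION_RE.findall(text)}


# Shortened version of the sample file from the question.
sample = '''@model:2.4.0=Skeleton "Skeleton"
@compartments
 Cell=1.0 "Cell"
@parameters
kcat=4000
km = 146
@reactions
@r=v1 "v1"
 A -> C : B
@r=v2 "v2"
 C -> C+D'''

sections = split_sections(sample)
```

With this sample, sections['@model'] holds the text after the divider on the same line, and both @r entries end up under the single '@reactions' key.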

Answer 2

You can use capture groups as follows:

re.findall(r"(?s)(@.*?)[\s:]\s*(.*?)(?=@(?!r\b)|$)", text)  # text: the file contents as one string


where capture group 1 matches the key
and capture group 2 matches the value.
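To make the key/value pairing concrete, here is a small sketch of how two capture groups in re.findall produce (key, value) tuples that can be turned into a dictionary. The pattern is a simplified variant written for this illustration, not the exact expression above, and the sample text is made up. It does not treat "@r" lines specially.

```python
import re

# Made-up miniature input: one divider with a value on the same line,
# one divider with its value on the following line.
text = "@a:1\n@b\n 2\n"

# Group 1 captures the divider name, group 2 captures everything up to the
# next "@" or the end of the text. (?s) lets "." match newlines as well.
pairs = re.findall(r"(?s)(@\w+)[\s:]\s*(.*?)(?=@|$)", text)

# With two groups, findall returns a list of 2-tuples, which maps cleanly
# onto dictionary items. strip() removes leftover newlines from the values.
sections = {key: value.strip() for key, value in pairs}
```

Because findall returns one tuple per match when the pattern has more than one group, no manual bookkeeping of match positions is needed.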
