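Regular expression to split a string and isolate substrings enclosed in square brackets

Time: 2018-02-21 00:58:06

Tags: python regex

Below is a sample substring from the text I am trying to parse, along with the raw-string patterns I have tried for splitting it: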

>>> test_string = "[shelter and transitional housing during shelter crisis - selection of sites;\nwaiver of certain requirements regarding contracting]\n\nsponsors: acting mayor breed; kim, ronen, sheehy and cohen\nordinance authorizing public works, the department of homelessness and supportive\nhousing, and the department of public health to enter into contracts without adhering to the\nadministrative code or environment code provisions regarding competitive bidding and other\nrequirements for construction work, procurement, and personal services relating to identified\nshelter crisis sites (1601 quesada avenue; 149-6th street; 125 bayshore boulevard; 13th\nstreet and south van ness avenue, southwest corner; 5th street and bryant street, northwest\ncorner; caltrans emergency shelter properties; and existing city navigation centers and\nshelters) that will provide emergency shelter or transitional housing to persons experiencing\nhomelessness; authorizing the director of property to enter into and amend leases or licenses\nfor the shelter crisis sites without adherence to certain provisions of the administrative code;\nauthorizing the director of public works to add sites to the list of shelter crisis sites subject to\nexpedited processing, procurement, and leasing upon written notice to the board of\nsupervisors, and compliance with conditions relating to environmental review and\nneighborhood notice; affirming the planning department’s determination under the californinenvironmental quality act; and making findings of consistency with the general plan, and the eight priority policies of planning code, section 101.1.  assigned under 30 day rule to\nrules committee.\n[memorandum of understanding - service employees international union, local\n1021]\n\nsponsor: acting mayor breed"
>>> title = re.compile(r"\[([\s\S]*)\]")
>>> title = re.compile(r"\[.*\]")

What I want is a list of all the strings enclosed in square brackets, []:

>>> title.split(test_string)
['shelter and transitional housing during shelter crisis - selection of sites; waiver of certain requirements regarding contracting', 'memorandum of understanding - service employees international union, local 1021']

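However, neither of these raw strings splits the text correctly. It looks to me as though re treats the closing delimiter ] as part of the set of non-whitespace characters, when it should be the character on which the string is split.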

I tried changing the raw string to split on:

title = re.compile(r"\[([\s\S^\]]*)\]")

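But this doesn't work either. I read this last pattern as: split on substrings that contain a [, followed by any number of characters except for ], and then a ].

What am I misunderstanding?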

1 Answer:

Answer 0 (score: 2):

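[\s\S^\]] means: whitespace, or non-whitespace, or a literal caret ^, or a literal ] (the \] is an escaped bracket). Since \s and \S together already cover every character, this class matches anything, including ]. You can't mix negated and regular classes: ^ only negates a class when it is the first character after the opening [. I would use the class "anything except a closing ]", which is [^]]; see the example below.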
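To make the difference between the two classes concrete, here is a minimal sketch. It uses a short made-up string (sample is a stand-in, not the test_string from the question) so the results are easy to read.

```python
import re

# Short stand-in for the question's test_string (hypothetical sample text).
sample = "[first\ntitle]\nbody text\n[second title]\nmore"

# [\s\S^\]] contains both \s and \S, so it matches every character,
# including the closing bracket itself.
any_char = re.compile(r"[\s\S^\]]")
matches_bracket = any_char.fullmatch("]") is not None

# Because the class matches "]", the greedy * runs from the first "["
# all the way to the last "]" in the text: one oversized match.
greedy = re.findall(r"\[([\s\S^\]]*)\]", sample)

# [^]] is a negated class: any character except "]", newlines included,
# so each match stops at the first closing bracket.
negated = re.findall(r"\[([^]]*)\]", sample)

print(matches_bracket)
print(greedy)
print(negated)
```

With the first pattern the single match swallows everything between the first [ and the last ], while the negated class yields one entry per bracketed title.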

You can also use findall instead of split:

re.findall(r'\[([^]]*)\]', test_string)[0]
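If the goal is the full list of titles rather than only the first one, a sketch along these lines may help. The sample string is again a made-up stand-in, and collapsing line breaks into spaces is an assumption based on the desired output shown in the question.

```python
import re

# Short stand-in for the question's test_string (hypothetical sample text).
sample = "[first;\ntitle]\n\nsponsor: a\n[second,\ntitle]\n\nsponsor: b"

title = re.compile(r"\[([^]]*)\]")

# findall returns the captured group of every match, not just the first.
raw_titles = title.findall(sample)

# Collapse line breaks and runs of whitespace inside each title
# into single spaces (assumed from the question's desired output).
titles = [" ".join(t.split()) for t in raw_titles]

# For comparison: split() returns the text *between* matches, with the
# captured groups interleaved, which is why it is not a good fit here.
pieces = title.split(sample)

print(titles)
print(pieces)
```

Indexing the findall result with [0] would give just the first title; dropping the index keeps the whole list.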