Regex matching in Python

Date: 2018-02-17 10:20:01

Tags: python regex

I have a countries.txt file that contains the following sample text:

[Country "Kenya"]\n[CapitalCity "Nairobi"]\n\n
[Country "Uganda"]\n[CapitalCity "Kampala"]\n\n
[Country "Tanzania"]\n[CapitalCity "Dodoma"]\n\n

Each country can have up to 20 attributes. For simplicity I have included only Country and CapitalCity. I need a regex that works in Python and, for the sample data above, returns:

a) n matches, in the above case n=3
b) Each match should have m groups, in this case m=2: Country and CapitalCity

I have read https://www.regular-expressions.info/captureall.html but can't seem to get it to work for my use case.
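The linked article's key point, in Python terms: when a capturing group is itself repeated, the re module keeps only the text from the group's last iteration, so one match cannot hand back a variable number of groups. A minimal illustration on a single record, assuming (as the question's own pattern does) that the \n sequences in the file are literal backslash-n characters:

```python
import re

# One record from the sample; the r"" prefix keeps "\n" as literal backslash-n.
record = r'[Country "Kenya"]\n[CapitalCity "Nairobi"]\n\n'

# The attribute group is repeated with +, so the whole record matches once...
m = re.match(r'(?:\[(\w+) "([^"]*)"\]\\n)+\\n', record)

# ...but only the last iteration's captures survive: Country/Kenya are overwritten.
print(m.groups())  # ('CapitalCity', 'Nairobi')
```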

I tried this:

(\[([A-Za-z]+)\s\"([^\"]*)\"\]\\n\\n)+

on regex101 (https://regex101.com/r/cujIDd/1), but it doesn't return the countries.
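One likely reason the attempt misses the countries: the pattern requires a literal \n\n right after every bracketed attribute, and on each line only the last attribute is followed by two of them. A quick check, assuming the three sample lines are separated by real newlines and contain literal \n sequences:

```python
import re

# Sample lines: "\n" inside each line is literal text; lines are joined by real newlines.
text = "\n".join([
    r'[Country "Kenya"]\n[CapitalCity "Nairobi"]\n\n',
    r'[Country "Uganda"]\n[CapitalCity "Kampala"]\n\n',
    r'[Country "Tanzania"]\n[CapitalCity "Dodoma"]\n\n',
])

# The pattern from the question, unchanged.
pattern = r'(\[([A-Za-z]+)\s\"([^\"]*)\"\]\\n\\n)+'

# [Country "..."] is followed by only one literal \n, so it never matches.
for m in re.finditer(pattern, text):
    print(m.group(2), m.group(3))
# CapitalCity Nairobi
# CapitalCity Kampala
# CapitalCity Dodoma
```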

Edit: expected input and output

Example 1: input

[Country "Kenya"]\n[CapitalCity "Nairobi"]\n\n
[Country "Uganda"]\n[CapitalCity "Kampala"]\n\n
[Country "Tanzania"]\n[CapitalCity "Dodoma"]\n\n

Expected output

matches: 3
match 1: Country: Kenya
         CapitalCity: Nairobi
match 2: Country: Uganda
         CapitalCity: Kampala
match 3: Country: Tanzania
         CapitalCity: Dodoma

Example 2: input

[Country "Kenya"]\n[CapitalCity "Nairobi"]\n[President "Kenyatta"]\n\n
[Country "Uganda"]\n[CapitalCity "Kampala"]\n[President "Museveni"]\n\n
[Country "Tanzania"]\n[CapitalCity "Dodoma"]\n[President "Magufuli"]\n\n

Expected output

matches: 3
match 1: Country: Kenya
         CapitalCity: Nairobi
         President: Kenyatta
match 2: Country: Uganda
         CapitalCity: Kampala
         President: Museveni
match 3: Country: Tanzania
         CapitalCity: Dodoma
         President: Magufuli

Example 3: input

[Country "Kenya"]\n[CapitalCity "Nairobi"]\n[President "Kenyatta"]\n[Continent "Africa"]\n\n
[Country "Uganda"]\n[CapitalCity "Kampala"]\n[President "Museveni"]\n[Continent "Africa"]\n\n
[Country "Tanzania"]\n[CapitalCity "Dodoma"]\n[President "Magufuli"]\n[Continent "Africa"]\n\n

Expected output

matches: 3
match 1: Country: Kenya
         CapitalCity: Nairobi
         President: Kenyatta
         Continent: Africa
match 2: Country: Uganda
         CapitalCity: Kampala
         President: Museveni
         Continent: Africa
match 3: Country: Tanzania
         CapitalCity: Dodoma
         President: Magufuli
         Continent: Africa

You get the idea.
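Because a single Python regex cannot return a different number of groups per match, one workable approach (a sketch, not the only option) is two passes: split the text into records, then collect every [Key "Value"] pair in each record with re.findall. This assumes, as in the samples, that every record ends with a literal \n\n sequence; the helper name parse_records is hypothetical, and duplicate keys within one record would collapse in the dict.

```python
import re

# Matches one attribute such as [Country "Kenya"]; captures key and value.
ATTR = re.compile(r'\[([A-Za-z]+)\s+"([^"]*)"\]')

def parse_records(text):
    """Return one dict per record, with however many attributes it has."""
    records = []
    # Records end with a literal "\n\n"; a real newline may follow it.
    for chunk in re.split(r'\\n\\n\s*', text):
        attrs = dict(ATTR.findall(chunk))
        if attrs:  # skip the empty chunk after the final separator
            records.append(attrs)
    return records

# Example 2 from the question (three attributes per record).
text = "\n".join([
    r'[Country "Kenya"]\n[CapitalCity "Nairobi"]\n[President "Kenyatta"]\n\n',
    r'[Country "Uganda"]\n[CapitalCity "Kampala"]\n[President "Museveni"]\n\n',
    r'[Country "Tanzania"]\n[CapitalCity "Dodoma"]\n[President "Magufuli"]\n\n',
])

records = parse_records(text)
print(f"matches: {len(records)}")
for i, rec in enumerate(records, 1):
    prefix = f"match {i}: "
    for j, (key, value) in enumerate(rec.items()):
        # First attribute follows the prefix; the rest are aligned under it.
        print((prefix if j == 0 else " " * len(prefix)) + f"{key}: {value}")
```

The output follows the layout of the expected output above, and the same code handles Example 1 or Example 3 unchanged, since the number of attributes per record is not fixed in the pattern.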

2 answers:

Answer 0 (score: 1)

You could use something like the following:

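import re

# test_str holds the contents of countries.txt
with open("countries.txt") as f:
    test_str = f.read()

# Capture the first two quoted words on each line
regex = r"^[^\"]*\"(\w+)\"[^\"]+\"(\w+)\"[^\"].*"
subst = "\\1, \\2"

# Pass count/flags as keywords (positional use is deprecated since Python 3.13)
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

print(result)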

Output

Kenya, Nairobi
Uganda, Kampala
Tanzania, Dodoma

Example

https://regex101.com/r/cujIDd/6

Answer 1 (score: 1)

You could remove the outer repeated group (...)+ and make the second \\n optional with (?:\\n)?

See the regex in use on regex101.com.

\[([A-Za-z]+)\s\"([^\"]*)\"\]\\n(?:\\n)?
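In Python, this pattern can be used with re.findall, which returns one (key, value) tuple per attribute. The result is a flat list across all records, so grouping the pairs by country is still up to the caller. A sketch, assuming the same sample text with literal \n sequences inside each line:

```python
import re

text = "\n".join([
    r'[Country "Kenya"]\n[CapitalCity "Nairobi"]\n\n',
    r'[Country "Uganda"]\n[CapitalCity "Kampala"]\n\n',
])

# The answer's pattern: outer repetition removed, second literal \n optional.
pattern = r'\[([A-Za-z]+)\s\"([^\"]*)\"\]\\n(?:\\n)?'

pairs = re.findall(pattern, text)
print(pairs)
# [('Country', 'Kenya'), ('CapitalCity', 'Nairobi'), ('Country', 'Uganda'), ('CapitalCity', 'Kampala')]
```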

If you only want to capture the first 2 attributes, you can use the ^ and $ anchors:

^\[([A-Za-z]+)\s*\"([^\"]+)\"\]\\n\[([A-Za-z]+)\s*\"([^\"]+)\"\].*$

See the regex in use on regex101.com.
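For the anchored version to work on a multi-line file in Python, re.MULTILINE is needed so that ^ and $ match at each line boundary; each record line then yields one match with four groups. A sketch on the same assumed sample (literal \n inside lines, real newlines between them):

```python
import re

text = "\n".join([
    r'[Country "Kenya"]\n[CapitalCity "Nairobi"]\n\n',
    r'[Country "Uganda"]\n[CapitalCity "Kampala"]\n\n',
    r'[Country "Tanzania"]\n[CapitalCity "Dodoma"]\n\n',
])

# The anchored pattern from the answer; re.MULTILINE makes ^/$ apply per line.
pattern = r'^\[([A-Za-z]+)\s*\"([^\"]+)\"\]\\n\[([A-Za-z]+)\s*\"([^\"]+)\"\].*$'

rows = re.findall(pattern, text, flags=re.MULTILINE)
for key1, val1, key2, val2 in rows:
    print(f"{key1}: {val1}, {key2}: {val2}")
# Country: Kenya, CapitalCity: Nairobi
# Country: Uganda, CapitalCity: Kampala
# Country: Tanzania, CapitalCity: Dodoma
```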