Splitting a Python string on a pattern

Date: 2017-10-02 11:29:29

Tags: python string

I have a very long string that I need to split.

 str1 = ' BATON ROUGE, LA -- Ascension, Assumption, East Baton Rouge, East Feliciana, Iberville, Livingston, Pointe Coupee, St. Helena, St. Mary, West Baton Rouge, West Feliciana Parishes, LA; Amite and Wilkinson Counties, MS. BEAUMONT-PORT ARTHUR, TX -- Hardin, Jasper, Jefferson, Newton, Orange,Tyler Counties, TX. '

The expected output is:

sub1 = 'BATON ROUGE, LA -- Ascension, Assumption, East Baton Rouge, East Feliciana, Iberville, Livingston, Pointe Coupee, St. Helena, St. Mary, West Baton Rouge, West Feliciana Parishes, LA; Amite and Wilkinson Counties, MS.'
sub2 = 'BEAUMONT-PORT ARTHUR, TX -- Hardin, Jasper, Jefferson, Newton, Orange,Tyler Counties, TX.'

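sub1 and sub2 each contain a region name, a state name, and the associated list of counties.

If I simply split on '.', the string also breaks inside county names that contain a '.' (for example 'St. Helena'). How can I split on a pattern so that each of sub1 and sub2 ends with a state abbreviation followed by '.', such as 'MS.' and 'TX.' here? Thanks for your help.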

1 answer:

Answer 0 (score: 1)

You can try this:

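import re

str1 = ' BATON ROUGE, LA -- Ascension, Assumption, East Baton Rouge, East Feliciana, Iberville, Livingston, Pointe Coupee, St. Helena, St. Mary, West Baton Rouge, West Feliciana Parishes, LA; Amite and Wilkinson Counties, MS. BEAUMONT-PORT ARTHUR, TX -- Hardin, Jasper, Jefferson, Newton, Orange,Tyler Counties, TX. '
# Split on each period that directly follows whitespace plus two capital letters.
# A raw string (r"...") keeps the backslashes intact for the regex engine.
new_data = re.split(r"(?<=\s[A-Z]{2})\.", str1)
# Each piece keeps its leading space, so strip it before printing.
print(new_data[0].strip())
print(new_data[1].strip())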

Output:

BATON ROUGE, LA -- Ascension, Assumption, East Baton Rouge, East Feliciana, Iberville, Livingston, Pointe Coupee, St. Helena, St. Mary, West Baton Rouge, West Feliciana Parishes, LA; Amite and Wilkinson Counties, MS

BEAUMONT-PORT ARTHUR, TX -- Hardin, Jasper, Jefferson, Newton, Orange,Tyler Counties, TX
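Note that the split consumes the period and leaves a final whitespace-only piece after 'TX.', and that a real string may hold more than two regions. The sketch below is one way to handle that; the helper name `split_regions` and the shortened sample string are assumptions made for illustration and are not part of the original answer.

```python
import re

# Shortened version of the string from the question (illustrative only).
sample = (' BATON ROUGE, LA -- St. Helena, St. Mary, LA; '
          'Amite and Wilkinson Counties, MS. '
          'BEAUMONT-PORT ARTHUR, TX -- Orange,Tyler Counties, TX. ')


def split_regions(text):
    """Hypothetical helper: split on '.' after a state abbreviation.

    Returns the non-empty pieces with surrounding whitespace removed.
    """
    pieces = re.split(r"(?<=\s[A-Z]{2})\.", text)
    return [p.strip() for p in pieces if p.strip()]


regions = split_regions(sample)
for region in regions:
    print(region)
```

Iterating over the list avoids hard-coding indexes such as `[0]` and `[1]`, and the filter drops the empty trailing piece.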

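Regex explanation:

\s[A-Z]{2}: matches a two-letter uppercase abbreviation preceded by a whitespace character, i.e. the state abbreviation.

(?<=\s[A-Z]{2})\.: a positive lookbehind that matches a literal . only when it is immediately preceded by the pattern above. The lookbehind consumes no characters, so only the period itself is removed by the split. This is why the '.' in 'St. Helena' is left alone: 'St' is not two capital letters.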
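The question asks for pieces that still end in 'MS.' and 'TX.', whereas splitting on the period removes it. A possible variant, offered here as a sketch rather than as part of the original answer, moves the period inside the lookbehind and splits on the whitespace that follows it. Python's `re` module requires fixed-width lookbehinds, which `\s[A-Z]{2}\.` satisfies.

```python
import re

str1 = ' BATON ROUGE, LA -- Ascension, Assumption, East Baton Rouge, East Feliciana, Iberville, Livingston, Pointe Coupee, St. Helena, St. Mary, West Baton Rouge, West Feliciana Parishes, LA; Amite and Wilkinson Counties, MS. BEAUMONT-PORT ARTHUR, TX -- Hardin, Jasper, Jefferson, Newton, Orange,Tyler Counties, TX. '

# Strip the outer whitespace first, then split on whitespace that follows
# "<whitespace><two capitals>." so the period stays with its piece.
parts = re.split(r"(?<=\s[A-Z]{2}\.)\s+", str1.strip())
sub1, sub2 = parts

print(sub1)
print(sub2)
```

This assumes every region ends with a state abbreviation and a period followed by whitespace, as in the sample string; input that breaks that pattern would need a different rule.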