I have a very long string that I need to split.
str1 = ' BATON ROUGE, LA -- Ascension, Assumption, East Baton Rouge, East Feliciana, Iberville, Livingston, Pointe Coupee, St. Helena, St. Mary, West Baton Rouge, West Feliciana Parishes, LA; Amite and Wilkinson Counties, MS. BEAUMONT-PORT ARTHUR, TX -- Hardin, Jasper, Jefferson, Newton, Orange,Tyler Counties, TX. '
The expected output is:
sub1 = 'BATON ROUGE, LA -- Ascension, Assumption, East Baton Rouge, East Feliciana, Iberville, Livingston, Pointe Coupee, St. Helena, St. Mary, West Baton Rouge, West Feliciana Parishes, LA; Amite and Wilkinson Counties, MS.'
sub2 = 'BEAUMONT-PORT ARTHUR, TX -- Hardin, Jasper, Jefferson, Newton, Orange,Tyler Counties, TX.'
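sub1 and sub2 each contain a region name and state name, followed by the list of associated counties.
If I simply split on '.', the split breaks because some county names also contain '.'. How can I split on a pattern so that each of sub1 and sub2 ends with a state abbreviation followed by '.', like 'MS.' and 'TX.' here? Thanks for your help.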
Answer 0 (score: 1)
You can try this:
import re
str1 = ' BATON ROUGE, LA -- Ascension, Assumption, East Baton Rouge, East Feliciana, Iberville, Livingston, Pointe Coupee, St. Helena, St. Mary, West Baton Rouge, West Feliciana Parishes, LA; Amite and Wilkinson Counties, MS. BEAUMONT-PORT ARTHUR, TX -- Hardin, Jasper, Jefferson, Newton, Orange,Tyler Counties, TX. '
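# Split on a period that directly follows whitespace plus two uppercase letters (a state abbreviation).
new_data = re.split(r"(?<=\s[A-Z]{2})\.", str1)
# Strip the whitespace left at the start of each piece.
print(new_data[0].strip())
print(new_data[1].strip())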
Output:
BATON ROUGE, LA -- Ascension, Assumption, East Baton Rouge, East Feliciana, Iberville, Livingston, Pointe Coupee, St. Helena, St. Mary, West Baton Rouge, West Feliciana Parishes, LA; Amite and Wilkinson Counties, MS
BEAUMONT-PORT ARTHUR, TX -- Hardin, Jasper, Jefferson, Newton, Orange,Tyler Counties, TX
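Note that re.split also returns whatever follows the last matched period, and each piece keeps the whitespace around it. A minimal sketch of cleaning that up, assuming the same str1 and pattern as above (the names raw_parts and parts are only illustrative):

```python
import re

str1 = ' BATON ROUGE, LA -- Ascension, Assumption, East Baton Rouge, East Feliciana, Iberville, Livingston, Pointe Coupee, St. Helena, St. Mary, West Baton Rouge, West Feliciana Parishes, LA; Amite and Wilkinson Counties, MS. BEAUMONT-PORT ARTHUR, TX -- Hardin, Jasper, Jefferson, Newton, Orange,Tyler Counties, TX. '

# The split yields three pieces: the two records plus the lone space
# that follows the final "TX." in the input.
raw_parts = re.split(r"(?<=\s[A-Z]{2})\.", str1)

# Strip surrounding whitespace and drop pieces that end up empty.
parts = [p.strip() for p in raw_parts if p.strip()]

for p in parts:
    print(p)
```

With this, parts holds only the two records, so it can be looped over without indexing by hand.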
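Regex explanation:
\s[A-Z]{2}
: matches a whitespace character followed by two uppercase letters, i.e. a state abbreviation preceded by whitespace.
(?<=\s[A-Z]{2})\.
: a positive lookbehind that matches a literal . only when the pattern above comes immediately before it. The lookbehind consumes no characters, so the split removes only the period. The period in "St." is not matched because "t" is lowercase.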
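The pattern in this answer drops the trailing period, whereas the question's expected output keeps it. One possible variant, offered as a sketch rather than as part of the original answer, moves the period inside the lookbehind and splits on the whitespace that follows it. This assumes every record ends with a space, two uppercase letters and a period, as in the sample string:

```python
import re

str1 = ' BATON ROUGE, LA -- Ascension, Assumption, East Baton Rouge, East Feliciana, Iberville, Livingston, Pointe Coupee, St. Helena, St. Mary, West Baton Rouge, West Feliciana Parishes, LA; Amite and Wilkinson Counties, MS. BEAUMONT-PORT ARTHUR, TX -- Hardin, Jasper, Jefferson, Newton, Orange,Tyler Counties, TX. '

# Variant pattern (an assumption, not from the answer above): the period is
# part of the fixed-width lookbehind, so it stays attached to the preceding
# record and only the whitespace after it is consumed by the split.
pattern = r"(?<=\s[A-Z]{2}\.)\s+"

# Strip first so no empty piece is produced after the final "TX.".
sub1, sub2 = re.split(pattern, str1.strip())

print(sub1)
print(sub2)
```

Python's re module requires lookbehinds to have a fixed width, which holds here because \s[A-Z]{2}\. always spans exactly four characters.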