Python regex for variable strings

Time: 2012-01-16 11:51:11

Tags: python regex

I have some sample data like this:

MADISON COUNTY,,,,,,,,,,,,, "London, City of",,,,,,,,,,,,597,519
2.1,mill /s,(replacement),for,5 years,",",commencing in,2007,",",first due in calendar year,2008,",",, for current operating expenses
-,,,,,,,,,,,,, London Public Library District,,,,,,,,,,,,716,869 1.2,mill /s,(replacement),"& increase of 1.7 mills, for 15 years, commencing in 2007, first due in",,,,,,,,,, "calendar year 2008, for
current expenses -",,,,,,,,,,,,, "Range, Township of",,,,,,,,,,,,62,13
1.7,mill /s,(renewal),for,5 years,",",commencing in,2007,",",first due in calendar year,2008,",",, for fire protection -,,,,,,,,,,,,,

What I ultimately need is a list of all the "towns", so the output should be:

["London, City of", "London Public Library District", "Range, Township of"]

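I'm struggling a bit here because I don't really know how to narrow the data down to these fields. As you can see, the runs of commas are a pretty good starting point, but there are also unwanted strings with commas that don't follow the pattern. Initially I thought I would match strings that have 5 commas on either side and are < 100 characters long, but the arbitrary commas here are frustrating: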

first due in",,,,,,,,,, "cale

Any clues?

Also, the data generally follows this format:

SOME COUNTY,,,,,,,,,,,,, SOME TOWN,,,,,,,,,,,,some long string possibly with commas
,,,,,,,,,,,,, SOME TOWN,,,,,,,,,,,,some long string possibly with commas ... etc

1 answer:

Answer 0: (score: 0)

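It's hard for me to tell from your sample data, because I think it contains extra line breaks, but judging by your summary of the data format, the town appears to be the 14th column in each row.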

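Since the data is in CSV format, you don't need a regular expression; you can use the csv module to parse the data instead. Extracting the town names should be as simple as: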

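import csv

# newline='' lets the csv module handle line endings itself
with open('data.csv', newline='') as f:
    for row in csv.reader(f):
        # the town is the 14th column, i.e. index 13; skip shorter rows
        if len(row) > 13:
            print(row[13])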
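
One detail worth checking against the real file: in the sample rows the quoted town names are preceded by a space (`, "London, City of"`), and by default the csv module only treats a quote as a quote character when it comes directly after the delimiter. The sketch below is one possible way to handle that, using the reader's `skipinitialspace` option. The helper name `extract_towns` and the two in-memory rows are made up for illustration, and it assumes every record sits on a single line with the town in column index 13, as in the format summary from the question.

```python
import csv
import io


def extract_towns(lines, column=13):
    """Collect the non-empty values found at `column` in each CSV row.

    `lines` is any iterable of text lines (an open file, a StringIO, ...).
    `column=13` is the 14th column, taken from the answer above.
    """
    towns = []
    # skipinitialspace=True drops the space after each delimiter, so a
    # field written as ` "London, City of"` is parsed as one quoted field.
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) > column and row[column].strip():
            towns.append(row[column].strip())
    return towns


# Two illustrative rows built to match the general format in the question.
sample = io.StringIO(
    'MADISON COUNTY' + ',' * 13 + ' "London, City of"' + ',' * 12 + '597,519\n'
    + ',' * 13 + ' London Public Library District' + ',' * 12 + '716,869\n'
)
print(extract_towns(sample))
```

If the real file does contain the extra line breaks seen in the pasted sample, the continuation lines would also have something in column 14, so the rows would need to be rejoined or filtered before this approach gives a clean list.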