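I wrote some code that applies a given regular expression to every postcode in the file 'import_data.csv'. It then generates a new CSV file, 'failed_validation.csv', containing all the postcodes that fail validation. Both files are structured as follows: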
row_id postcode
134534 AABC 123
243534 AACD 4PQ
534345 QpCD 3DR
... ...
Here is my code:
import csv
import re
regex = r"(GIR\s0AA)|((([A-PR-UWYZ][0-9][0-9]?)|(([A-PR-UWYZ][A-HK-Y][0-9]((BR|FY|HA|HD|HG|HR|HS|HX|JE|LD|SM|SR|WC|WN|ZE)[0-9])[0-9])|([A-PR-UWYZ][A-HK-Y](AB|LL|SO)[0-9])|(WC[0-9][A-Z])|(([A-PR-UWYZ][0-9][A-HJKPSTUW])|([A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRVWXY]))))\s[0-9][ABD-HJLNP-UW-Z]{2})"
codes = []
with open('../import_data.csv','r') as f:
    r = csv.reader(f, delimiter=',')
    for row in r:
        if not(re.findall(regex, row[1])):
            codes.append([row[0],row[1]])
with open('failed_validation.csv','w',newline='') as fp:
    a = csv.writer(fp)
    a.writerows(codes)
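The code works, but what I really want is for the postcodes in the new file to be sorted by row_id in ascending numeric order. I know how to generate a new file with Python, but I don't know how to sort the data in that file in ascending numeric order.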
Answer 0 (score: 1)
This writes the rows in sorted order and preserves the header row:
import csv
import re
regex = r"(GIR\s0AA)|((([A-PR-UWYZ][0-9][0-9]?)|(([A-PR-UWYZ][A-HK-Y][0-9]((BR|FY|HA|HD|HG|HR|HS|HX|JE|LD|SM|SR|WC|WN|ZE)[0-9])[0-9])|([A-PR-UWYZ][A-HK-Y](AB|LL|SO)[0-9])|(WC[0-9][A-Z])|(([A-PR-UWYZ][0-9][A-HJKPSTUW])|([A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRVWXY]))))\s[0-9][ABD-HJLNP-UW-Z]{2})"
codes = []
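with open('import_data.csv', 'r', newline='') as fp:
    reader = csv.reader(fp, delimiter=',')
    # Pull the header off first so it is neither validated nor sorted.
    header = next(reader)
    for row in reader:
        if not re.findall(regex, row[1]):
            codes.append([row[0], row[1]])
with open('failed_validation.csv', 'w', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerow(header)
    # csv.reader yields strings, so convert row_id to int for numeric order.
    writer.writerows(sorted(codes, key=lambda row: int(row[0])))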
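One thing to watch when sorting rows read with csv.reader: every field is a string, so sorting without a key compares row_id values as text rather than as numbers. The following is a minimal sketch of the difference, using made-up rows (the id 9871 is hypothetical and chosen only because it has fewer digits than the others):

```python
# Hypothetical rows as csv.reader would yield them: every field is a string.
rows = [["243534", "AACD 4PQ"], ["9871", "QpCD 3DR"], ["134534", "AABC 123"]]

# Default sort compares row_id as text, so "9871" lands last.
as_text = sorted(rows)

# Converting row_id to int gives ascending numeric order.
as_number = sorted(rows, key=lambda row: int(row[0]))

print([r[0] for r in as_text])    # ['134534', '243534', '9871']
print([r[0] for r in as_number])  # ['9871', '134534', '243534']
```

If every row_id has the same number of digits, both orderings coincide, which is why the problem can go unnoticed on sample data.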
Answer 1 (score: 0)
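Sort the list of codes before writing it to the file. In the question's code the header row fails the regex too, so it ends up as codes[0] and has to be split off before sorting.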
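# The header row is the first entry in codes; keep it out of the sort.
header = codes[0]
# Sort the remaining rows by row_id as a number, not as text.
codes = sorted(codes[1:], key=lambda row: int(row[0]))
with open('failed_validation.csv','w',newline='') as fp:
    a = csv.writer(fp)
    a.writerow(header)
    a.writerows(codes)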
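Splitting off codes[0] assumes the header is always the first collected row. A slightly more defensive sketch is shown below; the helper name write_sorted and the sample rows are hypothetical, and it treats the first row as a header only when its first field is not a number:

```python
import csv
import io

def write_sorted(codes, fp):
    """Write the header (if any), then data rows sorted by numeric row_id."""
    # Treat the first row as a header only when its first field is not a number.
    if codes and not codes[0][0].isdigit():
        header, data = codes[0], codes[1:]
    else:
        header, data = None, codes
    writer = csv.writer(fp)
    if header is not None:
        writer.writerow(header)
    writer.writerows(sorted(data, key=lambda row: int(row[0])))

# Hypothetical failed rows, header first, as the question's loop would collect them.
codes = [["row_id", "postcode"], ["534345", "QpCD 3DR"], ["134534", "AABC 123"]]
buf = io.StringIO()
write_sorted(codes, buf)
print(buf.getvalue())
```

An in-memory buffer stands in for the output file here; with a real file, open it with newline='' as in the code above.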