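I want to use Python to extract matches of a regular expression from a free-text field in a CSV file. If one or more matches are found, I'd like the script to append them to new columns in the CSV.
Sample CSV data: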
recordID,freetextField
row1,lots of text blah blah blah etc 07635463726 etc etc etc
row2,07938998988 blahblah
row3,07635463726blahblah07635463726
Desired result:
recordID,freetextField,phonenumber1,phonenumber2
row1,lots of text blah blah blah etc 07635463726 etc etc etc,07635463726,
row2,07938998988 blahblah,07938998988,
row3,07635463726blahblah07635463726,07635463726,07635463726
Regular expression used:
\(?(?:(?:0(?:0|11)\)?[\s-]?\(?|\+)44\)?[\s-]?\(?(?:0\)?[\s-]?\(?)?|0)(?:\d{5}\)?[\s-]?\d{4,5}|\d{4}\)?[\s-]?(?:\d{5}|\d{3}[\s-]?\d{3})|\d{3}\)?[\s-]?\d{3}[\s-]?\d{3,4}|\d{2}\)?[\s-]?\d{4}[\s-]?\d{4}|8(?:00[\s-]?11[\s-]?11|45[\s-]?46[\s-]?4\d))(?:(?:[\s-]?(?:x|ext\.?\s?|\#)\d+)?)
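As a quick sanity check, the pattern can be compiled and run against the sample rows before touching any files. This is only a sketch: it assumes the pattern starts with an escaped optional parenthesis (`\(?`), since Python's `re` rejects a leading `(?(?:`, and the names `UK_PHONE` and `find_numbers` are hypothetical helpers, not part of the question.

```python
import re

# The question's pattern, split across adjacent raw strings for readability.
# Assumption: the first token is an escaped optional parenthesis, \(?
UK_PHONE = re.compile(
    r"\(?(?:(?:0(?:0|11)\)?[\s-]?\(?|\+)44\)?[\s-]?\(?(?:0\)?[\s-]?\(?)?|0)"
    r"(?:\d{5}\)?[\s-]?\d{4,5}|\d{4}\)?[\s-]?(?:\d{5}|\d{3}[\s-]?\d{3})"
    r"|\d{3}\)?[\s-]?\d{3}[\s-]?\d{3,4}|\d{2}\)?[\s-]?\d{4}[\s-]?\d{4}"
    r"|8(?:00[\s-]?11[\s-]?11|45[\s-]?46[\s-]?4\d))"
    r"(?:(?:[\s-]?(?:x|ext\.?\s?|\#)\d+)?)"
)


def find_numbers(text):
    """Return every non-overlapping match in text, in order of appearance."""
    # All groups are non-capturing, so findall returns whole matches.
    return UK_PHONE.findall(text)


print(find_numbers("lots of text blah blah blah etc 07635463726 etc etc etc"))
print(find_numbers("07938998988 blahblah"))
print(find_numbers("07635463726blahblah07635463726"))
```

Because every group in the pattern is non-capturing, `findall` returns the full matched strings rather than tuples of groups, which is what the new columns need.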
Answer 0 (score: 0)
You could try something like this:
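import csv
import re

# Placeholder from the original answer: substitute the regex from the question.
pattern = re.compile(r"regex/goes/here")

# Python 3: open CSV files in text mode with newline="" (Python 2 used "rb"/"wb").
with open("path/to/inputFile.csv", newline="") as fileIn, \
        open("path/to/outputFile.csv", "w", newline="") as fileOut:  # input and output locations
    reader = csv.reader(fileIn, delimiter=',')
    writer = csv.writer(fileOut)
    next(reader)  # skip the input header row
    # Write the output header once, as four separate columns.
    writer.writerow(["recordID", "freetextField", "phonenumber1", "phonenumber2"])
    for row in reader:
        recordID, freetextField = row[0], row[1]
        # Find every match in the free-text field, then pad to exactly two columns.
        matches = pattern.findall(freetextField)
        phonenumber1, phonenumber2 = (matches + ['', ''])[:2]
        writer.writerow([recordID, freetextField, phonenumber1, phonenumber2])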
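If a row can contain more than two numbers, one option is to size the extra columns from the data instead of hard-coding two. The sketch below is one possible approach, not the answer's own method. It uses `io.StringIO` in place of real files so it runs as-is, and `SIMPLE_PHONE` is a deliberately simplified stand-in pattern (an assumption) that should be replaced by the full regex from the question. The name `append_phone_columns` is hypothetical.

```python
import csv
import io
import re

# Simplified stand-in pattern (assumption): a 0 followed by ten digits.
# Swap in the full regex from the question for real data.
SIMPLE_PHONE = re.compile(r"0\d{10}")


def append_phone_columns(file_in, file_out, pattern=SIMPLE_PHONE):
    """Copy CSV rows from file_in to file_out, adding one column per match."""
    reader = csv.reader(file_in)
    header = next(reader)
    # Read everything first so the row with the most matches sets the width.
    rows = [(row, pattern.findall(row[1])) for row in reader]
    width = max((len(found) for _, found in rows), default=0)
    writer = csv.writer(file_out, lineterminator="\n")
    writer.writerow(header + ["phonenumber%d" % (i + 1) for i in range(width)])
    for row, found in rows:
        # Pad short rows with empty strings so every row has the same width.
        writer.writerow(row + found + [""] * (width - len(found)))


sample = io.StringIO(
    "recordID,freetextField\n"
    "row1,lots of text blah blah blah etc 07635463726 etc etc etc\n"
    "row2,07938998988 blahblah\n"
    "row3,07635463726blahblah07635463726\n"
)
result = io.StringIO()
append_phone_columns(sample, result)
print(result.getvalue())
```

This version holds all rows in memory to work out the column count, which is fine for small files. For very large inputs, a fixed maximum number of columns or a second pass over the file would avoid that.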