Find regex matches in a CSV and append them as new columns

Time: 2015-12-10 18:01:32

Tags: python

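I want to use Python to extract the matches of a regular expression from a free-text field in a CSV file. If one or more matches are found, the script should append them to the CSV as new columns.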

Sample CSV data:

recordID,freetextField

row1,lots of text blah blah blah etc 07635463726 etc etc etc

row2,07938998988 blahblah

row3,07635463726blahblah07635463726 

Desired result:

recordID,freetextField,phonenumber1,phonenumber2

row1,lots of text blah blah blah etc 07635463726 etc etc etc,07635463726,

row2,07938998988 blahblah,07938998988

row3,07635463999blahblah07635463726,07635463999,07635463726

Regular expression used:

\(?(?:(?:0(?:0|11)\)?[\s-]?\(?|\+)44\)?[\s-]?\(?(?:0\)?[\s-]?\(?)?|0)(?:\d{5}\)?[\s-]?\d{4,5}|\d{4}\)?[\s-]?(?:\d{5}|\d{3}[\s-]?\d{3})|\d{3}\)?[\s-]?\d{3}[\s-]?\d{3,4}|\d{2}\)?[\s-]?\d{4}[\s-]?\d{4}|8(?:00[\s-]?11[\s-]?11|45[\s-]?46[\s-]?4\d))(?:(?:[\s-]?(?:x|ext\.?\s?|\#)\d+)?)
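
As a quick check, the pattern can be tried against the sample free-text values with `re.findall`. This is only a sketch. It assumes the pattern starts with an escaped optional parenthesis, `\(?`, because a bare `(?(` is read by Python's `re` module as the start of a conditional group. The names `UK_PHONE` and `samples` are made up for illustration.

```python
import re

# Assumption: the pattern begins with "\(?" (an optional literal parenthesis).
# It is split across several raw strings only to keep the lines short.
UK_PHONE = re.compile(
    r"\(?(?:(?:0(?:0|11)\)?[\s-]?\(?|\+)44\)?[\s-]?\(?(?:0\)?[\s-]?\(?)?|0)"
    r"(?:\d{5}\)?[\s-]?\d{4,5}|\d{4}\)?[\s-]?(?:\d{5}|\d{3}[\s-]?\d{3})"
    r"|\d{3}\)?[\s-]?\d{3}[\s-]?\d{3,4}|\d{2}\)?[\s-]?\d{4}[\s-]?\d{4}"
    r"|8(?:00[\s-]?11[\s-]?11|45[\s-]?46[\s-]?4\d))"
    r"(?:(?:[\s-]?(?:x|ext\.?\s?|\#)\d+)?)"
)

# Free-text values taken from the sample CSV rows.
samples = [
    "lots of text blah blah blah etc 07635463726 etc etc etc",
    "07938998988 blahblah",
    "07635463726blahblah07635463726",
]

# Every group in the pattern is non-capturing, so findall returns whole matches.
found = [UK_PHONE.findall(text) for text in samples]
for text, matches in zip(samples, found):
    print(matches)
```

Because the pattern contains no capturing groups, each list holds complete phone numbers rather than fragments, which is what the new columns need.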

1 answer:

Answer 0 (score: 0)

You could try something like this:

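import csv
import re

# Placeholder from the original answer: put the phone-number pattern here.
pattern = re.compile(r"regex/goes/here")

# Python 3: open CSV files in text mode with newline="" (Python 2 used "rb"/"wb").
with open("path/to/inputFile.csv", newline="") as fileIn:  # input file location
    with open("path/to/outputFile.csv", "w", newline="") as fileOut:  # output file location
        reader = csv.reader(fileIn, delimiter=',')
        writer = csv.writer(fileOut)
        header = next(reader)  # recordID,freetextField
        writer.writerow(header + ["phonenumber1", "phonenumber2"])
        for row in reader:
            if not row:
                continue  # skip blank lines
            recordID, freetextField = row[0], row[1]
            # findall returns every non-overlapping match in the free-text field
            matches = pattern.findall(freetextField)
            # keep the first two matches, padding with '' when fewer are found
            phonenumber1, phonenumber2 = (matches + ['', ''])[:2]
            writer.writerow([recordID, freetextField, phonenumber1, phonenumber2])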
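
The answer fixes the output at two phone-number columns. If a row may contain more matches, one option is to collect the matches for every row first and size the header to the widest row. The sketch below does that on in-memory data so it can be run as-is. The helper `append_matches` and the simplified pattern `SIMPLE_PHONE` (a 0 followed by ten digits) are assumptions for illustration, not part of the question or the answer.

```python
import csv
import io
import re

# Hypothetical simplified pattern (an assumption): a 0 followed by ten digits.
SIMPLE_PHONE = re.compile(r"0\d{10}")

def append_matches(src, dst, pattern, text_col=1, prefix="phonenumber"):
    """Copy CSV rows from src to dst, adding one column per regex match."""
    reader = csv.reader(src)
    header = next(reader)
    # Read all rows first so the widest row decides how many columns to add.
    rows = [(row, pattern.findall(row[text_col])) for row in reader if row]
    width = max((len(found) for _, found in rows), default=0)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(header + [f"{prefix}{i}" for i in range(1, width + 1)])
    for row, found in rows:
        # Pad with empty strings so every row has the same number of columns.
        writer.writerow(row + found + [""] * (width - len(found)))

sample = io.StringIO(
    "recordID,freetextField\n"
    "row1,lots of text blah blah blah etc 07635463726 etc etc etc\n"
    "row2,07938998988 blahblah\n"
    "row3,07635463999blahblah07635463726\n"
)
result = io.StringIO()
append_matches(sample, result, SIMPLE_PHONE)
print(result.getvalue())
```

For real files, the same function can be passed two file objects opened with `newline=""`. Note that it holds all rows in memory, which is a trade-off for knowing the column count before writing the header.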