Data normalization with Python

Asked: 2015-02-02 18:16:13

Tags: python csv database-normalization

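This is a sample of a csv file that will eventually be loaded into a MySQL database. The problem is that the data is not normalized, because the routes column holds multiple values.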

stop_id,on_street,cross_street,routes,boardings
49,HARRISON,PAULINA,"126, 755",1.6
50,ASHLAND,CONGRESS,"9,126",14.8
51,ASHLAND,VAN BUREN,"9,126",100.9
52,JACKSON,1900 W.(MALCOLM X COLL.),126,82.8

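I want to extract the routes column into a new csv file with stop_id and route as the column headers and only one route per row. I have already tried importing the unnormalized csv into a MySQL database, but I could not actually normalize it there. Any help doing this in Python before importing into the database would be greatly appreciated.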

2 answers:

Answer 0 (score: 1)

This creates one row per route. If you want all of a stop's routes in a single row, you can tweak the inner for loop.

import csv
import re

sample = """stop_id,on_street,cross_street,routes,boardings
49,HARRISON,PAULINA,"126, 755",1.6
50,ASHLAND,CONGRESS,"9,126",14.8
51,ASHLAND,VAN BUREN,"9,126",100.9
52,JACKSON,1900 W.(MALCOLM X COLL.),126,82.8"""

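# write the sample data to disk so the example is self-contained
with open('sample.csv', 'w', newline='') as f:
    f.write(sample)

# newline='' lets the csv module handle line endings itself (Python 3)
with open('sample.csv', newline='') as infile, \
        open('output.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    # discard input header
    next(reader)
    # write output header
    writer.writerow(['stop_id', 'route'])
    # process rows
    for row in reader:
        if row:
            # csv.reader has already removed the quotes around the routes field
            for route in re.split(r', *', row[3]):
                writer.writerow([row[0], route])

with open('output.csv') as f:
    print(f.read())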
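As a rough sketch of the "all routes in one row" variation mentioned in this answer, the inner loop can collect the cleaned routes and write them back as a single field. The function name `routes_per_stop`, the `sep` parameter, and the use of in-memory `io.StringIO` buffers instead of files are assumptions made for illustration, not part of the original answer.

```python
import csv
import io
import re

# A few sample rows from the question, held in memory instead of a file.
SAMPLE = '''stop_id,on_street,cross_street,routes,boardings
49,HARRISON,PAULINA,"126, 755",1.6
50,ASHLAND,CONGRESS,"9,126",14.8
52,JACKSON,1900 W.(MALCOLM X COLL.),126,82.8
'''


def routes_per_stop(infile, outfile, sep=' '):
    """Write one row per stop, with its cleaned routes joined by sep."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    next(reader)  # discard input header
    writer.writerow(['stop_id', 'routes'])
    for row in reader:
        if row:
            # split on commas and drop the surrounding whitespace
            routes = [r for r in re.split(r'\s*,\s*', row[3].strip()) if r]
            writer.writerow([row[0], sep.join(routes)])


out = io.StringIO()
routes_per_stop(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

Note that this keeps several values in one field, so it only tidies the data; the one-row-per-route output is the normalized form the question asks for.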

Answer 1 (score: 0)

Get the columns of interest:

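import csv

def get_interesting_columns():
    # "stuff" stands in for the path to the input csv file
    with open("stuff", "r") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header line;
                      # comment this out if you want the headings too
        for row in reader:
            # first column is stop_id, second-to-last is routes
            # (routes is still the raw string, e.g. "9,126")
            yield row[0], row[-2]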

You can use that generator to create another csv file.
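One possible way to do that is sketched below. It assumes a variant of the generator that takes an open file object instead of the hard-coded file name, and it adds a hypothetical helper, `write_stop_routes`, that splits the combined routes string so each output row holds a single route. In-memory buffers are used only to keep the example self-contained.

```python
import csv
import io

# A few sample rows from the question, held in memory instead of a file.
SAMPLE = '''stop_id,on_street,cross_street,routes,boardings
49,HARRISON,PAULINA,"126, 755",1.6
50,ASHLAND,CONGRESS,"9,126",14.8
52,JACKSON,1900 W.(MALCOLM X COLL.),126,82.8
'''


def get_interesting_columns(infile):
    """Yield (stop_id, routes) pairs; routes is still the combined string."""
    reader = csv.reader(infile)
    next(reader)  # skip the header line
    for row in reader:
        if row:
            yield row[0], row[-2]


def write_stop_routes(pairs, outfile):
    """Write one (stop_id, route) row for every route of every stop."""
    writer = csv.writer(outfile)
    writer.writerow(['stop_id', 'route'])
    for stop_id, routes in pairs:
        for route in routes.split(','):
            route = route.strip()
            if route:
                writer.writerow([stop_id, route])


out = io.StringIO()
write_stop_routes(get_interesting_columns(io.StringIO(SAMPLE)), out)
print(out.getvalue())
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `newline=''`, as the csv module documentation recommends.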

Alternatively, you could use something like SQLAlchemy to run the SQL insert statements you need from Python.
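To illustrate the insert step without requiring a MySQL server, here is a minimal sketch that uses the standard-library `sqlite3` module as a stand-in. It is not SQLAlchemy and not MySQL. The table name `stop_route`, its columns, and the sample pairs are hypothetical; with MySQL the connection setup and the parameter placeholder style would differ.

```python
import sqlite3

# (stop_id, route) pairs as they might come out of the normalization step.
pairs = [(49, '126'), (49, '755'), (50, '9'), (50, '126')]

# In-memory database used purely for illustration.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE stop_route ('
    ' stop_id INTEGER NOT NULL,'
    ' route TEXT NOT NULL,'
    ' PRIMARY KEY (stop_id, route))'
)

# "with conn" commits on success and rolls back if an insert fails.
with conn:
    conn.executemany(
        'INSERT INTO stop_route (stop_id, route) VALUES (?, ?)', pairs
    )

rows = conn.execute(
    'SELECT stop_id, route FROM stop_route ORDER BY stop_id, route'
).fetchall()
print(rows)
```

The composite primary key on (stop_id, route) rejects duplicate pairs, which is the usual shape for a junction table between stops and routes.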