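Can someone help me get rid of these double quotes at the beginning/end of each line?
I have a large CSV (800k rows) and want to create insert statements to import the data into a SQL DB. I know the code is really ugly, but I have never used Python before... any help is greatly appreciated...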
#Script file to read from .csv containing raw location data (zip code database)
#SQL insert statements are written to another CSV
#Duplicate zip codes are removed
import csv
csvfile = open('c:\Canada\canada_zip.csv', 'rb')
dialect = csv.Sniffer().sniff(csvfile.readline())
csvfile.seek(0)
reader = csv.reader(csvfile, dialect)
reader.next()
ofile = open('c:\Canada\canada_inserts.csv', 'wb')
writer = csv.writer(ofile, dialect)
#DROP / CREATE TABLE
createTableCmd = '''DROP TABLE PopulatedPlacesCanada \n\
CREATE TABLE PopulatedPlacesCanada \n\
( \n\
ID INT primary key identity not null, \n\
Zip VARCHAR(10), \n\
City nVARCHAR(100), \n\
County nvarchar(100), \n\
StateCode varchar(3), \n\
StateName nvarchar(100), \n\
Country nvarchar(30), \n\
Latitude float, \n\
Longitude float, \n\
PopulationCount int, \n\
Timezone int, \n\
Dst bit \n\
)'''
writer.writerow([createTableCmd])
table = 'PopulatedPlacesCanada'
db_fields = 'Zip, City, County, StateCode, StateName, Country, Latitude, Longitude, PopulationCount, Timezone, Dst'
zip_codes = set()
count = 0
for row in reader:
    if row[0] not in zip_codes: #only add row if zip code is unique
        count = count + 1
        zipCode = row[0] #not every row in the csv is needed so handpick them using row[n]
        city = row[1].replace("\'", "").strip()
        county = ""
        state_abr = row[2]
        state = row[3].replace("\'", "").strip()
        country = 'Canada'
        lat = row[8]
        lon = row[9]
        pop = row[11]
        timezone = row[6]
        dst = row[7]
        if dst == 'Y':
            dst = '1'
        if dst == 'N':
            dst = '0'
        query = "INSERT INTO {0}({1}) VALUES ('{2}', '{3}', '{4}', '{5}', '{6}', '{7}', {8}, {9}, {10}, {11}, {12})".format(table, db_fields, zipCode, city, county, state_abr, state, country, lat, lon, pop, timezone, dst)
        writer.writerow([query])
        zip_codes.add(row[0])
        if count == 100: #Go statement to make sql batch size manageable
            writer.writerow(['GO'])
Answer 0 (score: 0)
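First, two pointers: 1) use triple double quotes (""") rather than triple apostrophes (''') for multi-line strings; 2) there is no need to put "\n\" inside a multi-line string, because the line breaks are already part of the string.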
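As a rough illustration of the second pointer, here is a minimal sketch (written for Python 3, with a shortened, made-up version of the CREATE TABLE statement) showing that a triple-quoted string already keeps its line breaks. The names `create_table_cmd` and `with_escapes` are only for this example.

```python
# Shortened, illustrative version of the statement from the question.
create_table_cmd = """DROP TABLE PopulatedPlacesCanada
CREATE TABLE PopulatedPlacesCanada
(
    ID INT primary key identity not null,
    Zip VARCHAR(10)
)"""

# The question's style: "\n" adds a newline, and the trailing backslash then
# swallows the real line break, so the net effect is the same single newline.
with_escapes = '''DROP TABLE PopulatedPlacesCanada \n\
CREATE TABLE PopulatedPlacesCanada'''

# Both strings have "CREATE TABLE ..." on their second line.
print(create_table_cmd.splitlines()[1])
print(with_escapes.splitlines()[1])
```

So the "\n\" pairs can simply be dropped without changing the resulting SQL text.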
To remove the quotes from the rows, use Python's regular expression module instead of string replace.
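import re
# matches a single or double quote at the start or the end of the string
quotes = re.compile('^["\']|["\']$')
# sub() needs the replacement text as its first argument
city = quotes.sub( '', row[3] )
state = quotes.sub( '', row[4] )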
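A self-contained sketch of the regex approach, assuming Python 3 and some made-up sample values; the helper name `strip_outer_quotes` is hypothetical and not part of the original answer:

```python
import re

# Matches one single or double quote at the very start or very end of a string.
QUOTES = re.compile(r'^["\']|["\']$')

def strip_outer_quotes(value):
    """Remove at most one quote character from each end of value."""
    return QUOTES.sub('', value)

print(strip_outer_quotes('"Toronto"'))   # outer double quotes removed
print(strip_outer_quotes("St. John's"))  # inner apostrophe is left alone
```

Because the pattern is anchored with ^ and $, quotes in the middle of a value are not touched, unlike with a plain replace().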
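Or use strip with the character you want removed from both ends. (I wrote this assuming strip takes only one character at a time; it actually accepts a string of several characters, so .strip('"\'') would also work.)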
city = row[3].strip('"').strip("'")
state = row[4].strip('"').strip("'")
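For comparison, a small sketch (Python 3, made-up sample value) of how chained strip calls differ from a single strip call that is given both quote characters:

```python
# A value wrapped in single quotes around double quotes: '"Montreal"'
value = '\'"Montreal"\''

# One call with both characters removes any mix of them from both ends.
both = value.strip('"\'')

# Chained calls run in order: the first finds no double quote at the ends,
# the second removes only the outer single quotes.
chained = value.strip('"').strip("'")

print(both)
print(chained)
```

For values that carry only one kind of quote at the ends, both forms give the same result.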
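Finally, don't use the csv module for the output file, since it expects CSV 'context' (fields and rows) and you are writing plain SQL text. Just open the file and write to it.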
ofile = open( 'canada_inserts.sql','w' )
ofile.write( createTableCmd + '\n' )
for row in reader:
    ...
    ofile.write( query + '\n' )
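Put together, the plain-write approach might look roughly like the sketch below. It is written for Python 3 and is not the asker's full script: the function name `write_inserts`, the two-column rows and the in-memory buffer are assumptions made to keep the example short and runnable. Doubling single quotes is the usual SQL way to escape an apostrophe inside a string literal, used here instead of deleting it.

```python
import io

def write_inserts(rows, out, table="PopulatedPlacesCanada"):
    """Write one INSERT statement per unique zip code to the file-like object out."""
    seen = set()
    for zip_code, city in rows:
        if zip_code in seen:  # skip duplicate zip codes
            continue
        seen.add(zip_code)
        city = city.replace("'", "''")  # escape apostrophes for the SQL literal
        out.write("INSERT INTO {0}(Zip, City) VALUES ('{1}', '{2}')\n".format(
            table, zip_code, city))
        if len(seen) % 100 == 0:  # GO after every 100 inserts keeps batches small
            out.write("GO\n")

# Hypothetical sample rows; a real script would pass the csv.reader rows instead.
sample_rows = [("A1A", "St. John's"), ("A1A", "St. John's"), ("M5V", "Toronto")]
buf = io.StringIO()
write_inserts(sample_rows, buf)
print(buf.getvalue())
```

Passing a real file opened with open('canada_inserts.sql', 'w') instead of the buffer writes the same text to disk, with no added quotes around each statement.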
Answer 1 (score: 0)
You are not writing a CSV file. Don't use the csv writer, because it may add extra escaping to your data. Instead, use
ofile = open( 'load.sql', 'w')
# Raw write, no newline added:
ofile.write(...)
# or, with newline at the end:
print >>ofile, "foobar."
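The CSV writer is what adds the quotes to your lines: most CSV dialects expect a string to be wrapped in quotes when it contains certain characters, such as "," or ";" or even spaces. However, since you are writing SQL and not CSV, you neither need nor want that.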
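The effect is easy to reproduce. The sketch below (Python 3, using the csv module's default dialect rather than the sniffed one from the question, and a made-up query string) writes one value that contains commas and one that does not:

```python
import csv
import io

# Hypothetical query; the commas inside it are what trigger the quoting.
query = "INSERT INTO t(Zip, City) VALUES ('M5V', 'Toronto')"

buf = io.StringIO()
writer = csv.writer(buf)   # default dialect: ',' delimiter, '"' quote character
writer.writerow([query])   # contains commas, so the field gets wrapped in quotes
writer.writerow(['GO'])    # no special characters, so it is written as-is

lines = buf.getvalue().splitlines()
print(lines[0])
print(lines[1])
```

The first line comes out wrapped in double quotes while the second does not, which matches the stray quotes described in the question.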