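Suppose I have created some tables in a database by downloading a CSV file and reading it line by line. The CSV file is updated from time to time.
The CSV has the following structure (for example):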
region;district
r1;d1
r1;d2
r2;d1
...
The DB has two matching tables: region(id, title) and district(id, region_id, title).
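At the moment, my script does the following:
For each row in the CSV file:
check whether the region is already in the region table. If it is not: insert it and get its id. Otherwise: just return the id.
(I need this id for the region_id field of the district table.)
Here is a piece of code (bad code, but maybe it helps):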
import asyncio
import aiohttp
import csv
from aiopg.sa import create_engine
from io import TextIOWrapper
from sqlalchemy import MetaData, Column, Integer, String, Table
from zipfile import ZipFile


async def update_db():
    async with create_engine(user=DB_USER,
                             database=DB_NAME,
                             host=DB_HOST,
                             password=DB_PASSWORD) as engine:
        async with engine.acquire() as conn:
            with ZipFile(ZIP_FILENAME) as zip_file:
                with zip_file.open(CSV_FILENAME, 'r') as file:
                    metadata = MetaData(bind=engine)
                    # get_db_tabels is a helper that is not shown here
                    _region, _district = get_db_tabels(metadata)
                    reader = csv.reader(TextIOWrapper(file), delimiter=';')
                    headers = next(reader)  # skip the "region;district" header
                    for line in reader:
                        region, district = line
                        # One SELECT per CSV row: look up the region id by title
                        # (id is the first column, so scalar() returns it, or None)
                        region_id = await conn.scalar(
                            _region.select().where(_region.c.title == region))
                        if region_id is None:
                            # Region is new: insert it and read back the generated id
                            region_id = await conn.scalar(
                                _region.insert()
                                .values(title=region)
                                .returning(_region.c.id))
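As mentioned above, the CSV is updated from time to time (and it is huge: millions of rows).
Main question: how can I detect the new rows in the CSV? Can I somehow check whether a row (the current region/district pair) already exists in the DB without running a query for every row of the CSV?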
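To make the question concrete, here is a rough sketch of one possible direction, not a tested solution: load the existing (region, district) pairs from the DB once, keep them in a set, and only touch the DB for CSV rows that are missing from that set. The sketch uses the stdlib sqlite3 module in place of aiopg/PostgreSQL purely so that it runs on its own. The UNIQUE constraints and the helpers `load_existing` and `import_csv` are assumptions made for illustration, and it presumes that all pairs fit in memory. It only detects added rows, not removed ones.

```python
import csv
import io
import sqlite3

# Hypothetical schema mirroring region(id, title) and
# district(id, region_id, title); the UNIQUE constraints are assumptions.
SCHEMA = """
CREATE TABLE region (id INTEGER PRIMARY KEY, title TEXT UNIQUE);
CREATE TABLE district (
    id INTEGER PRIMARY KEY,
    region_id INTEGER REFERENCES region(id),
    title TEXT,
    UNIQUE (region_id, title)
);
"""


def load_existing(conn):
    """Load region ids and all known (region, district) pairs in two queries."""
    region_ids = dict(conn.execute("SELECT title, id FROM region"))
    pairs = set(conn.execute(
        "SELECT r.title, d.title "
        "FROM district d JOIN region r ON r.id = d.region_id"
    ))
    return region_ids, pairs


def import_csv(conn, csv_text):
    """Insert only the CSV rows that are not stored yet and return them."""
    region_ids, known = load_existing(conn)
    reader = csv.reader(io.StringIO(csv_text), delimiter=";")
    next(reader)  # skip the "region;district" header
    added = []
    for row in reader:
        if len(row) != 2:
            continue  # ignore blank or malformed lines
        region, district = row
        if (region, district) in known:
            continue  # already in the DB, no query needed
        if region not in region_ids:
            cur = conn.execute(
                "INSERT INTO region (title) VALUES (?)", (region,))
            region_ids[region] = cur.lastrowid
        conn.execute(
            "INSERT INTO district (region_id, title) VALUES (?, ?)",
            (region_ids[region], district))
        known.add((region, district))
        added.append((region, district))
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
first = import_csv(conn, "region;district\nr1;d1\nr1;d2\nr2;d1\n")
second = import_csv(conn, "region;district\nr1;d1\nr1;d2\nr2;d1\nr2;d2\n")
print(second)  # [('r2', 'd2')]
```

With this shape, the second import issues two SELECTs up front and then one INSERT per genuinely new row, instead of one SELECT per CSV row. Whether something like this is acceptable depends on the restrictions below.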
Restrictions: