LOAD DATA INFILE: update a table on duplicate values from a CSV

Date: 2018-11-29 10:15:01

Tags: python mysql csv

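I am trying to update a MySQL table from my CSV data: for each sha1 in the CSV, the suggested name should be updated if the sha1 already exists in the table, or inserted otherwise. Where am I going wrong? I get this error: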

ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'where sha1=@col1' at line 1

Here is my table structure:

date_sourced, sha1, suggested, vsdt, trendx, falcon, notes, mtf

CSV structure:

SHA1,suggestedName

Code:

import mysql.connector
mydb = mysql.connector.connect(user='root', password='',
host='localhost',database='jeremy_db')

cursor = mydb.cursor()
query = "LOAD DATA INFILE %s IGNORE INTO TABLE jeremy_table_test FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (@col1,@col2) set suggested=@col2 where sha1=@col1"
cursor.execute(query, (fullPath))
mydb.commit()

1 Answer:

Answer 0 (score: 0)

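LOAD DATA INFILE does not accept a WHERE condition, which is why the server rejects the statement near 'where sha1=@col1'. You can instead read the file with pandas and insert the values into the table, but you need to create a unique index on sha1 beforehand. Otherwise my script will not work as intended, because ON DUPLICATE KEY UPDATE only takes effect when an insert would collide with a unique index or primary key; without one, every row is simply inserted as a new row.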
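
The unique index on sha1 only has to be created once. A minimal sketch of that step, assuming any DB-API connection such as the one from mysql.connector; the helper name add_unique_sha1_index and the index name uq_sha1 are hypothetical and not part of the original answer. Note that the ALTER TABLE statement fails if the column already contains duplicate sha1 values, so those would need to be cleaned up first.

```python
# Hypothetical one-time setup step; the index name uq_sha1 is an assumption.
ADD_UNIQUE_INDEX_SQL = (
    "ALTER TABLE jeremy_table_test "
    "ADD UNIQUE INDEX uq_sha1 (sha1)"
)


def add_unique_sha1_index(conn):
    """Create a unique index on sha1 using an open DB-API connection."""
    cur = conn.cursor()
    try:
        cur.execute(ADD_UNIQUE_INDEX_SQL)
        conn.commit()
    finally:
        # Always release the cursor, even if the statement fails.
        cur.close()
```

With a connection opened as in the script below, this would be called once as add_unique_sha1_index(conn) before the first import.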

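import pandas as pd
import mysql.connector as mysql

# This sample reads an Excel sheet; for a plain CSV file use pd.read_csv(path) instead.
path = "1.xls"

df = pd.read_excel(path)

# Column names match the CSV header from the question.
_sha1 = df["SHA1"].tolist()
_suggestedName = df["suggestedName"].tolist()

conn = mysql.connect(user="xx",passwd="xx",db="xx")
cur = conn.cursor()

# Relies on a unique index (or primary key) on sha1: a new sha1 is inserted,
# an existing sha1 has its suggested value overwritten.
sql = """INSERT INTO jeremy_table_test (sha1,suggested) VALUES (%s,%s) ON DUPLICATE KEY UPDATE suggested=VALUES(suggested)"""

try:
    cur.executemany(sql,list(zip(_sha1,_suggestedName)))
    conn.commit()
except Exception:
    # Undo the partial batch, then re-raise with the original traceback.
    conn.rollback()
    raise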
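
If pandas is not available, the same list of (sha1, suggested) pairs can be built with the standard csv module. This is only a sketch under the assumption that the file has the SHA1,suggestedName header shown in the question; read_rows is a hypothetical helper name, and the sample values are made up for illustration.

```python
import csv
import io


def read_rows(csv_file):
    """Return (sha1, suggestedName) tuples from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    # strip() removes stray whitespace left around the values.
    return [(row["SHA1"].strip(), row["suggestedName"].strip()) for row in reader]


# In-memory sample standing in for the real file (made-up values).
sample = io.StringIO("SHA1,suggestedName\nabc123,Name.A\ndef456,Name.B\n")
rows = read_rows(sample)
print(rows)  # [('abc123', 'Name.A'), ('def456', 'Name.B')]
```

For a real file, open it with open(path, newline="") and pass the file object to read_rows; the resulting list can be handed to cur.executemany(sql, rows) in place of the zipped pandas columns.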