Because of a race condition and the way MySQL increments its AUTO_INCREMENT counter, I am getting significant gaps between my row ID values.
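Background: A Python script uses multiple threads to gather data, which is stored in rows of a MySQL database. Every so often, a few threads try to store the same unique row. It appears that every insert increments the AUTO_INCREMENT counter, even when the insert fails with a duplicate entry exception, and that a rollback does not roll the AUTO_INCREMENT counter back.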
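The behaviour described above can be mimicked without a database. The following is a toy in-memory model, not MySQL itself: `ToyAutoIncrementTable` and its `insert` method are hypothetical names, and the model simply assumes that an ID is reserved before the uniqueness check and is never handed back.

```python
class ToyAutoIncrementTable(object):
    """Toy model (an assumption, not MySQL) of an auto-increment ID plus a unique column."""

    def __init__(self):
        self.next_id = 1   # plays the role of the AUTO_INCREMENT counter
        self.rows = {}     # maps row ID -> unique value

    def insert(self, value):
        # Assumption: the ID is reserved first and the counter never goes back.
        row_id = self.next_id
        self.next_id += 1
        # The uniqueness check happens only after the ID has been consumed.
        if value in self.rows.values():
            raise ValueError("duplicate entry: %r" % (value,))
        self.rows[row_id] = value
        return row_id


table = ToyAutoIncrementTable()
for word in ["1080", "10-point", "10-point", "10-point", "10th"]:
    try:
        table.insert(word)
    except ValueError:
        # The caller "rolls back", but the reserved ID is already gone.
        pass

ids = sorted(table.rows)
print(ids)  # prints [1, 2, 5]
```

The two rejected duplicates consume IDs 3 and 4, so the next accepted row lands on 5, which is the same pattern of gaps seen in the real table.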
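Question: Is there a way to avoid the race condition, keep the row IDs sequential, and minimize the gaps? I realize that "don't worry about the gaps" is one answer. I can think of problematic solutions such as "lock the table before checking" and "spawn another thread to keep track of sequential row IDs", but I'm wondering whether there is a smarter solution.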
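As a rough illustration of the "lock before checking" option, here is a toy in-memory sketch. It is not MySQL: `LockedToyTable` and `add_if_missing` are hypothetical names, and a `threading.Lock` stands in for whatever table-level lock the database would take.

```python
import threading


class LockedToyTable(object):
    """Toy model (an assumption, not MySQL): check and insert under one lock."""

    def __init__(self):
        self.next_id = 1
        self.rows = {}                 # maps row ID -> unique value
        self.lock = threading.Lock()   # stands in for a table-level lock

    def add_if_missing(self, value):
        # The check and the insert form one critical section, so no other
        # worker can slip in between them and no ID is consumed for a duplicate.
        with self.lock:
            if value in self.rows.values():
                return None
            row_id = self.next_id
            self.next_id += 1
            self.rows[row_id] = value
            return row_id


def toy_worker(table, words):
    for word in words:
        table.add_if_missing(word)


words = ["1080", "10-point", "10th", "11-point", "12-point"]
table = LockedToyTable()
threads = [threading.Thread(target=toy_worker, args=(table, words)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

ids = sorted(table.rows)
print(ids)  # prints [1, 2, 3, 4, 5]
```

The IDs stay contiguous because duplicates are rejected before an ID is taken, but every writer now waits on a single lock, which is why this option is described as problematic.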
My actual table is fairly complex, but here is a simplified test case:
CREATE TABLE `test` (
    `test_id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
    `test_thread` SMALLINT UNSIGNED NOT NULL,
    `test_data` VARCHAR(80),
    PRIMARY KEY (`test_id`),
    UNIQUE KEY `test_data_unique` (`test_data`)
) ENGINE=InnoDB AUTO_INCREMENT=0;
Here is sample code that populates the table:
#!/bin/env python2.7
import os
import multiprocessing
import MySQLdb
import MySQLdb.cursors

MYSQL_HOST = 'localhost'
MYSQL_USER = ''
MYSQL_DB = 'packertest'
MYSQL_PASS = ''
MYSQL_SOCKET = '/var/lib/mysql/mysql.sock'

# numerous workers (separate processes) are running simultaneously
def worker():
    # connect to the database
    db = MySQLdb.connect(host=MYSQL_HOST, user=MYSQL_USER, db=MYSQL_DB,
                         passwd=MYSQL_PASS, unix_socket=MYSQL_SOCKET)
    cursor = db.cursor(MySQLdb.cursors.DictCursor)
    fd = open('/usr/share/dict/words')
    # for each line in the word list...
    for line in fd:
        word = line.strip()
        # check if this row has been added
        cursor.execute('SELECT 1 FROM test WHERE `test_data` = %s', (word,))
        if cursor.rowcount > 0:
            continue
        # add the row if the value isn't present
        try:
            cursor.execute('INSERT INTO test (`test_thread`, `test_data`) VALUES (%s, %s)',
                           (os.getpid(), word))
            db.commit()
        except MySQLdb.IntegrityError:
            # another worker inserted the same word between our check and our insert
            db.rollback()
    fd.close()
    db.close()

if __name__ == "__main__":
    for i in range(24):
        newproc = multiprocessing.Process(target=worker, args=())
        newproc.start()
Here are the typical gaps I get:
mysql> SELECT * FROM test ORDER BY test_id ASC;
+---------+-------------+-----------+
| test_id | test_thread | test_data |
+---------+-------------+-----------+
|       1 |       29073 | 1080      |
|       6 |       29068 | 10-point  |
|      10 |       29085 | 10th      |
|      13 |       29086 | 11-point  |
|      14 |       29078 | 12-point  |
|      23 |       29067 | 16-point  |
|      24 |       29073 | 18-point  |