I am deploying an application that consumes some .csv data, and I want to copy that data into a MySQL table. With the help of Stack Overflow users I wrote the following code:
import csv
import MySQLdb

db = MySQLdb.connect(host="dbname.description.host.com",
                     user="user",
                     passwd="key",
                     db="dbname")
cursor = db.cursor()
query = ('INSERT INTO table_name(column, column_1, column_2, column_3) '
         'VALUES(%s, %s, %s, %s)')
csv_data = csv.reader(file('file_name'))
for row in csv_data:
    cursor.execute(query, row)
    db.commit()
cursor.close()
The problem is that, at the moment, this is far too slow and I need to speed it up.

Thanks
Answer 0: (score: 1)
You can use executemany to batch the job, as shown below.
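A minimal sketch of that batched version, reusing the connection details and column names from the question (the file name is the question's placeholder; committing once at the end is part of the speedup):

import csv
import MySQLdb

db = MySQLdb.connect(host="dbname.description.host.com",
                     user="user", passwd="key", db="dbname")
cursor = db.cursor()

query = ('INSERT INTO table_name(column, column_1, column_2, column_3) '
         'VALUES(%s, %s, %s, %s)')

with open('file_name') as f:
    rows = list(csv.reader(f))

# send all rows in one call and commit once
cursor.executemany(query, rows)
db.commit()
cursor.close()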
Answer 1: (score: 0)
Commit only once, after the loop:

for row in csv_data:
    cursor.execute(query, row)
db.commit()

It will do less work and will be faster.
Answer 2: (score: 0)
The code you are using is very inefficient for a couple of reasons: you commit each row of data one at a time (which may be what you want for a transactional database or process) rather than dumping it all in one go.

There are many ways to speed this up, ranging from great to not so great. Here are four approaches, including the naive implementation (the code above):
The odo approach is the fastest (it uses MySQL's LOAD DATA INFILE under the hood).
Next comes Pandas (its critical code paths are optimized).
Then a raw cursor, but inserting the rows in bulk.
Finally the naive approach, committing one row at a time.
Here are some sample timings from runs against a local MySQL server.

using_odo (./test.py:29):            0.516 seconds
using_pandas (./test.py:23):         3.039 seconds
using_cursor_correct (./test.py:50): 12.847 seconds
using_cursor (./test.py:34):         43.470 seconds

Count of table_1 - 100000
Count of table_2 - 100000
Count of table_3 - 100000
Count of table_4 - 100000

As you can see, the naive implementation is roughly 100 times slower than odo and about 10 times slower than Pandas.
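For reference, a minimal sketch of the odo approach, assuming the odo and pymysql packages are installed; the connection URI, file name, and table name below are illustrative:

from odo import odo

# odo dispatches a CSV -> MySQL transfer to LOAD DATA INFILE under the hood
odo('file_name.csv',
    'mysql+pymysql://user:key@dbname.description.host.com/dbname::table_name')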
Answer 3: (score: 0)
The solution is to use MySQL's batch insert.

So you need to take all the values you want to insert and turn them into a single string that is used as the argument to the execute() method.

In the end the SQL should look like:
INSERT INTO table_name (`column`, `column_1`, `column_2`, `column_3`) VALUES('1','2','3','4'),('4','5','6','7'),('7','8','9','10');
Here is an example:
# function to transform one CSV row (a list) into a quoted value tuple
def stringify(v):
    return "('%s', '%s', '%s', '%s')" % (v[0], v[1], v[2], v[3])

# transform every row into a string
# (note: values are interpolated directly, so they must not contain quotes)
values = [stringify(row) for row in csv_data]

# glue them together
batch_data = ", ".join(values)

# complete the SQL
sql = "INSERT INTO `table_name`(`column`, `column_1`, `column_2`, `column_3`) \
VALUES %s" % batch_data

# execute it
cursor.execute(sql)
db.commit()
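For completeness, a small sketch of how csv_data could be loaded before running the snippet above (the file name is the placeholder from the question):

import csv

with open('file_name') as f:
    csv_data = list(csv.reader(f))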
Answer 4: (score: 0)
Here are some stats to back up @Mung Tung's answer.
executemany performs far better than execute: execute could hardly manage 315 inserts per second, while executemany reached about 25,000 inserts per second.

Basic machine configuration:
2.7 GHz Dual-Core Intel Core i5
16 GB 1867 MHz DDR3
Flash Storage
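A minimal sketch of how such a comparison could be timed, assuming a MySQLdb connection and a table like the one in the question (the sample data and row count are illustrative):

import time
import MySQLdb

db = MySQLdb.connect(host="dbname.description.host.com",
                     user="user", passwd="key", db="dbname")
cursor = db.cursor()
query = ('INSERT INTO table_name(column, column_1, column_2, column_3) '
         'VALUES(%s, %s, %s, %s)')
rows = [("a", "b", "c", "d")] * 10000  # illustrative data

# one statement per row
start = time.time()
for row in rows:
    cursor.execute(query, row)
db.commit()
print("execute:     %.2f s" % (time.time() - start))

# all rows in a single call
start = time.time()
cursor.executemany(query, rows)
db.commit()
print("executemany: %.2f s" % (time.time() - start))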
Answer 5: (score: 0)
I solved this by using an array of tuples and passing it to the execute statement. Processing 1 million rows took only 8 minutes. Try to avoid calling conn.execute in a loop whenever possible.
import pandas as pd

def process_csv_file4(csv_file, conn):
    # load the whole CSV into a DataFrame (a single column in this example)
    df = pd.read_csv(csv_file, sep=';', names=['column'])
    query = """
        INSERT INTO table
            (column)
        VALUES
            (%s)
        ON DUPLICATE KEY UPDATE
            column = VALUES(column);
    """
    # pass all values at once instead of executing per row
    conn.execute(query, tuple(df.values))
Answer 6: (score: 0)
I use the SQLAlchemy library to speed up bulk inserts from a CSV file into a MySQL database via a Python script. The data is inserted as text, so connect to the database workbench afterwards and change the column data types, and the data is ready to use.

Step 1: Run "pip install sqlalchemy" and "pip install mysqlclient" in a command terminal.
import pandas as pd
import MySQLdb
import sqlalchemy
from sqlalchemy import create_engine
Step 2: Create an engine from a connection string through SQLAlchemy.
###### Create Engine ####
# syntax:  engine = create_engine("mysql+mysqldb://username:password@hostaddress:3306/databasename")
# example:
engine = create_engine("mysql+mysqldb://abc9:abc$123456@127.10.23.1:2207/abc9")
conn = engine.connect()
print(engine)
########### Define your python code ##############
def function_name():
    data = pd.read_csv('filepath/file.csv')
    data_frame = data.to_sql('table_name', engine, method='multi', index=False,
                             if_exists='replace')
############Close Connection###############
conn = engine.raw_connection()
conn.commit()
conn.close()
Run the code, and it can insert 2 million rows in 4 minutes!!

Use this reference link for different database drivers:
https://overiq.com/sqlalchemy-101/installing-sqlalchemy-and-connecting-to-database/