I have a tab-separated file in the following format:
sentenceID (sid) documentID (scid) sentenceText (sent)
For example:
100004 100 即便您喜爱流连酒吧,也定然在这轻松安闲的一隅,来一场甜蜜沉醉的约会。
100005 100 您可以慢慢探究菜单上所有的秘密惊喜。
I want to load it into sqlite3 with the following schema:
CREATE TABLE sent (
    sid INTEGER PRIMARY KEY,
    scid INTEGER,
    sent TEXT
);
Is there a quick way to use the Python API for sqlite3 (http://docs.python.org/2/library/sqlite3.html) to load the file into the table?
This is what I have been doing so far:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sqlite3 as lite
import sys, codecs
con = lite.connect('mycorpus.db')
with con:
    cur = con.cursor()
    cur.execute("CREATE TABLE Corpus(sid INT, scid INT, sent TEXT, PRIMARY KEY (sid))")
    # codecs.open returns a file object that yields decoded UTF-8 lines
    for line in codecs.open('corpus.tab', 'r', 'utf8'):
        sid, scid, sent = line.strip().split("\t")
        # One INSERT per line, built by string concatenation (breaks if sent contains a quote)
        cur.execute("INSERT INTO Corpus VALUES(" + sid + "," + scid + ",'" + sent + "')")
Answer 0 (score: 4)
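#!/usr/bin/python
# -*- coding: utf-8 -*-
import io
import sqlite3 as lite
con = lite.connect('myCorpus.db')
cur = con.cursor()
cur.execute("CREATE TABLE Corpus(sid INT, scid INT, sent TEXT, PRIMARY KEY (sid))")
# Decode the file as UTF-8 and drop each trailing newline before splitting on tabs
with io.open('myfile.tab', 'r', encoding='utf8') as f:
    data = [row.rstrip('\n').split('\t') for row in f]
# Insert all rows with a single parameterized executemany call
cur.executemany("INSERT INTO Corpus (sid, scid, sent) VALUES (?, ?, ?);", data)
con.commit()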
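If the corpus is too large to hold in a list, the same executemany approach can be fed from a generator instead. The following is only a sketch for Python 3, not part of the original answer: the helper names iter_rows and load_corpus are made up for illustration, and it assumes every non-blank line has exactly the three tab-separated fields described in the question.

```python
import sqlite3


def iter_rows(path):
    """Yield (sid, scid, sent) tuples from a tab-separated UTF-8 file."""
    with open(path, encoding="utf8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split on the first two tabs only, so a tab inside the sentence survives.
            sid, scid, sent = line.split("\t", 2)
            yield int(sid), int(scid), sent


def load_corpus(db_path, tsv_path):
    """Create the Corpus table if needed and insert the whole file in one transaction."""
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back if an exception is raised
            con.execute(
                "CREATE TABLE IF NOT EXISTS Corpus"
                "(sid INT, scid INT, sent TEXT, PRIMARY KEY (sid))"
            )
            # executemany consumes the generator lazily, one row at a time.
            con.executemany(
                "INSERT INTO Corpus (sid, scid, sent) VALUES (?, ?, ?)",
                iter_rows(tsv_path),
            )
    finally:
        con.close()
```

Calling load_corpus('mycorpus.db', 'corpus.tab') would then load the file without building the full list of rows in memory first.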
Answer 1 (score: 3)
Here is an example that uses the unicodecsv module:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sqlite3
import unicodecsv
con = sqlite3.connect('mycorpus.db')
cur = con.cursor()
cur.execute("CREATE TABLE Corpus(sid INT, scid INT, sent TEXT, PRIMARY KEY (sid))")
with open('corpus.tab', 'rb') as input_file:
    reader = unicodecsv.reader(input_file, delimiter="\t")
    data = [row for row in reader]
cur.executemany("INSERT INTO Corpus (sid, scid, sent) VALUES (?, ?, ?);", data)
con.commit()
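On Python 3 the standard csv module reads Unicode text directly, so a similar load might look like the sketch below. This is an assumption-laden variant rather than the code above: load_with_csv is a hypothetical helper name, quoting is switched off with csv.QUOTE_NONE on the assumption that quote characters in the sentences are literal text, and every non-empty row is assumed to have exactly three fields.

```python
import csv
import sqlite3


def load_with_csv(db_path, tsv_path):
    """Load a tab-separated UTF-8 file into the Corpus table using csv.reader."""
    con = sqlite3.connect(db_path)
    try:
        # newline="" lets the csv module handle line endings itself.
        with open(tsv_path, newline="", encoding="utf8") as input_file:
            reader = csv.reader(input_file, delimiter="\t", quoting=csv.QUOTE_NONE)
            with con:  # commits on success, rolls back on error
                con.execute(
                    "CREATE TABLE IF NOT EXISTS Corpus"
                    "(sid INT, scid INT, sent TEXT, PRIMARY KEY (sid))"
                )
                con.executemany(
                    "INSERT INTO Corpus (sid, scid, sent) VALUES (?, ?, ?)",
                    (row for row in reader if row),  # csv yields [] for blank lines
                )
    finally:
        con.close()
```

The sid and scid values are passed as strings here; since the columns are declared INT, SQLite stores them as integers.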
Hope that helps.