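I am trying to do some bulk inserts in Couchbase. I searched SO and Google for examples but could not find any leads. One question here says that it is not possible: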
How to insert a documents in bulk in Couchbase?
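But I guess that question was asked three years ago. From my searching, and if I understand the links below correctly, it is possible to insert documents in bulk.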
https://developer.couchbase.com/documentation/server/current/sdk/batching-operations.html
https://pythonhosted.org/couchbase/api/couchbase.html#batch-operation-pipeline
Here is the code in which I want to implement bulk inserts in Couchbase:
import calendar
import csv
import datetime
import time

from couchbase import Couchbase
from couchbase.bucket import Bucket
from couchbase.exceptions import CouchbaseError, CouchbaseTransientError

cb = Bucket('couchbase://localhost/bulk-load')
BYTES_PER_BATCH = 1024 * 256  # 256K

with open('/home/royshah/Desktop/bulk_try/roy.csv') as csvfile:
    lines = csvfile.readlines()[4:]  # skip the first four lines

for k, line in enumerate(lines):
    data_tmp = line.strip().split(',')
    strDate = data_tmp[0].replace("\"", "")
    timerecord = datetime.datetime.strptime(strDate,
                                            '%Y-%m-%d %H:%M:%S.%f')
    microsecs = timerecord.microsecond
    strDate = "\"" + strDate + "\""
    ts = calendar.timegm(timerecord.timetuple())*1000000 + microsecs
    datastore = [ts] + data_tmp[1:]
    # I am making key-values on the fly from the csv file
    stre = {'col1 ': datastore[1],
            'col2': datastore[2],
            'col3': datastore[3],
            'col4': datastore[4],
            'col5': datastore[5],
            'col6': datastore[6]}
    # datastore[0] is used as the document id and stre holds the
    # key-values to be inserted for that id
    cb.upsert(str(datastore[0]), stre)
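cb.upsert(str(datastore[0]), stre) performs one insert per call, and I would like it to insert in bulk to make it faster. I don't know how to do this bulk insert in Couchbase. I found this example but I'm not sure how to implement it: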
https://developer.couchbase.com/documentation/server/current/sdk/batching-operations.html
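If anyone can point me to some examples of bulk loading in Couchbase, or help me figure out how to do the bulk insert with my code, I would really appreciate it. Thanks a lot for any idea or help.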
Answer 0 (score: 1)
I tried to adapt the example from the docs to your use case. You may need to change a detail or two, but you should get the idea.
import calendar
import datetime
import json

from couchbase.bucket import Bucket
from couchbase.exceptions import CouchbaseTransientError

cb = Bucket('couchbase://localhost/bulk-load')
BYTES_PER_BATCH = 1024 * 256  # 256K

batches = []
cur_batch = {}
cur_size = 0
batches.append(cur_batch)

with open('/home/royshah/Desktop/bulk_try/roy.csv') as csvfile:
    lines = csvfile.readlines()[4:]

for line in lines:
    # Format your data
    data_tmp = line.strip().split(',')
    strDate = data_tmp[0].replace("\"", "")
    timerecord = datetime.datetime.strptime(strDate,
                                            '%Y-%m-%d %H:%M:%S.%f')
    microsecs = timerecord.microsecond
    ts = calendar.timegm(timerecord.timetuple())*1000000 + microsecs
    # Build kv
    datastore = [ts] + data_tmp[1:]
    value = {'col1 ': datastore[1],
             'col2': datastore[2],
             'col3': datastore[3],
             'col4': datastore[4],
             'col5': datastore[5],
             'col6': datastore[6]}
    key = str(datastore[0])
    cur_batch[key] = value
    # Estimate the size from the serialized value; len(value) on a dict
    # would only count its keys
    cur_size += len(key) + len(json.dumps(value)) + 24
    if cur_size > BYTES_PER_BATCH:
        cur_batch = {}
        batches.append(cur_batch)
        cur_size = 0
print("Have {} batches".format(len(batches)))
num_completed = 0
while batches:
    batch = batches[-1]
    try:
        cb.upsert_multi(batch)
        num_completed += len(batch)
        batches.pop()
    except CouchbaseTransientError as e:
        print(e)
        ok, fail = e.split_results()
        # Retry only the items that failed, taking their values from the
        # batch that was just attempted
        new_batch = {}
        for key in fail:
            new_batch[key] = batch[key]
        batches.pop()
        batches.append(new_batch)
        num_completed += len(ok)
        print("Retrying {}/{} items".format(len(new_batch), len(batch)))
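If the CSV file is large, you may not want to hold every batch in memory before sending anything. Below is a minimal sketch of the same size-bounded batching written as a generator. The names estimate_size, iter_batches and upsert_in_batches are my own, not part of the Couchbase SDK, and the + 24 per-document overhead is simply carried over from the code above. The sender takes any callable, so it can be tried without a running cluster, and it leaves out the retry handling shown above.

```python
import json


def estimate_size(key, value):
    """Rough size of one document: key bytes + JSON value bytes + 24 overhead."""
    return len(key.encode("utf-8")) + len(json.dumps(value).encode("utf-8")) + 24


def iter_batches(items, max_bytes=256 * 1024):
    """Yield {key: value} dicts built from an iterable of (key, value) pairs.

    A batch is emitted as soon as adding a document pushes its estimated
    size past max_bytes, so a batch overshoots by at most one document.
    """
    batch, size = {}, 0
    for key, value in items:
        batch[key] = value
        size += estimate_size(key, value)
        if size > max_bytes:
            yield batch
            batch, size = {}, 0
    if batch:  # emit the final, partially filled batch
        yield batch


def upsert_in_batches(upsert_multi, items, max_bytes=256 * 1024):
    """Pass each batch to the upsert_multi callable; return the document count."""
    sent = 0
    for batch in iter_batches(items, max_bytes):
        upsert_multi(batch)
        sent += len(batch)
    return sent
```

With the bucket from the code above you would call something like upsert_in_batches(cb.upsert_multi, rows), where rows yields (key, value) pairs built from your CSV lines. Because the final batch is only emitted when it is non-empty, this version never sends an empty dict.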