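I'm trying to write a simple script that takes a CSV as input and writes it to a single spreadsheet document. I have it working now, but the script is slow: writing roughly 350 rows across two worksheets takes about 10 minutes.
Here is my script: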
#!/usr/bin/python
import json, sys
import gspread
from oauth2client.client import SignedJwtAssertionCredentials
json_key = json.load(open('client_secrets.json'))
scope = ['https://spreadsheets.google.com/feeds']
# change to True to see Debug messages
DEBUG = False
def updateSheet(csv,sheet):
    linelen = 0
    counter1 = 1 # starting column in spreadsheet: A
    counter2 = 1 # starting row in spreadsheet: 1
    counter3 = 0 # helper for iterating through line entries
    credentials = SignedJwtAssertionCredentials(json_key['client_email'], json_key['private_key'], scope)
    gc = gspread.authorize(credentials)
    wks = gc.open("Test Spreadsheet")
    worksheet = wks.get_worksheet(sheet)
    if worksheet is None:
        if sheet == 0:
            worksheet = wks.add_worksheet("First Sheet",1,8)
        elif sheet == 1:
            worksheet = wks.add_worksheet("Second Sheet",1,8)
        else:
            print "Error: spreadsheet does not exist"
            sys.exit(1)
    worksheet.resize(1,8)
    for i in csv:
        line = i.split(",")
        linelen = len(line)-1
        if (counter3 > linelen):
            counter3 = 0
        if (counter1 > linelen):
            counter1 = 1
        if (DEBUG):
            print "entry length (starting from 0): ", linelen
            print "line: ", line
            print "counter1: ", counter1
            print "counter3: ", counter3
        while (counter3<=linelen):
            if (DEBUG):
                print "writing line: ", line[counter3]
            worksheet.update_cell(counter2, counter1, line[counter3].rstrip('\n'))
            counter3 += 1
            counter1 += 1
        counter2 += 1
        worksheet.resize(counter2,8)
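I'm a sysadmin, so I apologize in advance for the bad code.
Anyway, the script reads the CSV line by line, splits each line on commas, and writes it out one cell at a time, which is why it takes so long. The idea is to have cron run it once a day, removing the old entries and writing the new ones; that is why I use resize().
Now, I'm wondering whether there is a better way to take a whole CSV line and write it to the worksheet with each value in its own cell, instead of writing cell by cell as I do now. That would greatly reduce the execution time.
Thanks!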
Answer 0 (score: 3)
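Yes, this can be done. I upload in chunks of 100 rows by 12 columns and it handles that fine; I don't know how well this scales to something like a whole CSV in one go. Also note that a worksheet has 1000 rows by default, and you will get an error if you try to reference a row outside that range (so call add_rows beforehand to make sure there is room). Simplified example: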
from itertools import chain

# String values, as they would arrive from a CSV file
data_to_upload = [['1', '2'], ['3', '4']]
column_names = ['','A','B','C','D','E','F','G','H', 'I','J','K','L','M','N',
                'O','P','Q','R','S','T','U','V','W','X','Y','Z', 'AA']
# To make it dynamic, assuming that all rows contain same number of elements
cell_range = 'A1:' + str(column_names[len(data_to_upload[0])]) + str(len(data_to_upload))
cells = worksheet.range(cell_range)
# Flatten the nested list. 'Cells' will not by default accept xy indexing.
flattened_data = list(chain.from_iterable(data_to_upload))
# Go based on the length of flattened_data, not cells.
# This is because if you chunk large data into blocks, all excess cells will take an empty value
# Doing the other way around will get an index out of range
for x in range(len(flattened_data)):
    # decode() assumes byte strings (Python 2 str); drop it for text that is already unicode
    cells[x].value = flattened_data[x].decode('utf-8')
worksheet.update_cells(cells)
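The column_names lookup list above stops at 'AA'. As a rough sketch of a more general alternative, the range string could be computed instead of looked up. The helper names column_letter and a1_range are made up for illustration and are not part of gspread:

```python
def column_letter(index):
    """Convert a 1-based column index to spreadsheet letters (1 -> 'A', 27 -> 'AA')."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        # Spreadsheet columns are a base-26 system without a zero digit,
        # hence the "index - 1" before each division.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def a1_range(num_rows, num_cols, first_row=1):
    """Build an A1-style range string of num_rows x num_cols starting in column A."""
    last_row = first_row + num_rows - 1
    return "A{}:{}{}".format(first_row, column_letter(num_cols), last_row)
```

With the sample data, a1_range(len(data_to_upload), len(data_to_upload[0])) gives the same 'A1:B2' string as the lookup version, and the first_row argument lets a later chunk start further down the sheet.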
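If your rows have different lengths, you will obviously need to insert the appropriate number of empty strings into cells so that the two lists do not get out of sync. I use decode out of habit, because I kept crashing on special characters, so it is probably best to leave it in.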
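A minimal sketch of the padding and chunking described above, in plain Python with no gspread calls. The function names pad_rows, flatten, and chunked are hypothetical, and the default chunk size of 100 rows simply mirrors the block size mentioned in this answer:

```python
def pad_rows(rows, fill=""):
    """Pad every row with `fill` so that all rows are as long as the longest one."""
    width = max(len(row) for row in rows) if rows else 0
    return [list(row) + [fill] * (width - len(row)) for row in rows]


def flatten(rows):
    """Flatten a list of rows into one list, row by row."""
    return [value for row in rows for value in row]


def chunked(rows, size=100):
    """Yield (first_row_number, block) pairs, with 1-based sheet row numbers."""
    for start in range(0, len(rows), size):
        yield start + 1, rows[start:start + size]
```

Each block would then be padded, flattened, and copied into the cells returned by worksheet.range for the matching rows, followed by a single update_cells call per block. This assumes the cells come back in row-by-row order, which is what the flattening in the example above also relies on.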