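Here is my code for dumping all the data from .csv files into MongoDB. Strangely, it runs perfectly on my Mac, but when I upload it to Windows Azure running Ubuntu 12.04.3 LTS, only the main code executes and the function never seems to be called. This is the code I am using: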
import csv,json,glob,traceback
from pymongo import MongoClient
import datetime
import sys
import string

def make_document(column_headers,columns,timestamps):
    #assert len(column_headers)==len(columns)
    lotr = filter(lambda x: x[0] is not None,zip(column_headers,columns))
    final = []
    #print lotr
    if not timestamps=={}:
        for k,v in lotr:
            try:
                tformat = timestamps[k]
                time_val = datetime.datetime.strptime(v,tformat)
                final.append((k,time_val))
            except KeyError:
                final.append((k,v))
        return dict(final)
    else:
        return dict(lotr)

def keep_printable_only(s):
    return filter(lambda x: x in string.printable,s)

def perform(conf):
    client = MongoClient(conf["server"],conf["port"])
    db = client[conf["db"]]
    collection = db[conf["collection"]]
    files = glob.glob(conf["data_form"])
    column_headers = conf["columns"]
    csv_opts = {}
    for k,v in conf["csv_options"].items():
        csv_opts[str(k)]=str(v)
    for infile in files:
        #print conf["csv_options"]
        inCSV = csv.reader(open(infile,'rU'),**csv_opts)
        counter = 0
        for record in inCSV:
            yield record
            counter +=1
            if counter==2:
                print record
                #sys.exit(0)
            record= map(keep_printable_only,record)
            try:
                doc = make_document(column_headers,record,conf["timestamp_columns"])
                collection.insert(doc)
            except :
                print "error loading one of the lines : "
                print traceback.format_exc()

if __name__=='__main__':
    print"reads all data files of same format as given in column mapping and dumps them to a mongo collection"
    print "uses conf.json.test as config file"
    conf = json.load(open('./conf.json.txt'))
    for row in perform(conf):
        record= map(keep_printable_only,row)
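When I run it on Azure, the mongo collection is not created, and the code terminates after printing the two lines in the main code. I don't know why this happens.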
Answer 0 (score: 0)
Debug output in the form of a stack trace would be very helpful, as @Alfe commented.
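One way to get such a trace is to wrap the suspect call and format the exception with the standard `traceback` module. This is only a minimal sketch: `run_with_trace` and `failing_step` are hypothetical names used for illustration, and `failing_step` merely stands in for the real work (for example, opening the config file). The single-argument `print(...)` form is used so the snippet runs on both Python 2 and Python 3.

```python
import traceback

def run_with_trace(func, *args):
    """Call func(*args); return None on success or the formatted stack trace on failure."""
    try:
        func(*args)
        return None
    except Exception:
        # format_exc() returns the same text the interpreter would print for the error.
        return traceback.format_exc()

def failing_step():
    # Hypothetical stand-in for a step that fails on the server.
    raise IOError("config file not found")

trace = run_with_trace(failing_step)
print(trace)
```

Posting the printed trace together with the question makes it much easier to see which line actually fails.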
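Apart from that, your code appears to stop at the line where you try to access a local file to read the configuration. Make sure you can access the file system this way in Azure; providers sometimes put a very strict wall of isolation between your code and the actual machine.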
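A quick check of both file-system assumptions can narrow this down. The sketch below is an assumption about what is worth testing, not a confirmed diagnosis: `check_inputs` is a hypothetical helper, and the `"conf.json"` and `"*.csv"` arguments are placeholders for the real config path and the `data_form` pattern from the config. It is worth checking the pattern too, because if `glob.glob` matches no files the loop in `perform` never runs, nothing is inserted, and MongoDB only creates a collection on the first insert.

```python
import glob
import os

def check_inputs(conf_path, data_pattern):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    if not os.path.isfile(conf_path):
        problems.append("config file missing: %s" % os.path.abspath(conf_path))
    elif not os.access(conf_path, os.R_OK):
        problems.append("config file not readable: %s" % os.path.abspath(conf_path))
    # An empty match list means the CSV loop body would never execute.
    if not glob.glob(data_pattern):
        problems.append("no data files match: %s" % data_pattern)
    return problems

if __name__ == "__main__":
    # Placeholder arguments; substitute the real config path and glob pattern.
    for problem in check_inputs("conf.json", "*.csv"):
        print(problem)
```

Running this on the Azure machine from the same directory and under the same user as the real script shows whether the config file or the data files are the part that cannot be found.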
You can make the code more portable with the following:
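import json
import os

# Build the path from the current working directory, then parse the JSON config.
conf_filename = os.path.join(os.getcwd(), 'conf.json')
with open(conf_filename) as conf_filehandle:
    conf = json.load(conf_filehandle)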
Of course, you should also make sure that you have uploaded the JSON file to Azure :)
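One caveat: `os.getcwd()` is the directory the process was started from, so a path built from it still depends on where the script is launched. If the script on Azure is started from another directory (by a scheduler, for instance), a possible alternative is to resolve the config file relative to the script itself. The following is a sketch under that assumption; `script_relative_path` and `load_conf` are hypothetical helper names, and `conf.json` is the file name used in this answer.

```python
import json
import os

def script_relative_path(filename, base_dir=None):
    """Return an absolute path to filename inside base_dir (default: this script's directory)."""
    if base_dir is None:
        # __file__ is the path of the running script, independent of the working directory.
        base_dir = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(os.path.abspath(base_dir), filename)

def load_conf(filename, base_dir=None):
    """Parse the JSON config located via script_relative_path(filename, base_dir)."""
    with open(script_relative_path(filename, base_dir)) as handle:
        return json.load(handle)
```

With this, `conf = load_conf('conf.json')` reads the file that sits next to the script, whatever the current working directory is.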