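I load a small amount of baseline data into my tables right after creating them. Only one table gives me trouble, and that is because one of its fields is JSON.
I have not yet found a parsing engine that correctly interprets the escaped quotes and the commas inside the JSON. I have not tried them all, of course, and I am open to suggestions from anyone who has dealt with a similar problem.
I don't know whether it matters, but I am using Toad for Oracle to export the CSV files that serve as the baseline for rebuilding development data. Toad has no option to change the delimiter in the CSV, and while it would not be hard for me to edit a single CSV file by hand, doing that as a maintenance task would be a PITA.
Here is a sample of the CSV data that causes the problem: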
"RULE_ID","NAME","DISPLAY_DESC","NOTES","RULE","SOURCE_ID","RULE_META","RULE_SCOPE","ACTIVE"
265.00,"RoadKill Report Processor","Report Processor","Loads a long-run-thread for each report matched by the handler method.","MvcsReportProcessManager",41.00,"{
\"handler\" : \"processReports\",
\"consumer_prototype\" :
\"RoadKill_report_processor.AssetDataReportHostConsumer\",
\"match_expression\" : \"^MVCS_.*\",
\"schedule\" : [
\"0:30-4:00|mon-sun|*|*\",
\"!*|*|1|jan\",
\"!*|*|25|dec\",
\"!*|thu|22-28|nov\"
],
\"wake_interval\" : \"30m\",
\"interval\" : \"24h\"
}","INST",0.00
321.00,"RoadKill AG Processor","Asset Group Reflection","Loads a long-run-thread to download Asset Groups daily.","MvcsAssetGroupDownloader",41.00,"{
\"handler\" : \"replicateAssetGroups\",
\"consumer_prototype\" :
\"RoadKill_report_processor.AssetGroupConsumer\",
\"schedule\" : [
\"00:30-17:00|mon-sun|*|*\",
\"!*|*|1|jan\",
\"!*|*|25|dec\",
\"!*|thu|22-28|nov\"
],
\"wake_interval\" : \"30m\",
\"interval\" : \"24h\"
}","INST",1.00
322.00,"RoadKill Asset Processor","Asset Reflection","Loads a long-run-thread to download Assets daily.","MvcsAssetAPIHostDownloader",41.00,"{
\"handler\" : \"replicateAssets\",
\"consumer_prototype\" :
\"RoadKill_report_processor.\",
\"schedule\" : [
\"00:30-17:00|mon-sun|*|*\",
\"!*|*|1|jan\",
\"!*|*|25|dec\",
\"!*|thu|22-28|nov\"
],
\"wake_interval\" : \"30m\",
\"interval\" : \"24h\"
}","INST",1.00
323.00,"RoadKill Vuln Processor","Vuln Reflection","Loads a long-run-thread to download Vulns daily.","MvcsAssetAPIVulnDownloader",41.00,"{
\"handler\" : \"replicateVulns\",
\"consumer_prototype\" :
\"RoadKill_report_processor.AssetAPIHostDetectionConsumer\",
\"schedule\" : [
\"00:30-17:00|mon-sun|*|*\",
\"!*|*|1|jan\",
\"!*|*|25|dec\",
\"!*|thu|22-28|nov\"
],
\"wake_interval\" : \"30m\",
\"interval\" : \"24h\"
}","INST",1.00
141.00,"RoadKill Manager","RoadKill Sync","Loads RoadKill instances and dispatches an entry point for that source + instance (one for each instance rule).","MvcsInstanceDispatchRule",41.00,"{
\"handler\" : \"startInstanceRules\",
\"schedule\" : [
\"0:00-23:59|mon-sun|*|*\"
],
\"wake_interval\" : \"30m\"
}","CORE",1.00
Here is what the Python csv module returns when it tries to parse a row:
>>> [(o,v) for o,v in enumerate(row)]
[(0, '265.00'), (1, 'RoadKill Report Processor'), (2, 'Report Processor'), (3, 'Loads a long-run-thread for each report matched by the handler method.'), (4, 'MvcsReportProcessManager'), (5, '41.00'), (6, '{\n \\handler\\" : \\"processReports\\"'), (7, '')]
Finally, here is the csv reader code:
col_offsets = None
for f in os.listdir(testdatadir):
    #split filename. get tablename.
    fname = os.path.basename(f)
    if fname and\
       fname.startswith('mvcs_') and\
       fname.endswith('.csv'):
        tblname = fname.split('.')[0]
        tobj = get_class_by_tablename(tblname)
        with open(testdatadir+'/'+fname, 'r') as csvfile:
            csvreader = csv.reader(csvfile, delimiter=',',
                                   quotechar='"')
            for count,row in enumerate(csvreader):
                if not count:
                    col_offsets = getColumnOffsets(row)
                elif not col_offsets:
                    raise Exception('Missing column offsets.')
                else:
                    tinst = tobj(
                        **{colname.lower() : row[offset] for
                           offset,colname in col_offsets})
                    try:
                        session.add(tinst)
                    except Exception as e:
                        logger.warn(str(e))
                        logger.warn('on adding:')
                        logger.warn(str(tinst))
Answer 0 (score: 0)
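Lines 69-70: see the dialect parameters (from https://docs.python.org/3.5/library/csv.html#dialects-and-formatting-parameters). escapechar defaults to None, so the backslashes in front of the inner quotes are not treated as escapes.
Change it to a backslash: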
csvreader = csv.reader(csvfile, delimiter=',',
                       quotechar='"', dialect='unix', escapechar='\\')
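To see the effect of that change in isolation, here is a minimal sketch that parses a shortened, made-up sample in the same style as the export above, once with the default settings and once with escapechar set to a backslash. The three-column sample, the `parse` helper, and the variable names are assumptions for illustration only, not part of the original code.

```python
import csv
import io
import json

# Hypothetical sample in the same style as the Toad export: the JSON field is
# quoted, spans several lines, and its inner quotes are backslash-escaped.
sample = (
    '"RULE_ID","RULE_META","ACTIVE"\n'
    '141.00,"{\n'
    '\\"handler\\" : \\"startInstanceRules\\",\n'
    '\\"wake_interval\\" : \\"30m\\"\n'
    '}",1.00\n'
)


def parse(text, **fmtparams):
    """Parse CSV text with csv.reader and return all rows as a list."""
    # newline='' leaves line endings untouched, as the csv docs recommend.
    return list(csv.reader(io.StringIO(text, newline=''), **fmtparams))


# Default settings: escapechar is None, so \" ends the quoted field early
# and the JSON is split at its first comma.
default_rows = parse(sample, delimiter=',', quotechar='"')

# With escapechar='\\' the reader treats \" as a literal quote inside the field.
escaped_rows = parse(sample, delimiter=',', quotechar='"', escapechar='\\')

# The JSON column now arrives as one intact string that json.loads accepts.
rule_meta = json.loads(escaped_rows[1][1])
print(len(escaped_rows[1]), rule_meta['handler'])
```

With the escape character set, the data row comes back with all three columns and the JSON field can be decoded directly. With the defaults, the same input is broken into extra fields and rows, which matches the truncated output shown in the question.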