Python csv module problems with embedded JSON strings (Python + Oracle + CSV + JSON)

Asked: 2016-08-09 16:04:28

Tags: python json python-3.x csv parsing

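I'm importing a small amount of baseline data into my tables immediately after creating them. Only one table is giving me trouble, and that's because one of its fields contains JSON.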

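I haven't yet found a dialect or parser setting that correctly interprets the escaped quotes and the commas inside the JSON. I haven't tried them all, of course, so I'd welcome suggestions from anyone who has dealt with a similar problem.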

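I don't know if it matters, but I'm using Toad for Oracle to export the CSV files that serve as the baseline for rebuilding development data. Toad has no option to change the delimiter in the CSV. Manually editing a single CSV file wouldn't be hard, but doing it as a recurring maintenance task would be a PITA.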

Here is a sample of the CSV data that causes the problem:

"RULE_ID","NAME","DISPLAY_DESC","NOTES","RULE","SOURCE_ID","RULE_META","RULE_SCOPE","ACTIVE"
265.00,"RoadKill Report Processor","Report Processor","Loads a long-run-thread for each report matched by the handler method.","MvcsReportProcessManager",41.00,"{
                \"handler\"        : \"processReports\",
                \"consumer_prototype\" :
                \"RoadKill_report_processor.AssetDataReportHostConsumer\",
                \"match_expression\" : \"^MVCS_.*\",
                \"schedule\"    : [
                        \"0:30-4:00|mon-sun|*|*\",
                        \"!*|*|1|jan\",
                        \"!*|*|25|dec\",
                        \"!*|thu|22-28|nov\"
                    ],
                \"wake_interval\" : \"30m\",
                \"interval\"      : \"24h\"
            }","INST",0.00
321.00,"RoadKill AG Processor","Asset Group Reflection","Loads a long-run-thread to download Asset Groups daily.","MvcsAssetGroupDownloader",41.00,"{
                \"handler\"        : \"replicateAssetGroups\",
                \"consumer_prototype\" :
                \"RoadKill_report_processor.AssetGroupConsumer\",
                \"schedule\"    : [
                        \"00:30-17:00|mon-sun|*|*\",
                        \"!*|*|1|jan\",
                        \"!*|*|25|dec\",
                        \"!*|thu|22-28|nov\"
                    ],
                \"wake_interval\" : \"30m\",
                \"interval\"      : \"24h\"
            }","INST",1.00
322.00,"RoadKill Asset Processor","Asset Reflection","Loads a long-run-thread to download Assets daily.","MvcsAssetAPIHostDownloader",41.00,"{
                \"handler\"        : \"replicateAssets\",
                \"consumer_prototype\" :
                \"RoadKill_report_processor.\",
                \"schedule\"    : [
                        \"00:30-17:00|mon-sun|*|*\",
                        \"!*|*|1|jan\",
                        \"!*|*|25|dec\",
                        \"!*|thu|22-28|nov\"
                    ],
                \"wake_interval\" : \"30m\",
                \"interval\"      : \"24h\"
            }","INST",1.00
323.00,"RoadKill Vuln Processor","Vuln Reflection","Loads a long-run-thread to download Vulns daily.","MvcsAssetAPIVulnDownloader",41.00,"{
                \"handler\"        : \"replicateVulns\",
                \"consumer_prototype\" :
                \"RoadKill_report_processor.AssetAPIHostDetectionConsumer\",
                \"schedule\"    : [
                        \"00:30-17:00|mon-sun|*|*\",
                        \"!*|*|1|jan\",
                        \"!*|*|25|dec\",
                        \"!*|thu|22-28|nov\"
                    ],
                \"wake_interval\" : \"30m\",
                \"interval\"      : \"24h\"
            }","INST",1.00
141.00,"RoadKill Manager","RoadKill Sync","Loads RoadKill instances and dispatches an entry point for that source + instance (one for each instance rule).","MvcsInstanceDispatchRule",41.00,"{
            \"handler\"        : \"startInstanceRules\",
            \"schedule\"    : [
                    \"0:00-23:59|mon-sun|*|*\"
                ],
            \"wake_interval\" : \"30m\"
        }","CORE",1.00

Here is the row the Python csv module returns when it tries to parse the first data record:

>>> [(o,v) for o,v in enumerate(row)]
[(0, '265.00'), (1, 'RoadKill Report Processor'), (2, 'Report Processor'), (3, 'Loads a long-run-thread for each report matched by the handler method.'), (4, 'MvcsReportProcessManager'), (5, '41.00'), (6, '{\n                \\handler\\"        : \\"processReports\\"'), (7, '')]
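The truncation can be reproduced without the database or Toad. The following is only a minimal sketch: the `sample` string is a shortened, made-up stand-in for one exported record, not the real export. With the default settings the backslash is treated as ordinary data, so the quote that follows it closes the quoted field and the record is split at the next comma and newline.

```python
import csv
import io

# Hypothetical, shortened stand-in for one exported record. The JSON field
# uses backslash-escaped quotes, as in the sample data above.
sample = (
    '265.00,"Name","{\n'
    '  \\"handler\\" : \\"processReports\\",\n'
    '  \\"interval\\" : \\"24h\\"\n'
    '}","INST",0.00\n'
)

# Same reader settings as in the question: no escapechar is given, so the
# backslash is plain data and the quote after it ends the quoted field.
rows = list(csv.reader(io.StringIO(sample), delimiter=',', quotechar='"'))

# One logical record comes back as several broken rows.
for row in rows:
    print(len(row), row)
```

The first broken row ends with an empty string, which matches the `(7, '')` entry in the output above.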

Finally, here is the csv reader code:

col_offsets = None
for f in os.listdir(testdatadir):
    #split filename.  get tablename.
    fname = os.path.basename(f)
    if fname and\
            fname.startswith('mvcs_') and\
            fname.endswith('.csv'):
        tblname = fname.split('.')[0]
        tobj = get_class_by_tablename(tblname)
        with open(testdatadir+'/'+fname, 'r') as csvfile:
            csvreader = csv.reader(csvfile, delimiter=',',
                    quotechar='"')
            for count,row in enumerate(csvreader):
                if not count:
                    col_offsets = getColumnOffsets(row)
                elif not col_offsets:
                    raise Exception('Missing column offsets.')
                else:
                    tinst = tobj(
                        **{colname.lower() : row[offset] for
                            offset,colname in col_offsets})
                    try:
                        session.add(tinst)
                    except Exception as e:
                        logger.warn(str(e))
                        logger.warn('on adding:')
                        logger.warn(str(tinst))

1 answer:

Answer 0: (score: 0)

Lines 69-70 (from https://docs.python.org/3.5/library/csv.html#dialects-and-formatting-parameters), on dialects: escapechar is set to None by default.
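This default can be checked from the interpreter. A small sketch that uses only `csv.list_dialects()` and `csv.get_dialect()` from the standard library; the `defaults` name is just for illustration:

```python
import csv

# Collect the escapechar of every dialect registered with the csv module.
defaults = {name: csv.get_dialect(name).escapechar for name in csv.list_dialects()}

for name, escapechar in sorted(defaults.items()):
    print(name, repr(escapechar))
```

On a stock installation every built-in dialect, including 'unix', should report None, so a backslash in the input is never treated as an escape unless escapechar is passed explicitly.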

Change escapechar to a backslash:

csvreader = csv.reader(csvfile, delimiter=',',
    quotechar='"', dialect='unix', escapechar='\\')
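To see the effect end to end, here is a self-contained sketch. Assumptions: the `sample` string is a shortened, made-up version of the export with only five columns, and `read_rows` is a hypothetical helper, not part of the original loader. It parses the record with escapechar set and then hands the RULE_META field to `json.loads`.

```python
import csv
import io
import json

# Hypothetical, shortened stand-in for the exported file (header + one record).
sample = (
    '"RULE_ID","NAME","RULE_META","RULE_SCOPE","ACTIVE"\n'
    '265.00,"Name","{\n'
    '    \\"handler\\" : \\"processReports\\",\n'
    '    \\"schedule\\" : [\\"0:30-4:00|mon-sun|*|*\\", \\"!*|*|1|jan\\"]\n'
    '}","INST",0.00\n'
)


def read_rows(csvfile):
    """Return one dict per data record, keyed by the header row."""
    reader = csv.reader(csvfile, delimiter=',', quotechar='"', escapechar='\\')
    header = next(reader)
    return [dict(zip(header, row)) for row in reader]


# For a real file, the csv documentation recommends open(path, newline='').
records = read_rows(io.StringIO(sample))

# The reader has removed the escaping backslashes, so the field is plain JSON.
meta = json.loads(records[0]['RULE_META'])
print(meta['handler'])
print(meta['schedule'])
```

One caveat: with escapechar set, the reader drops every backslash that acts as an escape. If a JSON value itself depends on backslash sequences, it is worth checking that field after parsing.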