Purging characters from an arbitrarily nested dictionary (from JSON)

Time: 2016-02-22 16:57:05

Tags: python json dictionary recursion

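I want to clean JSON that has been loaded into a dictionary object, removing all \n and | characters, so that I can use csv DictWriter to write it as a single row of a flat file for copying into an AWS database. I have never used recursion on dict objects before, and I'm struggling to work out how to traverse every level until I reach individual strings and then iterate over a list of items I want to replace. My code currently raises an IndexError saying my string index is out of range. Here is my function: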

def purge_items(in_iter, items):
    if isinstance(in_iter, dict):
        for k, v in in_iter:
            if isinstance(v, dict):
                purge_items(k[v], items)
    elif isinstance(in_iter, list):
        for item in items:
            for elem in in_iter:
                try:
                    elem.replace(item[0], item[1])
                except AttributeError:
                    continue
    else:
        try:
            for item in items:
                in_iter.replace(item[0], item[1])
        except AttributeError:
            return

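The function expects a dictionary with arbitrary nesting depth (once I have it working for dictionaries I'd like to make it more general and accept any mutable container), followed by a list of the items you want to replace, such as ('\n', ''), where the second entry is what you want to replace the first with.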

An example of the data I'm working with, which contains newlines, is below:

{'issuetype': {'avatarId': 22101,
                                      'description': 'A problem found in '
                                                     'production which impairs '
                                                     'or prevents the '
                                                     'functions of the '
                                                     'product.',
                                      'iconUrl': 'https://instructure.atlassian.net/secure/viewavatar?size=xsmall&avatarId=22101&avatarType=issuetype',
                                      'id': '1',
                                      'name': 'Bug',
                                      'self': 'https://instructure.atlassian.net/rest/api/2/issuetype/1',
                                      'subtask': False}}

1 answer:

Answer 0 (score: 1)

OK, in general there are plenty of modules for processing and playing with text, to name a few:

  • ast and its ast.literal_eval()
  • textwrap and its textwrap.dedent()
  • json
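
As a rough illustration of the first item, here is a minimal sketch assuming the data arrives as pretty-printed text like the sample in the question. The sample is shortened, and the names `text` and `data` are made up for this example. Python joins adjacent string literals while parsing, so ast.literal_eval() returns the description as one string:

```python
import ast

# Shortened, hypothetical version of the pretty-printed sample from the question.
text = """
    {'issuetype': {'description': 'A problem found in '
                                  'production which impairs '
                                  'or prevents the '
                                  'functions of the '
                                  'product.',
                   'name': 'Bug',
                   'subtask': False}}
"""

# strip() removes the leading newline and indentation before parsing.
data = ast.literal_eval(text.strip())

# Adjacent string literals were concatenated by the parser.
print(data['issuetype']['description'])
```

This only works when the text is a valid Python literal; it will not parse arbitrary JSON (for example `true`/`false`/`null`), where the json module is the better fit.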

But in your case something simple will do:

test = """
    {'issuetype': {'avatarId': 22101,
                                      'description': 'A problem found in '
                                                     'production which impairs '
                                                     'or prevents the '
                                                     'functions of the '
                                                     'product.',
                                      'iconUrl': 'https://instructure.atlassian.net/secure/viewavatar?size=xsmall&avatarId=22101&avatarType=issuetype',
                                      'id': '1',
                                      'name': 'Bug',
                                      'self': 'https://instructure.atlassian.net/rest/api/2/issuetype/1',
                                      'subtask': False}
                                      }
    """

print ("".join([obj.strip().replace('|', '') for obj in test.split("\n")]))

Output

{'issuetype': {'avatarId': 22101,'description': 'A problem found in ''production which impairs ''or prevents the ''functions of the ''product.','iconUrl': 'https://instructure.atlassian.net/secure/viewavatar?size=xsmall&avatarId=22101&avatarType=issuetype','id': '1','name': 'Bug','self': 'https://instructure.atlassian.net/rest/api/2/issuetype/1','subtask': False}}

That should be enough, shouldn't it?

Oops, not quite: the doubled '' also needs to be removed. Corrected version:

test_1 = "".join([obj.strip().replace('|', '') 
                 for obj in test.split("\n")])
test_2 = test_1.replace("''", "")
print (test_2)

Output

{'issuetype': {'avatarId': 22101,'description': 'A problem found in production which impairs or prevents the functions of the product.','iconUrl': 'https://instructure.atlassian.net/secure/viewavatar?size=xsmall&avatarId=22101&avatarType=issuetype','id': '1','name': 'Bug','self': 'https://instructure.atlassian.net/rest/api/2/issuetype/1','subtask': False}}
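
If the data is already a parsed dict rather than text, the string approach above does not apply directly. As an alternative, here is a minimal sketch of a recursive version of the question's purge_items, assuming Python 3 and that returning a cleaned copy is acceptable. It is not the asker's original function: it iterates with .items(), recurses into lists as well as dicts, and keeps the result of str.replace(), which returns a new string rather than modifying in place. The `record` data is invented for the example.

```python
def purge_items(obj, items):
    """Return a copy of obj with every (old, new) pair applied to each string."""
    if isinstance(obj, dict):
        # Recurse into each value; keys are left unchanged.
        return {key: purge_items(value, items) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Lists and tuples both come back as lists.
        return [purge_items(elem, items) for elem in obj]
    if isinstance(obj, str):
        for old, new in items:
            obj = obj.replace(old, new)  # strings are immutable, so rebind
        return obj
    # Numbers, booleans, None and anything else pass through untouched.
    return obj


# Hypothetical record containing the characters to purge.
record = {'issuetype': {'description': 'first line\nsecond|line',
                        'tags': ['a|b', 7],
                        'subtask': False}}

cleaned = purge_items(record, [('\n', ' '), ('|', '')])
print(cleaned)
```

Building a new structure instead of mutating the input keeps the original record intact, which may or may not be what you want before handing the row to csv DictWriter.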