Inserting null values into a Python dictionary

Time: 2017-08-29 18:24:42

Tags: python dictionary

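I have a Python dictionary that I eventually want to insert into a MySQL database. I am parsing the data from something called "entries", which looks like this (# stands for a digit):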

entries = [ "['data'] runtime: ###, scan: ###", 
            "['data'] ctime: ###, scan: ###", 
            "['data'] runtime: ###", ... ]

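Each item in quotes is a separate entry. Right now I use regular expressions to extract the runtimes, ctimes and scans associated with each entry, like this: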

import re
terms = (["runtime", r"runtime\s?:\s?(\d+)"],
         ["ctime", r"ctime\s?:\s?(\d+)"],
         ["scan", r"scan\s?:\s?(\d+)"])

def getTerm(term, entries):
    # Searches the string form of the whole list, so a match is not tied to its entry
    pattern = re.compile(term)
    output = pattern.findall(str(entries))
    return output

d = {}
for name, regex in terms:
    d[name] = getTerm(regex, entries)

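This works. However, as you can see, not every entry has a runtime, a ctime and a scan. If a value does not appear in an entry, I would like it to go into my dictionary as [] or NULL (or None), because later, when I look at the n-th element under each key of the dictionary, I want all of that data to belong to the same entry. I would like my dictionary to look like this: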

d = {'ctime': [None, '###', None], 'runtime': ['###', None, '###'], 'scan': ['###', '###', None]}

How can I do this?

2 Answers:

Answer 0: (score: 1)

If entries is a list of strings that may or may not contain the keywords, and order matters, then we need to iterate over the entries:

import re

entries = [ "['data'] runtime: ###, scan: ###",
            "['data'] ctime: ###, scan: ###",
            "['data'] runtime: ###" ]

allterms = (["runtime", r"runtime\s?:\s?([a-zA-Z0-9_#]*)"],
            ["ctime", r"ctime\s?:\s?([a-zA-Z0-9_#]*)"],
            ["scan", r"scan\s?:\s?([a-zA-Z0-9_#]*)"])

terms = [allterms[i][0] for i in range(len(allterms))]
patterns = [allterms[i][1] for i in range(len(allterms))]

def get_terms(entry):
    # Append exactly one value per term for this entry: the match, or None
    for i in range(len(terms)):
        term = re.search(patterns[i], entry)
        term = term.groups()[0] if term else None
        d[terms[i]] += [term]

d = {t: [] for t in terms}  # keys are the term names (the lists in allterms are unhashable)
for entry in entries:
    get_terms(entry)

First option:

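# pip install futures  # if using Python 2
from concurrent.futures import ThreadPoolExecutor

d = {t: [] for t in terms}  # keys are the term names (the lists in allterms are unhashable)
with ThreadPoolExecutor() as executor:
    # Note: get_terms is called directly, so nothing is submitted to the
    # executor and the entries are still processed one by one, in order.
    for entry in entries:
        get_terms(entry)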
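
If threads are actually wanted here, one way to keep the values lined up per entry is to have each task return its own row and let executor.map hand the rows back in input order. The following is only a minimal sketch under assumptions: the digits in sample_entries stand in for the ### placeholders, and sample_entries, compiled and extract are hypothetical names that do not appear in the answer.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: digits stand in for the "###" placeholders.
sample_entries = [
    "['data'] runtime: 12, scan: 3",
    "['data'] ctime: 45, scan: 6",
    "['data'] runtime: 78",
]
keys = ("runtime", "ctime", "scan")
compiled = {k: re.compile(k + r"\s?:\s?(\d+)") for k in keys}

def extract(entry):
    """Return one value (or None) per key for a single entry."""
    row = {}
    for k in keys:
        m = compiled[k].search(entry)
        row[k] = m.group(1) if m else None
    return row

with ThreadPoolExecutor() as executor:
    # executor.map yields results in the order of its inputs,
    # regardless of which worker finishes first.
    rows = list(executor.map(extract, sample_entries))

# Pivot the per-entry rows into one list per key.
d = {k: [row[k] for row in rows] for k in keys}
print(d)
```

Returning a row from each task avoids several threads appending to the shared lists at once, which could otherwise interleave and break the per-entry alignment. For regex work this small, threads are unlikely to be faster than a plain loop.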

Second option, with async:

end()

Edit: solution developed in collaboration with @Wynne in chat :)

Answer 1: (score: 0)

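re.findall() returns an empty list ([]) when no match is found, so you do not need to build an empty fallback yourself. If you want None when a term is not found, as Brennan said, use findall(string) or None.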
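
The behaviour described above can be checked with a short sketch. The two sample strings are assumptions, with digits in place of the ### placeholders:

```python
import re

pattern = re.compile(r"ctime\s?:\s?(\d+)")

# Hypothetical entries, with digits in place of "###".
with_match = "['data'] ctime: 45, scan: 6"
without_match = "['data'] runtime: 78"

print(pattern.findall(with_match))             # ['45']
print(pattern.findall(without_match))          # []
print(pattern.findall(without_match) or None)  # None
```

The `or None` trick works because an empty list is falsy in Python, so the expression falls through to the right-hand operand.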

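Consider using a list comprehension to iterate over all the entries, and a dict comprehension to apply the regex patterns to each entry and store the results in a dict.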

import re
terms = (["runtime", re.compile(r"runtime\s?:\s?(\d+)")],
         ["ctime", re.compile(r"ctime\s?:\s?(\d+)")],
         ["scan", re.compile(r"scan\s?:\s?(\d+)")])
# "or None" turns the empty list from a failed match into None
results = [{term: pattern.findall(entry) or None for term, pattern in terms} for entry in entries]

Now you have something like:

[{"runtime": ["###"], "ctime": None, "scan": ["###"]}, {"runtime": None, "ctime": ["###"], "scan": ["###"]}, {"runtime": ["###"], "ctime": None, "scan": None}, ...]

The code above is equivalent to the following (the comprehension is usually a little faster):

results = []
for entry in entries:
    entry_dict = {}
    for term, regex_pattern in terms:
        entry_dict[term] = regex_pattern.findall(entry) or None
    results.append(entry_dict)
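
The question asks for one list per key rather than one dict per entry. A minimal sketch of that last step follows, assuming digits in place of the ### placeholders and at most one value per term in each entry; sample_entries and first_or_none are hypothetical names introduced only for this illustration.

```python
import re

# Hypothetical sample data: digits stand in for the "###" placeholders.
sample_entries = [
    "['data'] runtime: 12, scan: 3",
    "['data'] ctime: 45, scan: 6",
    "['data'] runtime: 78",
]
terms = (("runtime", re.compile(r"runtime\s?:\s?(\d+)")),
         ("ctime", re.compile(r"ctime\s?:\s?(\d+)")),
         ("scan", re.compile(r"scan\s?:\s?(\d+)")))

# One dict per entry: a list of matches, or None when nothing matched.
results = [
    {name: pattern.findall(entry) or None for name, pattern in terms}
    for entry in sample_entries
]

def first_or_none(found):
    """Take the first match from a findall result, or None if there is none."""
    return found[0] if found else None

# Pivot into one list per key; index i in every list belongs to entry i.
d = {name: [first_or_none(r[name]) for r in results] for name, _ in terms}
print(d)
```

Because every entry contributes exactly one element to every list, the n-th element under each key always refers to the n-th entry, which is the alignment the question asks for.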