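I have a Python dictionary that I eventually want to insert into a MySQL database. I am parsing the data from something called "entries", which looks like this (# stands for a digit):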
entries = [ "['data'] runtime: ###, scan: ###",
"['data'] ctime: ###, scan: ###",
"['data'] runtime: ###", ... ]
Each quoted item is a separate entry. Right now I use regular expressions to extract the runtime, ctime, and scan associated with each entry, as follows:
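import re

# Raw strings keep \s and \d from being treated as string escapes.
terms = (["runtime", r"runtime\s?:\s?(\d+)"],
         ["ctime", r"ctime\s?:\s?(\d+)"],
         ["scan", r"scan\s?:\s?(\d+)"])

def getTerm(term, entries):
    # Searches the string form of the whole list, so matches are not tied to an entry.
    pattern = re.compile(term)
    output = pattern.findall(str(entries))
    return output

d = {}
for i in range(len(terms)):
    d[terms[i][0]] = getTerm(terms[i][1], entries)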
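This works. However, as you can see, not every entry has a runtime, ctime, and scan. If a value does not appear in an entry, I would like it stored in my dictionary as [] or NULL (or None), because later, when I look at a particular index in each key's list, I want all the data at that index to belong to one specific entry. I would like my dictionary to look like this: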
d = {'ctime': [None, '###', None], 'runtime': ['###', None, '###'], 'scan': ['###', '###', None]}
How can I do this?
Answer 0 (score: 1)
If entries is a list of strings that may or may not contain the keywords, and the order matters, then we need to iterate over the entries:
import re

entries = [ "['data'] runtime: ###, scan: ###",
            "['data'] ctime: ###, scan: ###",
            "['data'] runtime: ###" ]
allterms = (["runtime", r"runtime\s?:\s?([a-zA-Z0-9_#]*)"],
            ["ctime", r"ctime\s?:\s?([a-zA-Z0-9_#]*)"],
            ["scan", r"scan\s?:\s?([a-zA-Z0-9_#]*)"])
terms = [name for name, _ in allterms]
patterns = [pattern for _, pattern in allterms]

def get_terms(entry):
    # Append one value per term (None if absent) so every list stays aligned with entries.
    for i in range(len(terms)):
        term = re.search(patterns[i], entry)
        term = term.groups()[0] if term else None
        d[terms[i]] += [term]

# Keys must be the term names; the [name, pattern] lists in allterms are unhashable.
d = {t: [] for t in terms}
for entry in entries:
    get_terms(entry)
First option:
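# pip install futures  # if using Python 2
from concurrent.futures import ThreadPoolExecutor

# Keys are the term names (terms), not the unhashable [name, pattern] pairs.
d = {t: [] for t in terms}
with ThreadPoolExecutor() as executor:
    # As written, get_terms runs directly in this loop, one entry at a time.
    for entry in entries:
        get_terms(entry)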
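If the thread pool is meant to do the work itself, one possible sketch is to have each call return its own row and let executor.map keep the rows in entry order. The helper name extract_terms, the patterns dict, and the numeric sample values (standing in for the ### placeholders) are assumptions made for illustration, not part of the original answer.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Sample entries with concrete digits in place of the ### placeholders (assumed values).
entries = ["['data'] runtime: 12, scan: 3",
           "['data'] ctime: 45, scan: 6",
           "['data'] runtime: 78"]

# Patterns compiled once; the term names mirror the ones used in the question.
patterns = {"runtime": re.compile(r"runtime\s?:\s?(\d+)"),
            "ctime": re.compile(r"ctime\s?:\s?(\d+)"),
            "scan": re.compile(r"scan\s?:\s?(\d+)")}

def extract_terms(entry):
    """Hypothetical helper: return {term: value or None} for a single entry."""
    row = {}
    for name, pattern in patterns.items():
        match = pattern.search(entry)
        row[name] = match.group(1) if match else None
    return row

with ThreadPoolExecutor() as executor:
    # executor.map yields results in the same order as the input iterable.
    rows = list(executor.map(extract_terms, entries))

# Turn the per-entry rows into one list per term, aligned with entries.
d = {name: [row[name] for row in rows] for name in patterns}
print(d)
```

Because each worker returns its own row instead of appending to shared lists, the result cannot get out of order. For short strings like these, threads are unlikely to be faster than a plain loop.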
Second option, with async:
end()
Edit: this solution was developed together with @Wynne in chat :)
Answer 1 (score: 0)
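re.findall() returns an empty list ([]) when nothing matches, so you do not need a separate fallback for the empty case. If you want None when no terms are found, as Brennan said, use findall(string) or None.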
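A small sketch of that behavior, using one of the patterns from the question and made-up sample strings (the digits are assumed values standing in for ###):

```python
import re

pattern = re.compile(r"scan\s?:\s?(\d+)")

# One string that contains a scan value and one that does not.
found = pattern.findall("['data'] runtime: 12, scan: 3")
missing = pattern.findall("['data'] runtime: 78")

print(found)            # ['3']
print(missing)          # []
print(missing or None)  # None, because an empty list is falsy
```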
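Consider using a list comprehension to iterate over all entries, and a dict comprehension to apply each regex pattern to the same entry and store the results in a dict.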
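import re

# Patterns are compiled once; raw strings keep \s and \d intact.
terms = (["runtime", re.compile(r"runtime\s?:\s?(\d+)")],
         ["ctime", re.compile(r"ctime\s?:\s?(\d+)")],
         ["scan", re.compile(r"scan\s?:\s?(\d+)")])

# One dict per entry; "or None" turns an empty match list into None.
results = [{term: pattern.findall(entry) or None for term, pattern in terms} for entry in entries]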
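Now you have something like this, with one dict per entry and each value either a list of matches or None:
[{"runtime": ["###"], "ctime": None, "scan": ["###"]}, {"runtime": None, "ctime": ["###"], "scan": ["###"]}, {"runtime": ["###"], "ctime": None, "scan": None}, ...]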
The code above is equivalent to the following explicit loop (the comprehension is usually slightly faster):
results = []
for entry in entries:
    entry_dict = {}
    for term, regex_pattern in terms:
        entry_dict[term] = regex_pattern.findall(entry) or None
    results.append(entry_dict)
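The question asks for one list per key rather than one dict per entry. A minimal sketch of that last step follows, assuming each term appears at most once per entry (so only the first match is kept) and using made-up digits in place of the ### placeholders:

```python
import re

# Sample entries with concrete digits in place of ### (assumed values).
entries = ["['data'] runtime: 12, scan: 3",
           "['data'] ctime: 45, scan: 6",
           "['data'] runtime: 78"]

terms = (["runtime", re.compile(r"runtime\s?:\s?(\d+)")],
         ["ctime", re.compile(r"ctime\s?:\s?(\d+)")],
         ["scan", re.compile(r"scan\s?:\s?(\d+)")])

# One dict per entry, as built above.
results = [{term: pattern.findall(entry) or None for term, pattern in terms}
           for entry in entries]

# Transpose into one list per term; index i in every list refers to entries[i].
d = {term: [row[term][0] if row[term] else None for row in results]
     for term, _ in terms}
print(d)
```

With these sample entries, d["runtime"] is ['12', None, '78'], so position i in every list belongs to the same entry.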