I'm not sure how to approach this, but basically I have a list of items
section = ['messages','ProcQueueLen']
or
section = ['messages','CpuError']
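...depending on which section we are in...
and some data points that belong to the ProcQueueLen section.
I want to build a dictionary dynamically so that I can add the data points (as a dictionary) to the correct dictionary entry. For example: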
<setup>
logfile = cdm.log
loglevel = 0
cpu_usage_includes_wait = yes
internal_alarm_message = InternalAlarm
mem_buffer_used = no
alarm_on_each_sample = no
qos_source_short = yes
trendsubject = cdm
trendpriority = information
paging_in_kilobytes = yes
post_install = 1382462705
allow_qos_source_as_target = no
monitor_iostat = yes
allow_remote_disk_info = yes
</setup>
<messages>
<ProcQueueLen>
text = Average ($value_number samples)
processor queue length is $value$unit, which is >= $value_limit$unit. Last value is $value_last$unit.
level = minor
token = proc_q_len
</ProcQueueLen>
<CpuError>
text = Average ($value_number samples) total cpu is now $value$unit, which is above the error threshold ($value_limit$unit)
level = major
token = cpu_error
i18n_token = as#system.cdm.avrg_total_cpu_above_err_threshold
</CpuError>
</messages>
would produce a nested dictionary like this:
conf = {'messages':{'ProcQueueLen':{'text':'Average ($value_number samples) processor queue length is $value$unit, which is >= $value_limit$unit. Last value is $value_last$unit.','level':'minor','token':'proc_q_len'},'CpuError':{'text':'Average ($value_number samples) total cpu is now $value$unit, which is above the error threshold ($value_limit$unit)','level':'major','token':'cpu_error','i18n_token':'as#system.cdm.avrg_total_cpu_above_err_threshold'}}}
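I am reading the file, which contains these different sections, line by line, and I track which section an entry belongs to by appending and popping sections as needed. But I'm not sure how to address the nested dictionary based on this list of sections.
This is not valid XML, since it does not have proper sections and contains invalid characters. I tried BeautifulSoup, but it is slow. Putting the data into a nested dictionary would make navigation faster and easier for me.
The only code I have so far is: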
conf = {}
section = []
for i, line in enumerate(out.split('\\n')):
    l = line.strip()
    if i < 20:
        print(l)
    if l.startswith('</'):
        print('skipping')
    elif l.startswith('<'):
        conf[l] = {}
        section.append(l)
        print('create dbentry')
    else:
        conf[section][l.split('=')[0].strip()] = l.split('=')[1].strip()
        print('add to dbentry')
This doesn't work because in this case [section] would need to be the list of sections, and I'm not sure how to do that.
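One way to turn a list of section names into a position inside a nested dictionary is to walk the dictionary one key at a time. The following is a minimal sketch, not the asker's code: the helper name `node_at` is hypothetical, and it assumes that missing levels should be created on the way down.

```python
from functools import reduce

def node_at(conf, path):
    """Follow the keys in `path` through nested dicts, creating missing levels."""
    return reduce(lambda d, key: d.setdefault(key, {}), path, conf)

conf = {}

section = ['messages', 'ProcQueueLen']
node_at(conf, section)['level'] = 'minor'

section = ['messages', 'CpuError']
node_at(conf, section)['level'] = 'major'

print(conf)
```

With this helper, the failing line `conf[section][key] = value` could become `node_at(conf, section)[key] = value`.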
@Ajax1234, this is what I get with your solution.
print([c for c in _r if c[0]])
[['\\n logfile', 'cdm.log\\n loglevel', '0\\n cpu_usage_includes_wait', 'yes\\n internal_alarm_message', 'InternalAlarm\\n mem_buffer_used', 'no\\n alarm_on_each_sample', 'no\\n qos_source_short', 'yes\\n trendsubject', 'cdm\\n trendpriority', 'information\\n paging_in_kilobytes', 'yes\\n post_install', '1382462705\\n allow_qos_source_as_target', 'no\\n monitor_iostat', 'yes\\n allow_remote_disk_info', 'yes\\n']]
print(dict([c for c in _r if c[0]]))
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
ValueError: dictionary update sequence element #0 has length 15; 2 is required
Answer 0 (score: 1)
If you can redefine the input syntax, I suggest using a plain .ini file and Python's configparser.
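As a rough sketch of what that could look like: .ini files have no nested sections, so this example assumes a dotted naming convention such as `[messages.ProcQueueLen]` (my assumption, not something configparser defines) and rebuilds the nesting by splitting section names on the dot. Interpolation is turned off so the values are stored verbatim.

```python
import configparser

INI_TEXT = """
[setup]
logfile = cdm.log
loglevel = 0

[messages.ProcQueueLen]
text = Average ($value_number samples) processor queue length is $value$unit, which is >= $value_limit$unit. Last value is $value_last$unit.
level = minor
token = proc_q_len

[messages.CpuError]
level = major
token = cpu_error
i18n_token = as#system.cdm.avrg_total_cpu_above_err_threshold
"""

parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep option names as written instead of lowercasing them
parser.read_string(INI_TEXT)

conf = {}
for name in parser.sections():
    node = conf
    for part in name.split('.'):  # hypothetical convention: dots mean nesting
        node = node.setdefault(part, {})
    node.update(dict(parser[name]))

print(conf['messages']['CpuError']['token'])
```

Note that multi-line values in an .ini file need indented continuation lines, so the wrapped `text` value from the question is written on a single line here.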
I like the answers from Ajax and Serge Ballista, but if you want to modify your existing code so that it works, try the following, which I ran against your input:
import pprint

conf = {}
section = []
for i, line in enumerate(out.split('\n')):
    l = line.strip()
    if i < 20:
        l = l.strip("\n")
    if not l:
        # skip if end of file
        continue
    if l.startswith('</'):
        # we need to remove this from the list of current sections
        section.pop()
        print('skipping')
    elif l.startswith('<'):
        sec_name = l.strip("<>")  # what you wanted was conf["messages"], not conf["<messages>"]
        secstr = "".join(f"['{x}']" for x in section)  # create a string that looks something like ['messages']['ProcQueueLen']
        correct = eval(f"conf{secstr}")  # use the string to evaluate to an actual section in your conf dict
        correct[sec_name] = {}  # set the new section to an empty dictionary
        section.append(sec_name)  # add the new section to the dictionary route
        print(f"create dbentry: {secstr}['{sec_name}']")
    else:
        secstr = "".join(f"['{x}']" for x in section)
        correct = eval(f"conf{secstr}")
        # you have = in the middle of config values, which means that you can't split on '=', but you can split on ' = ' if your format is consistent.
        correct[l.split(' = ')[0].strip()] = l.split(' = ')[1].strip()
        print(f"add to dbentry: {correct[l.split(' = ')[0].strip()]}")
pprint.pprint(conf)
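The same approach can be written without `eval` by keeping a stack of the dictionaries that are currently open, so the innermost section is always `stack[-1]`. This is only a sketch under a few assumptions of mine: tags are balanced, every key/value line contains ' = ', and any other non-empty line continues the previous value (as the wrapped `text` entry in the question does). The function name `parse_sections` is hypothetical.

```python
def parse_sections(text):
    """Parse <section> ... </section> blocks of 'key = value' lines into nested dicts."""
    conf = {}
    stack = [conf]    # stack[-1] is the dict of the innermost open section
    last_key = None   # most recent key, used for continuation lines
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith('</'):
            stack.pop()          # leave the current section
            last_key = None
        elif line.startswith('<') and line.endswith('>'):
            child = stack[-1].setdefault(line[1:-1], {})
            stack.append(child)  # enter the new section
            last_key = None
        elif ' = ' in line:
            key, value = line.split(' = ', 1)  # split once: values may contain ' = '
            last_key = key.strip()
            stack[-1][last_key] = value.strip()
        elif last_key is not None:
            # assumption: a line without ' = ' continues the previous value
            stack[-1][last_key] += ' ' + line
    return conf

sample = """<setup>
loglevel = 0
</setup>
<messages>
<ProcQueueLen>
text = Average ($value_number samples)
processor queue length is $value$unit, which is >= $value_limit$unit. Last value is $value_last$unit.
level = minor
</ProcQueueLen>
</messages>"""

parsed = parse_sections(sample)
print(parsed['messages']['ProcQueueLen']['level'])
```

Because each section dict is reached through the stack, no string has to be built and evaluated, and section names containing quotes cannot break the lookup.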
Answer 1 (score: 0)
Assuming there are no arbitrary line breaks, you can use recursion with BeautifulSoup:
from bs4 import BeautifulSoup as soup
import re, collections, functools, json

def parse(d):
    _d = collections.defaultdict(dict)
    for i in getattr(d, 'contents', []):
        if isinstance(i, str) and i != '\n':
            # split the text into lines, then each line into [key, value]
            _r = [re.split(r'\s=\s', c) for c in re.split(r'\n\s+', i)]
            _d[d.name].update(dict([c for c in _r if c[0]]))
        else:
            _d[d.name].update(parse(i))
    return _d

# `data` is the input text; merge the dicts of all top-level tags
result = functools.reduce(lambda x, y: {**x, **y}, [dict(parse(i)) for i in soup(data, 'html.parser').contents if not isinstance(i, str)])
print(json.dumps(result, indent=4))
Output:
{
    "setup": {
        "logfile": "cdm.log",
        "loglevel": "0",
        "cpu_usage_includes_wait": "yes",
        "internal_alarm_message": "InternalAlarm",
        "mem_buffer_used": "no",
        "alarm_on_each_sample": "no",
        "qos_source_short": "yes",
        "trendsubject": "cdm",
        "trendpriority": "information ",
        "paging_in_kilobytes": "yes",
        "post_install": "1382462705",
        "allow_qos_source_as_target": "no",
        "monitor_iostat": "yes",
        "allow_remote_disk_info": "yes\n"
    },
    "messages": {
        "procqueuelen": {
            "text": "Average ($value_number samples) processor queue length is $value$unit, which is >= $value_limit$unit. Last value is $value_last$unit.",
            "level": "minor",
            "token": "proc_q_len"
        },
        "cpuerror": {
            "text": "Average ($value_number samples) total cpu is now $value$unit, which is above the error threshold ($value_limit$unit)",
            "level": "major",
            "token": "cpu_error",
            "i18n_token": "as#system.cdm.avrg_total_cpu_above_err_threshold"
        }
    }
}
Answer 2 (score: 0)
The sample text can be parsed with the xml.etree and re modules, provided the following assumptions hold:
The code could be:
import io
import re
import xml.etree.ElementTree as ET

def process_text(t):
    def process_elt(elt, dic):  # process the XML part
        dic[elt.tag] = {}
        dic = dic[elt.tag]
        children = list(elt)  # Element.getchildren() was removed in Python 3.9
        if len(children) > 0:
            for child in children:
                process_elt(child, dic)
        else:
            process_txt(elt.text, dic)

    def process_txt(t, dic):  # process the textual part
        blank = re.compile(r'\s+')
        eq = re.compile(r'\s*([^=]*?)\s*=\s*(.*?)\s*$')
        old = None
        for line in io.StringIO(t):
            # continuation lines are not indented
            if not blank.match(line) and old is not None:
                dic[old] += ' ' + line
            elif line.strip() != '':  # skip empty lines
                m = eq.match(line)
                if m is None:
                    print('ERROR', line)
                    continue  # no key/value pair on this line
                old = m.group(1)
                dic[old] = m.group(2)

    conf = {}
    root = ET.fromstring(t)
    process_elt(root, conf)
    return conf
With your exact input text, I get:
{'messages': {'ProcQueueLen': {'text': 'Average ($value_number samples) processor queue length is $value$unit, which is >= $value_limit$unit. Last value is $value_last$unit.\n', 'level': 'minor', 'token': 'proc_q_len'}, 'CpuError': {'text': 'Average ($value_number samples) total cpu is now $value$unit, which is above the error threshold ($value_limit$unit)', 'level': 'major', 'token': 'cpu_error', 'i18n_token': 'as#system.cdm.avrg_total_cpu_above_err_threshold'}}}
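The result above covers a single top-level element. If the text also contains a sibling top-level section such as `<setup>`, as in the question's sample, `ET.fromstring` rejects it because XML allows only one root element. A possible workaround, sketched here under the assumption that the content is otherwise well-formed, is to wrap the text in a synthetic root first; the wrapper name `root` is arbitrary.

```python
import xml.etree.ElementTree as ET

text = """<setup>
   loglevel = 0
</setup>
<messages>
   <CpuError>
      level = major
   </CpuError>
</messages>"""

try:
    ET.fromstring(text)  # two top-level elements: not well-formed XML
    error = None
except ET.ParseError as exc:
    error = exc

# Wrapping the fragments in one synthetic element gives the parser a single root.
wrapped = ET.fromstring(f"<root>{text}</root>")
top_level = [child.tag for child in wrapped]
print(error is not None, top_level)
```

Passing the wrapped string to `process_text` would then nest everything under a `'root'` key, so the configuration itself would be `process_text(...)['root']`.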