Putting the values of repeated fields into an array

Time: 2019-05-23 06:29:20

Tags: python

I have a file in this format:

dn: abvf
changetype: a
objectclass: in
objectclass: c
objectclass: cdsUser
objectclass: or
objectclass: Person
objectclass: nd
objectclass: Top
ACL :HH
ACL: JJJ
`
`
dn: abvf
changetype: a
objectclass: in
objectclass: c
objectclass: cdsUser
objectclass: or
objectclass: Person
objectclass: nd
objectclass: Top
ACL :HH
ACL: JJJ

How can I generate a file like this?

dn: abvf
changetype: a
objectclass: ['','','','']
ACL :['','']
`
`
dn: abvf
changetype: a
objectclass: ['','','','']
ACL :['','']

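Basically, I want to parse this file and, whenever a field occurs more than once, store its values in an array (without hard-coding the field names), because I have many similar entries, each with different repeated fields.

Is there a way to achieve this? Please help.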

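3 answers:

Answer 0: (score: 0)

Here is the example I mentioned in my comment above. I assume that a blank line (\n) separates two different objects. You end up with a list of dictionaries, which you can then print or write out as needed. Note that pprint is only used for nicer printing and is not actually needed when writing to a file.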

import pprint

if __name__ == '__main__':
    obj_list = []
    pp = pprint.PrettyPrinter(indent=4)
    with open('input.txt', 'r') as input_file:
        temp_dict = {}
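        for line in input_file:
            line = line.strip()
            # a blank line (or a lone backtick, as in the sample) separates objects
            if line in ('', '`'):
                # empty dict evaluates to false
                if temp_dict:
                    obj_list.append(temp_dict)
                    temp_dict = {}
            else:
                # split on the first colon only, so values may contain ':'
                k, v = line.split(':', 1)
                k, v = k.strip(), v.strip()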

                if k in temp_dict.keys():
                    if not isinstance(temp_dict[k], list):
                        old_value_as_list = [temp_dict[k]]
                        temp_dict[k] = old_value_as_list
                    temp_dict[k].append(v)
                else:
                    temp_dict[k] = v

        # since file may not end with newline
        if temp_dict:
            obj_list.append(temp_dict)
            temp_dict = {}

        pp.pprint(obj_list)

        with open('output.txt', 'w') as output_file:
            for obj in obj_list:
                for k,v in obj.items():
                    output_file.write(f'{k}: {v}\n')
                output_file.write('\n')

Output:

[   {   'ACL': ['HH', 'JJJ'],
        'changetype': 'a',
        'dn': 'abvf',
        'objectclass': ['in', 'c', 'cdsUser', 'or', 'Person', 'nd', 'Top']},
    {   'ACL': ['HH', 'JJJ'],
        'changetype': 'a',
        'dn': 'abvf',
        'objectclass': ['in', 'c', 'cdsUser', 'or', 'Person', 'nd', 'Top']}]
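The same grouping can also be sketched as a small function that works on any iterable of lines, which makes it easy to try without touching files. The name `parse_blocks` is hypothetical, and treating a lone backtick as a block separator is an assumption based on the sample in the question.

```python
def parse_blocks(lines):
    """Group 'key: value' lines into one dict per block.

    Assumption: a blank line or a lone backtick ends a block.
    A key seen once maps to a string; a repeated key maps to a list.
    """
    blocks = []
    current = {}
    for raw in lines:
        line = raw.strip()
        if line in ('', '`'):
            if current:
                blocks.append(current)
                current = {}
            continue
        # partition splits on the first ':' only
        key, _, value = line.partition(':')
        key, value = key.strip(), value.strip()
        if key not in current:
            current[key] = value
        elif isinstance(current[key], list):
            current[key].append(value)
        else:
            # second occurrence: promote the single value to a list
            current[key] = [current[key], value]
    if current:  # the input may not end with a separator
        blocks.append(current)
    return blocks


sample = """dn: abvf
objectclass: in
objectclass: c
ACL :HH
ACL: JJJ
`
`
dn: abvf
changetype: a
"""
result = parse_blocks(sample.splitlines())
print(result)
```

Because both sides of the colon are stripped, `ACL :HH` and `ACL: JJJ` end up under the same key.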

Answer 1: (score: 0)

with open("uservolvo2.ldif") as f:

count_dict={}
count_list=[]

for line in f:
    if line !="\n":
        split = line.split(":")
        json_obj = {split[0].rstrip("\n"):[split[1].rstrip("\n")]}
        if split[0].rstrip("\n") in count_dict.keys():
            count_dict[split[0].rstrip("\n")].append(split[1].rstrip("\n"))
        count_dict.update(json_obj) 

    else:
        count_list.append(count_dict)
        count_dict={}
        count_list=[]

with open("uservolvo3.ldif") as f1:
    for obj in count_list:
        for k, v in obj.items():
            f1.write(k, ':' ,v)

I tried this code, but it does not write anything to the new file.
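Several things in the attempt as posted would each keep the output empty or stop it from running. `count_list` is reset to `[]` every time a blank line is reached, so the collected blocks are thrown away. `count_dict.update(json_obj)` overwrites the list that was just appended to. The output file is opened without `'w'`, and `write()` accepts a single string. The following is only a sketch of one possible correction. The function name `merge_repeated_fields` is hypothetical, and it assumes blocks are separated by blank lines or lone backticks.

```python
import os
import tempfile


def merge_repeated_fields(src_path, dst_path):
    """Copy src_path to dst_path, collecting repeated keys into lists."""
    blocks = []
    current = {}
    with open(src_path) as src:
        for line in src:
            stripped = line.strip()
            if stripped in ("", "`"):
                if current:
                    blocks.append(current)
                    current = {}  # reset the block only, never the list of blocks
                continue
            key, _, value = stripped.partition(":")
            # setdefault creates the list once, then keeps appending to it
            current.setdefault(key.strip(), []).append(value.strip())
    if current:  # the file may not end with a blank line
        blocks.append(current)

    with open(dst_path, "w") as dst:  # 'w' is needed to write
        for block in blocks:
            for key, values in block.items():
                dst.write(f"{key}: {values}\n")  # write() takes one string
            dst.write("\n")
    return blocks


# Small self-check using temporary files instead of the real .ldif files.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "in.ldif")
    dst_path = os.path.join(tmp, "out.ldif")
    with open(src_path, "w") as f:
        f.write("dn: abvf\nobjectclass: in\nobjectclass: c\n\ndn: xyz\n")
    merged = merge_repeated_fields(src_path, dst_path)
    with open(dst_path) as f:
        written = f.read()
print(written)
```

With the real files, the call would be `merge_repeated_fields("uservolvo2.ldif", "uservolvo3.ldif")`. In this sketch every value is kept as a list, even for keys that occur only once.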

Answer 2: (score: 0)

You can build a dictionary for each block and use it to accumulate the repeated values of each keyword:

with open('input.txt', 'r') as inFile:
    lines = inFile.read().split("\n")

with open('output.txt','w') as outFile:
    block = dict()
    for line in lines+[""]:
        if line in ["`",""]:
            outLines = [f"{k}:{[v[0],v][len(v)>1]}" for k,v in block.items()]
            outFile.write("\n".join(outLines+[line])+"\n")
            block = dict()
            continue
        keyword,value = line.split(":",1)
        # strip the key as well, so "ACL :HH" and "ACL: JJJ" share one entry
        block.setdefault(keyword.strip(),list()).append(value.strip())
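Two idioms in this answer may be worth spelling out. `dict.setdefault(key, [])` returns the existing list for a key, or stores and returns a new one, so every value can simply be appended. `[v[0], v][len(v) > 1]` indexes a two-item list with a boolean (`False` is 0, `True` is 1), picking the bare value when there is only one and the whole list otherwise. A minimal in-memory sketch, where the helper name `format_block` is hypothetical:

```python
def format_block(block):
    """Render one block; single values are shown bare, repeated ones as a list."""
    lines = []
    for key, values in block.items():
        # same choice as [values[0], values][len(values) > 1], written out
        shown = values if len(values) > 1 else values[0]
        lines.append(f"{key}: {shown}")
    return lines


block = {}
for line in ["dn: abvf", "objectclass: in", "objectclass: c", "ACL :HH", "ACL: JJJ"]:
    keyword, value = line.split(":", 1)
    # strip both sides so "ACL " and "ACL" count as the same keyword
    block.setdefault(keyword.strip(), []).append(value.strip())

out = format_block(block)
print("\n".join(out))
```

The conditional expression does the same thing as the boolean index, but is usually easier to read.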