How do I convert a semicolon-delimited file into a nested dict?

Time: 2016-12-20 18:10:52

Tags: python dictionary

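I'm trying to convert a semicolon-delimited file into a nested dictionary. I've been working on this all morning and suspect I'm overlooking something simple: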

Input (sample)

The real file is about 200 lines long; this is just a small sample.

key;name;desc;category;type;action;range;duration;skill;strain_mod;apt_bonus
ambiencesense;Ambience Sense;This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.;psi-chi;passive;automatic;self;constant;;0;
cogboost;Cognitive Boost;The async can temporarily elevate their cognitive performance.;psi-chi;active;quick;self;temp;;-1;{'COG': 5}

Current output

[['key',
  'name',
  'desc',
  'category',
  'type',
  'action',
  'range',
  'duration',
  'skill',
  'strain_mod',
  'apt_bonus'],
 ['ambiencesense',
  'Ambience Sense',
  'This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.',
  'psi-chi',
  'passive',
  'automatic',
  'self',
  'constant',
  '',
  '0',
  ''],
 ['cogboost',
  'Cognitive Boost',
  'The async can temporarily elevate their cognitive performance.',
  'psi-chi',
  'active',
  'quick',
  'self',
  'temp',
  '',
  '-1',
  "{'COG': 5}"]]

Desired output

blahblah = {
     'ambiencesense': {
         'name': 'Ambience Sense',
         'desc': 'This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.',
         'category': 'psi-chi',
         'type': 'passive',
         'action': 'automatic',
         'range': 'self',
         'duration': 'constant',
         'skill': '',
         'strain_mod': '0',
         'apt_bonus': '',
         },     
     'cogboost': {
         'name': 'Cognitive Boost',
         'desc': 'The async can temporarily elevate their cognitive performance.',
         'category': 'psi-chi',
         'type': 'active',
         'action': 'quick',
         'range': 'self',
         'duration': 'temp',
         'skill': '',
         'strain_mod': '-1',
         'apt_bonus': {'COG': 5},
         },
         ...

Script (not working)

#!/usr/bin/env python
# Usage: ./csvdict.py <filename to convert to dict> <file to output>

import csv
import sys
import pprint

def parse(filename):
    with open(filename, 'rb') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=';')
        csvfile.seek(0)
        reader = csv.reader(csvfile, dialect)
        dict_list = []

        for line in reader:
            dict_list.append(line)
        return dict_list

        new_dict = {}

        for item in dict_list:
            key = item.pop('key')
            new_dict[key] = item

output = parse(sys.argv[1])

with open(sys.argv[2], 'wt') as out:
    pprint.pprint(output, stream=out)
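
Two things appear to keep the script above from producing the nested dict: `return dict_list` exits `parse` before the loop that builds `new_dict` ever runs, and `csv.reader` yields lists, so `item.pop('key')` would raise a `TypeError` even if that loop were reached. A minimal sketch of one way around both, assuming Python 3 and using `csv.DictReader` so each row is already keyed by the header line. The shortened inline sample is only a stand-in for the real file:

```python
import csv
import io

def parse(csvfile):
    """Return {key: {column: value}} from a semicolon-delimited file object."""
    reader = csv.DictReader(csvfile, delimiter=';')
    nested = {}
    for row in reader:
        # Pull the 'key' column out of the row and use it as the outer key.
        key = row.pop('key')
        nested[key] = dict(row)
    return nested

# Shortened stand-in for the real input file (assumption for illustration).
sample = io.StringIO(
    "key;name;strain_mod;apt_bonus\n"
    "ambiencesense;Ambience Sense;0;\n"
    "cogboost;Cognitive Boost;-1;{'COG': 5}\n"
)
result = parse(sample)
```

With a file on disk, the same function could be called as `parse(f)` inside `with open(filename, newline='') as f:`.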

Working script

#!/usr/bin/env python
# Usage: ./csvdict.py <input filename> <output filename>

import sys
import pprint

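data = {}
error = 'Incorrect number of arguments.\nUsage: ./csvdict.py <input filename> <output filename>'

if len(sys.argv) != 3:
    print(error)
else:
    # Read argv only after the argument count has been checked;
    # otherwise a missing argument raises IndexError before the usage message.
    file_name = sys.argv[1]

    with open(file_name, 'r') as test_fh: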
        header_line = next(test_fh)
        header_line = header_line.strip()
        headers = header_line.split(';')

        index_headers = {index:header for index, header in enumerate(headers)}

        for line in test_fh:
            line = line.strip()
            values = line.split(';')
            index_vals = {index:val for index, val in enumerate(values)}
            data[index_vals[0]] = {index_headers[key]:value for key, value in index_vals.items() if key != 0}

    with open(sys.argv[2], 'wt') as out:
        pprint.pprint(data, stream=out)

The only thing it doesn't handle well is the embedded dict. Any ideas on how to clean this up? (See apt_bonus.)

 'cogboost': {'action': 'quick',
              'apt_bonus': "{'COG': 5}",
              'category': 'psi-chi',
              'desc': 'The async can temporarily elevate their cognitive performance.',
              'duration': 'temp',
              'name': 'Cognitive Boost',
              'range': 'self',
              'skill': '',
              'strain_mod': '-1',
              'type': 'active'},
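
One possible way to turn the `apt_bonus` string into a real dict is `ast.literal_eval` from the standard library, which parses Python literals without executing code. This is only a sketch: the helper name `parse_bonus` is hypothetical, and mapping an empty field to `{}` is an assumption about what is wanted.

```python
import ast

def parse_bonus(text):
    """Turn a string such as "{'COG': 5}" into a dict; empty text becomes {}."""
    text = text.strip()
    if not text:
        # Assumption: an empty field means "no bonus".
        return {}
    # literal_eval accepts only Python literals, so it does not run code.
    return ast.literal_eval(text)

# Trimmed-down version of the parsed data, for illustration only.
data = {
    'ambiencesense': {'name': 'Ambience Sense', 'apt_bonus': ''},
    'cogboost': {'name': 'Cognitive Boost', 'apt_bonus': "{'COG': 5}"},
}
for entry in data.values():
    entry['apt_bonus'] = parse_bonus(entry['apt_bonus'])
```

A malformed field would raise `ValueError` or `SyntaxError` here, so a real script may want to catch those and report the offending line.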

3 Answers:

Answer 0 (score: 2)

Here is another version. It's a bit abstract, but it has no dependencies.

file_name = "<path>/test.txt"

data = {}
with open(file_name, 'r') as test_fh:
    header_line = next(test_fh)
    header_line = header_line.strip()
    headers = header_line.split(';')

    index_headers = {index:header for index, header in enumerate(headers)}

    for line in test_fh:
        line = line.strip()
        values = line.split(';')
        index_vals = {index:val for index, val in enumerate(values)}
        data[index_vals[0]] = {index_headers[key]:value for key, value in index_vals.items() if key != 0}

print(data)
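
The index-to-header mapping above can probably be condensed by pairing headers and values directly with `zip`. A sketch under that assumption, with a hypothetical function name `to_nested` and a shortened inline sample in place of the file:

```python
def to_nested(lines, sep=';'):
    """Build {first_column: {header: value}} from an iterable of text lines."""
    lines = iter(lines)
    headers = next(lines).strip().split(sep)
    nested = {}
    for line in lines:
        line = line.rstrip('\r\n')
        if not line:
            continue  # skip blank lines
        values = line.split(sep)
        # Pair every header after the first with the matching value.
        nested[values[0]] = dict(zip(headers[1:], values[1:]))
    return nested

# Shortened stand-in for the real input (assumption for illustration).
sample_lines = [
    "key;name;strain_mod;apt_bonus\n",
    "ambiencesense;Ambience Sense;0;\n",
    "cogboost;Cognitive Boost;-1;{'COG': 5}\n",
]
data = to_nested(sample_lines)
```

Because `zip` stops at the shorter sequence, a row with extra fields is silently truncated rather than raising an error, which may or may not be what you want.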

Answer 1 (score: 1)

This is very easy with pandas:

In [7]: import pandas as pd

In [8]: pd.read_clipboard(sep=";", index_col=0).T.to_dict()
Out[8]:
{'ambiencesense': {'action': 'automatic',
  'apt_bonus': nan,
  'category': 'psi-chi',
  'desc': 'This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.',
  'duration': 'constant',
  'name': 'Ambience Sense',
  'range': 'self',
  'skill': nan,
  'strain_mod': 0,
  'type': 'passive'},
 'cogboost': {'action': 'quick',
  'apt_bonus': "{'COG': 5}",
  'category': 'psi-chi',
  'desc': 'The async can temporarily elevate their cognitive performance.',
  'duration': 'temp',
  'name': 'Cognitive Boost',
  'range': 'self',
  'skill': nan,
  'strain_mod': -1,
  'type': 'active'}}

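In your case you would use pd.read_csv() instead of .read_clipboard(), but it would look roughly the same. You may also need to tweak it a little if you want to parse the apt_bonus column into a dictionary.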
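
A sketch of what the `pd.read_csv()` variant might look like, assuming pandas is installed. The `dtype=str` and `keep_default_na=False` arguments are an optional choice made here so that empty fields stay as `''` and numbers stay as strings, matching the desired output in the question; the inline sample stands in for the real file.

```python
import io

import pandas as pd

# Shortened stand-in for the real input file (assumption for illustration).
sample = io.StringIO(
    "key;name;strain_mod;apt_bonus\n"
    "ambiencesense;Ambience Sense;0;\n"
    "cogboost;Cognitive Boost;-1;{'COG': 5}\n"
)

# index_col=0 makes the 'key' column the row index.
df = pd.read_csv(sample, sep=';', index_col=0, dtype=str, keep_default_na=False)

# orient='index' yields {row_label: {column: value}} without a transpose.
nested = df.to_dict(orient='index')
```

For a file on disk, the first argument to `pd.read_csv` would be the file path instead of the `StringIO` object.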

Answer 2 (score: 1)

Try a Pythonic approach that uses no libraries:

s = '''key;name;desc;category;type;action;range;duration;skill;strain_mod;apt_bonus
ambiencesense;Ambience Sense;This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.;psi-chi;passive;automatic;self;constant;;0;
cogboost;Cognitive Boost;The async can temporarily elevate their cognitive performance.;psi-chi;active;quick;self;temp;;-1;{'COG': 5}'''

lists = [delim.split(';') for delim in s.split('\n')]
keyIndex = lists[0].index('key')
nested = {lst[keyIndex]:{lists[0][i]:lst[i] for i in range(len(lists[0])) if i != keyIndex} for lst in lists[1:]}

This results in:

{
    'cogboost': {
        'category': 'psi-chi',
        'name': 'Cognitive Boost',
        'strain_mod': '-1',
        'duration': 'temp',
        'range': 'self',
        'apt_bonus': "{'COG': 5}",
        'action': 'quick',
        'skill': '',
        'type': 'active',
        'desc': 'The async can temporarily elevate their cognitive performance.'
    },
    'ambiencesense': {
        'category': 'psi-chi',
        'name': 'Ambience Sense',
        'strain_mod': '0',
        'duration': 'constant',
        'range': 'self',
        'apt_bonus': '',
        'action': 'automatic',
        'skill': '',
        'type': 'passive',
        'desc': 'This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.'
    }
}