Read file contents into a dict in Python

Date: 2019-04-05 12:12:36

Tags: python

I have a file like the one below, and I would like to read its information into a `dict`:

...previous file content

[NON-UNIFORM LOADS]
    3 = number of items
Load 1
           0        17.50        20.00   0            0  = Time, Gamma dry, Gamma wet, Temporary, Endtime
    6 = Number of co-ordinates
       0.000        0.000 = X, Y
      20.000        0.000 = X, Y
      40.000        2.000 = X, Y
      80.000        2.000 = X, Y
     100.000        0.000 = X, Y
     120.000        0.000 = X, Y
Compensation load
         200        17.50        20.00   0            0  = Time, Gamma dry, Gamma wet, Temporary, Endtime
   19 = Number of co-ordinates
      20.000        0.000 = X, Y
      20.000        1.198 = X, Y
      25.000        2.763 = X, Y
      30.000        3.785 = X, Y
      35.000        4.617 = X, Y
      40.000        5.324 = X, Y
      45.000        5.418 = X, Y
      50.000        5.454 = X, Y
      55.000        5.467 = X, Y
      60.000        5.471 = X, Y
      65.000        5.467 = X, Y
      70.000        5.454 = X, Y
      75.000        5.418 = X, Y
      80.000        5.324 = X, Y
      85.000        4.617 = X, Y
      90.000        3.785 = X, Y
      95.000        2.763 = X, Y
     100.000        1.198 = X, Y
     100.000        0.000 = X, Y
Compensation load 2
         200        17.50        20.00   0            0  = Time, Gamma dry, Gamma wet, Temporary, Endtime
    3 = Number of co-ordinates
       0.000        0.000 = X, Y
      20.000       10.000 = X, Y
      20.000        0.000 = X, Y
[END OF NON-UNIFORM LOADS]

... subsequent file content

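The `dict` I am after looks like this:

result = {'no items': 3, 'Load 1': {'X': [0, 20, 40, 80, 100, 120], 'Y': [0, 0, 2, 2, 0, 0]}, 'Compensation load': {...}, 'Compensation load 2': {...}}

Are there any third-party libraries that could help me with this? Otherwise, what strategy would you use to tackle it? I started by calling the readlines method on the file object, looping over the lines and using an if statement to stop at the line containing '[NON-UNIFORM LOADS]', but I am not sure how to get from there to an elegant solution.

Edit

In response to the comments, I am trying something like this: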

'[NON-UNIFORM LOADS]'

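3 answers:

Answer 0 (score: 2)

Here you go. To keep things simple I ended up not using regular expressions at all. Judging by the file example I have seen so far, the format is not complex enough to justify them. If other sections of the file can have a more complex structure, they might be more worthwhile.

I also was not sure whether you are using Python 3 or Python 2, so I tried to write it in a way that works on both: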

from collections import defaultdict

class ParseLoadsError(Exception):
    """Exception raised for malformatted load files."""

    def __init__(self, lineno, line, message):
        super(ParseLoadsError, self).__init__(lineno, line, message)

    def __str__(self):
        return 'parse error on line {}: {!r}; {}'.format(*self.args)


def parse_loads_file(fileobj):
    """Parse a <whatever> file.

    Currently just returns non-uniform loads.  Parsing other
    file sections is left as an exercise.
    """

    result = {'non_uniform_loads': []}

    # Number lines from 1 so error messages match what an editor shows
    line_iterator = ((idx, l.strip()) for idx, l in enumerate(fileobj, 1))
    for lineno, line in line_iterator:
        if line == '[NON-UNIFORM LOADS]':
            # Read the entire [NON-UNIFORM LOADS] section
            # We pass it line_iterator so it advances the
            # same iterator while reading
            result['non_uniform_loads'].append(_parse_non_uniform_loads(line_iterator))

    return result


def _parse_variable_map(lineno, line):
    """Parse a single <values> = <varnames> mapping.

    This file format uses a format for mapping one or more values
    to one or more variable names in the format::

        N_1 N_2 N_3 ... N_n = A_1, A_2, A_3, ..., A_n

    Where N_i are always either integers or floating-point values, and
    A_i is the variable name associated with N_i.  The A_i may contain
    spaces, but whitespace is otherwise irrelevant.

    Of course, if other types of values may occur in other sections of
    the file this may be slightly more complicated.  This also assumes
    these lines are always well-formed.  If not, additional logic may be
    required to handle misshapen variables maps.
    """

    try:
        values, varnames = line.split('=')
        values = (float(v.strip()) for v in values.split())
        varnames = (n.strip() for n in varnames.split(','))
        return dict(zip(varnames, values))
    except ValueError:
        raise ParseLoadsError(lineno, line,
            "expected format N_1 N_2 ... N_n = A_1, A_2, ..., A_n")


def _parse_non_uniform_loads(lines):
    lineno, line = next(lines)
    # The first line of a non-uniform loads section
    # describes the number of loads
    try:
        n_loads = int(_parse_variable_map(lineno, line)['number of items'])
    except KeyError:
        raise ParseLoadsError(lineno, line, "expected 'N = number of items'")

    # _parse_load returns a (load_name, load_data) tuple, so this builds
    # a dict mapping load_name to load_data for each load
    loads = dict(_parse_load(lines) for _ in range(n_loads))

    lineno, line = next(lines)
    if line != '[END OF NON-UNIFORM LOADS]':
        raise ParseLoadsError(lineno, line, "expected '[END OF NON-UNIFORM LOADS]'")

    return loads


def _parse_load(lines):
    """Parses a single load section."""

    _, load_name = next(lines)

    # Next there appears some additional metadata about the load
    load_data = _parse_variable_map(*next(lines))

    # Then the number of coordinates
    lineno, line = next(lines)
    try:
        n_coords = int(_parse_variable_map(lineno, line)['Number of co-ordinates'])
    except KeyError:
        raise ParseLoadsError(lineno, line, "expected 'N = Number of co-ordinates'")

    coordinates = defaultdict(list)
    for _ in range(n_coords):
        for c, v in _parse_variable_map(*next(lines)).items():
            coordinates[c].append(v)

    load_data['Coordinates'] = dict(coordinates)
    return load_name, load_data

Example usage:

try:
    from cStringIO import StringIO
except ImportError:
    from io import StringIO

example_file = StringIO("""...previous file content

[NON-UNIFORM LOADS]
    3 = number of items
Load 1
           0        17.50        20.00   0            0  = Time, Gamma dry, Gamma wet, Temporary, Endtime
    6 = Number of co-ordinates
       0.000        0.000 = X, Y
      20.000        0.000 = X, Y
      40.000        2.000 = X, Y
      80.000        2.000 = X, Y
     100.000        0.000 = X, Y
     120.000        0.000 = X, Y
Compensation load
         200        17.50        20.00   0            0  = Time, Gamma dry, Gamma wet, Temporary, Endtime
   19 = Number of co-ordinates
      20.000        0.000 = X, Y
      20.000        1.198 = X, Y
      25.000        2.763 = X, Y
      30.000        3.785 = X, Y
      35.000        4.617 = X, Y
      40.000        5.324 = X, Y
      45.000        5.418 = X, Y
      50.000        5.454 = X, Y
      55.000        5.467 = X, Y
      60.000        5.471 = X, Y
      65.000        5.467 = X, Y
      70.000        5.454 = X, Y
      75.000        5.418 = X, Y
      80.000        5.324 = X, Y
      85.000        4.617 = X, Y
      90.000        3.785 = X, Y
      95.000        2.763 = X, Y
     100.000        1.198 = X, Y
     100.000        0.000 = X, Y
Compensation load 2
         200        17.50        20.00   0            0  = Time, Gamma dry, Gamma wet, Temporary, Endtime
    3 = Number of co-ordinates
       0.000        0.000 = X, Y
      20.000       10.000 = X, Y
      20.000        0.000 = X, Y
[END OF NON-UNIFORM LOADS]

... subsequent file content""")

# To use an actual file here you might do something like
# with open(filename) as fobj:
#     parse_loads_file(fobj)

parse_loads_file(example_file)

Output:

{'non_uniform_loads': [{'Compensation load': {'Coordinates': {'X': [20.0,
      20.0,
      25.0,
      30.0,
      35.0,
      40.0,
      45.0,
      50.0,
      55.0,
      60.0,
      65.0,
      70.0,
      75.0,
      80.0,
      85.0,
      90.0,
      95.0,
      100.0,
      100.0],
     'Y': [0.0,
      1.198,
      2.763,
      3.785,
      4.617,
      5.324,
      5.418,
      5.454,
      5.467,
      5.471,
      5.467,
      5.454,
      5.418,
      5.324,
      4.617,
      3.785,
      2.763,
      1.198,
      0.0]},
    'Endtime': 0.0,
    'Gamma dry': 17.5,
    'Gamma wet': 20.0,
    'Temporary': 0.0,
    'Time': 200.0},
   'Compensation load 2': {'Coordinates': {'X': [0.0, 20.0, 20.0],
     'Y': [0.0, 10.0, 0.0]},
    'Endtime': 0.0,
    'Gamma dry': 17.5,
    'Gamma wet': 20.0,
    'Temporary': 0.0,
    'Time': 200.0},
   'Load 1': {'Coordinates': {'X': [0.0, 20.0, 40.0, 80.0, 100.0, 120.0],
     'Y': [0.0, 0.0, 2.0, 2.0, 0.0, 0.0]},
    'Endtime': 0.0,
    'Gamma dry': 17.5,
    'Gamma wet': 20.0,
    'Temporary': 0.0,
    'Time': 0.0}}]}

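I was not sure whether a single file can contain more than one [NON-UNIFORM LOADS] section, so I append the contents of each such section to a list ({'non_uniform_loads': []}). If there is only ever one, you can drop the list and simply set result['non_uniform_loads'] = _parse_non_uniform_loads(line_iterator) instead.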
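If the flatter layout from the question is what you need, the parsed result can be reshaped afterwards. The following is only a sketch: `to_requested_shape` is a hypothetical helper name, and it assumes exactly one [NON-UNIFORM LOADS] section and the output structure shown above. The `sample` dict is a hand-written stand-in for the parser output.

```python
def to_requested_shape(parsed):
    """Reshape parse_loads_file() output into the layout from the question.

    Hypothetical helper: assumes exactly one [NON-UNIFORM LOADS] section.
    """
    loads = parsed['non_uniform_loads'][0]
    result = {'no items': len(loads)}
    for name, data in loads.items():
        # Keep only the coordinate lists, dropping Time, Gamma dry, etc.
        result[name] = {
            'X': list(data['Coordinates']['X']),
            'Y': list(data['Coordinates']['Y']),
        }
    return result


# Hand-written stand-in for the parser output (one load only)
sample = {'non_uniform_loads': [{
    'Compensation load 2': {
        'Coordinates': {'X': [0.0, 20.0, 20.0], 'Y': [0.0, 10.0, 0.0]},
        'Endtime': 0.0, 'Gamma dry': 17.5, 'Gamma wet': 20.0,
        'Temporary': 0.0, 'Time': 200.0,
    },
}]}

reshaped = to_requested_shape(sample)
```

Note that a load literally named 'no items' would collide with the count key, which is one reason to prefer the nested layout.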

Answer 1 (score: -1)

I would use this approach:

s = '200        17.50        20.00   0            0  = Time, Gamma dry, Gamma wet, Temporary, Endtime'
  1. Split each string on the "=" sign

    s_l = s.split('=')

  2. Split on the ' ' delimiter

    s1 = [float(a.lstrip()) for a in s_l[0].split(' ') if a != '']

    s2 = [a.lstrip() for a in s_l[1].split(',') if a != '']

  3. Zip the resulting lists into a dictionary

    target_dict = dict(zip(s2, s1))

Result:

target_dict: {'Time': 200.0, 'Gamma dry': 17.5, 'Gamma wet': 20.0, 'Temporary': 0.0, 'Endtime': 0.0}
  4. Combine the dictionaries
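The steps above can be sketched as two small functions. The names `parse_value_line` and `combine` are hypothetical, and the way the per-line dictionaries are merged in the last step (one list per variable name) is an assumption, since the answer does not spell it out.

```python
def parse_value_line(line):
    """Turn 'v1 v2 ... = name1, name2, ...' into {name: float(value)}."""
    values_part, names_part = line.split('=')
    # split() with no argument collapses runs of whitespace
    values = [float(v) for v in values_part.split()]
    names = [n.strip() for n in names_part.split(',') if n.strip()]
    return dict(zip(names, values))


def combine(dicts):
    """Merge per-line dicts into one dict of lists (assumed meaning of step 4)."""
    combined = {}
    for d in dicts:
        for name, value in d.items():
            combined.setdefault(name, []).append(value)
    return combined


coordinate_lines = [
    '       0.000        0.000 = X, Y',
    '      20.000       10.000 = X, Y',
    '      20.000        0.000 = X, Y',
]
coords = combine(parse_value_line(l) for l in coordinate_lines)
```

Lines that are load names or section markers contain no "=" and would still need to be handled separately.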

Answer 2 (score: -1)

Here is an ugly solution.

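def str_is_float(s):
    # Helper used below but not shown in the original answer; this
    # definition is an assumption about what it did.
    try:
        float(s)
        return True
    except ValueError:
        return False

with open(file) as fo: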
    lines = fo.readlines()
    results = {}
    for i, line in enumerate(lines):
        if r'[NON-UNIFORM LOADS]' in line:
            results['non_uniform_loads'] = {}
            #get load names and no_coordinates
            no_coords = []
            load_names = []
            load_names_index = []
            j=1
            line = lines[i+j]
            while '[' not in line:
                j=j+1 
                if 'Number of co-ordinates' in line:
                    no_coords.append(int(line.strip().split()[0]))
                elif str_is_float(line.strip().split()[0])==False:
                    load_names.append(line.strip().replace('\n', ''))
                    load_names_index.append(i+j-1)
                else:
                    pass
                line = lines[i+j]
            for j, load_name_index in enumerate(load_names_index):
                results['non_uniform_loads'][load_names[j]] = {'X':[], 'Z':[]}
                current_no_coords = no_coords[j]
                print(current_no_coords)
                for k in range(current_no_coords):
                    results['non_uniform_loads'][load_names[j]]['X'].append(float(lines[load_name_index+k+3].strip().split()[0]))
                    results['non_uniform_loads'][load_names[j]]['Z'].append(float(lines[load_name_index+k+3].strip().split()[1]))

It does the job, but it is a real nightmare. If @Iguananaut has a more attractive solution (especially one using regular expressions), I would gladly accept it.
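Since regular expressions come up here, a rough sketch of how the section could be pulled out with `re` follows. It is not taken from any of the answers; `VALUE_LINE` and `parse_section` are hypothetical names, and the sketch assumes that load names never contain "=" and that only the X, Y coordinates are wanted.

```python
import re

# One "<numbers> = <names>" line, e.g. "  20.000   0.000 = X, Y"
VALUE_LINE = re.compile(
    r'^\s*(?P<values>[-+0-9.eE\s]+?)\s*=\s*(?P<names>.+?)\s*$')


def parse_section(text):
    """Return {load_name: {'X': [...], 'Y': [...]}} for one section."""
    section = re.search(
        r'\[NON-UNIFORM LOADS\](.*?)\[END OF NON-UNIFORM LOADS\]',
        text, re.DOTALL)
    if section is None:
        return {}
    loads = {}
    current = None
    for line in section.group(1).splitlines():
        match = VALUE_LINE.match(line)
        if match is None:
            if line.strip():
                # A non-blank line without '=' is taken to be a load name
                current = loads.setdefault(line.strip(), {'X': [], 'Y': []})
            continue
        names = [n.strip() for n in match.group('names').split(',')]
        if names == ['X', 'Y'] and current is not None:
            x, y = (float(v) for v in match.group('values').split())
            current['X'].append(x)
            current['Y'].append(y)
    return loads


sample = """...previous file content

[NON-UNIFORM LOADS]
    2 = number of items
Load 1
           0        17.50        20.00   0            0  = Time, Gamma dry, Gamma wet, Temporary, Endtime
    2 = Number of co-ordinates
       0.000        0.000 = X, Y
      20.000        0.000 = X, Y
Compensation load 2
         200        17.50        20.00   0            0  = Time, Gamma dry, Gamma wet, Temporary, Endtime
    3 = Number of co-ordinates
       0.000        0.000 = X, Y
      20.000       10.000 = X, Y
      20.000        0.000 = X, Y
[END OF NON-UNIFORM LOADS]

... subsequent file content"""

loads = parse_section(sample)
```

Unlike the index arithmetic above, this does not rely on the declared coordinate counts, so it will not notice a section whose counts disagree with its contents.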