Converting a text file into a dictionary

Asked: 2017-06-06 22:28:34

Tags: python python-2.7 python-3.x dictionary

I have the following text file, which holds the demand in a network.

Origin  1
    1 :      0.0;     2 :    100.0;     3 :    100.0;     4 :    500.0;     5 :    200.0;
    6 :    300.0;     7 :    500.0;     8 :    800.0;     9 :    500.0;    10 :   1300.0;
   11 :    500.0;    12 :    200.0;    13 :    500.0;    14 :    300.0;    15 :    500.0;
   16 :    500.0;    17 :    400.0;    18 :    100.0;    19 :    300.0;    20 :    300.0;
   21 :    100.0;    22 :    400.0;    23 :    300.0;    24 :    100.0;

Origin  2
    1 :    100.0;     2 :      0.0;     3 :    100.0;     4 :    200.0;     5 :    100.0;
    6 :    400.0;     7 :    200.0;     8 :    400.0;     9 :    200.0;    10 :    600.0;
   11 :    200.0;    12 :    100.0;    13 :    300.0;    14 :    100.0;    15 :    100.0;
   16 :    400.0;    17 :    200.0;    18 :      0.0;    19 :    100.0;    20 :    100.0;
   21 :      0.0;    22 :    100.0;    23 :      0.0;    24 :      0.0;

Origin  3
    1 :    100.0;     2 :    100.0;     3 :      0.0;     4 :    200.0;     5 :    100.0;
    6 :    300.0;     7 :    100.0;     8 :    200.0;     9 :    100.0;    10 :    300.0;
   11 :    300.0;    12 :    200.0;    13 :    100.0;    14 :    100.0;    15 :    100.0;
   16 :    200.0;    17 :    100.0;    18 :      0.0;    19 :      0.0;    20 :      0.0;
   21 :      0.0;    22 :    100.0;    23 :    100.0;    24 :      0.0;

... records 4-23 elided ...

Origin  24
    1 :    100.0;     2 :      0.0;     3 :      0.0;     4 :    200.0;     5 :      0.0;
    6 :    100.0;     7 :    100.0;     8 :    200.0;     9 :    200.0;    10 :    800.0;
   11 :    600.0;    12 :    500.0;    13 :    700.0;    14 :    400.0;    15 :    400.0;
   16 :    300.0;    17 :    300.0;    18 :      0.0;    19 :    100.0;    20 :    400.0;
   21 :    500.0;    22 :   1100.0;    23 :    700.0;    24 :      0.0;

Now I need to create a dictionary that should look like this:

{(1,1):0.0, (1,2):100.0, (1, 3):100.0, .......
 (2, 1):100.0, (2,2):0, ......}

Each tuple key, e.g. (1, 2), represents the origin and the destination, and the value is the demand (100.0 for the key (1, 2)).

I tried the following:

with open("trips.txt", "r") as f:
     line = f.readline()
     line = f.readline()
     ind = 0
     while len(line):
         line = line.strip(';')
         l = line.split()
         print l

         ind = ind + 1
         if(ind == 5):
             line = f.readline()
             line = f.readline()
             line = f.readline()
             ind = 0
             node = node + 1
         else:
             line = f.readline()

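But I don't think I'm getting anywhere...

4 answers:

Answer 0: (score: 3)

You are definitely not getting anywhere, because you never refer to a dictionary at all.

I'll outline a process for you here; can you fill in the details?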

my_dict = {}

while not EOF:    # pseudocode: loop until the file is exhausted
    # read the "Origin" line
    line = f.readline()

    # extract the number on the right
    origin_num = int(line.split()[-1])

    # read the data lines
    for _ in range(5):    # each data chunk has 5 lines
        data_line = f.readline()
        entries = data_line.split(';')    # split at semicolons

        for field in entries:
            y_key, value = field.split(':')
            # Now, you need to convert y_key to an integer and value to a float,
            #    combine y_key with the origin_num,
            #    and insert that value into my_dict.

Does that get you moving? Note that you will also need to handle blank lines, detect the end of the file, and so on.
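
One way the outline above might be filled in is sketched below. This is not the answerer's own code: the function name parse_trips and the inline SAMPLE string are made up for illustration, and it assumes Python 3 and the layout shown in the question. Instead of counting five data lines per block, it treats every non-blank line that is not an "Origin" header as data, which also takes care of blank lines and the end of the file.

```python
import io

# Hypothetical two-origin excerpt in the same layout as the question's file.
SAMPLE = """\
Origin  1
    1 :      0.0;     2 :    100.0;     3 :    100.0;
    4 :    500.0;

Origin  2
    1 :    100.0;     2 :      0.0;     3 :    100.0;
    4 :    200.0;
"""


def parse_trips(f):
    """Build {(origin, destination): demand} from an open text file."""
    my_dict = {}
    origin_num = None
    for line in f:                      # iteration ends by itself at end of file
        line = line.strip()
        if not line:                    # blank separator line
            continue
        if line.startswith("Origin"):   # header: remember the current origin
            origin_num = int(line.split()[-1])
            continue
        for field in line.split(";"):   # data line: every field ends with ';'
            if not field.strip():       # empty piece after the last ';'
                continue
            dest, value = field.split(":")
            my_dict[(origin_num, int(dest))] = float(value)
    return my_dict


demand = parse_trips(io.StringIO(SAMPLE))
print(demand[(1, 2)])   # demand from origin 1 to destination 2
```

To read a real file, pass an open file object instead, for example the result of open("trips.txt") inside a with block.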

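Answer 1: (score: 2)

Well, if you want to extract the data, you need to parse the file line by line. The algorithm should be roughly:

  • Scan the file line by line
    • if the line is empty, skip it
    • if the line starts with 'Origin', capture the number that follows it (origin_no)
    • else split the line on semicolons, and for each element:
      • split it on the colon
      • the first part is the second dict key number (element_no)
      • the second part is the value (value_no)
      • store it in your result dictionary as (origin_no, element_no): value_no

It is pretty simple to implement: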

result = {}  # we'll store our result in this dict
origin_no = 0  # our starting Origin number in case the file doesn't begin with one
with open("trips.txt", "r") as f:
    for line in f:
        line = line.rstrip()  # we're not interested in the newline at the end
        if not line:  # empty line, skip
            continue
        if line.startswith("Origin"):
            origin_no = int(line[7:].strip())  # grab the integer following Origin
        else:
            elements = line.split(";")  # get our elements by splitting by semi-colon
            for element in elements:  # loop through each of them:
                if not element:  # we're not interested in the last element
                    continue
                element_no, element_value = element.split(":")  # get our pair
                # beware, these two are now most likely padded strings!
                # that's why we'll strip them from whitespace and convert to integer/float
                result[(origin_no, int(element_no.strip()))] = float(element_value.strip())
# Done!
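
As a variation on the code above, the per-line splitting and stripping could be swapped for a regular expression. The sketch below uses re.findall to pull every "number : number" pair out of a line; the PAIR pattern and the inline SAMPLE string are assumptions for illustration, and it assumes Python 3 and non-negative decimal values like those in the question.

```python
import io
import re

# Hypothetical excerpt in the same layout as the question's file.
SAMPLE = """\
Origin  1
    1 :      0.0;     2 :    100.0;     3 :    100.0;

Origin  2
    1 :    100.0;     2 :      0.0;     3 :    100.0;
"""

# An integer key, a colon, then a non-negative decimal value.
PAIR = re.compile(r"(\d+)\s*:\s*(\d+(?:\.\d+)?)")

result = {}
origin_no = 0
for line in io.StringIO(SAMPLE):
    if line.startswith("Origin"):
        origin_no = int(line.split()[1])    # the number after "Origin"
        continue
    # Blank lines simply yield no matches, so they need no special case.
    for element_no, element_value in PAIR.findall(line):
        result[(origin_no, int(element_no))] = float(element_value)

print(sorted(result.items()))
```

The trade-off is that lines not matching the pattern are silently ignored, whereas the split-based version raises an error on malformed input.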

Answer 2: (score: 1)

You can try:

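with open('trips.txt', 'r') as f:
    dic = {}
    try:
        while True:
            # next(f), range and print(...) work on both Python 2.7 and 3.x
            num = int(next(f).split()[1])
            lst = []
            for _ in range(5):
                lst.append(next(f).strip().split(';'))
            # skip the blank separator; the default keeps the last block
            # from being dropped if the file does not end with a blank line
            next(f, None)
            for n in lst:
                for l in n:
                    if l != '':
                        tmp = l.strip().split(':')
                        dic[(num, int(tmp[0]))] = float(tmp[1])
    except StopIteration:
        print(dic)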

Output:

{(1, 21): 100.0, (1, 3): 100.0, (2, 18): 0.0, (2, 8): 400.0, (1, 17): 400.0, (2, 1): 100.0, (1, 15): 500.0, (2, 22): 100.0....etc}

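Answer 3: (score: 1)

Another approach -

nw.usage is a file holding the usage data.

As I noted in the comments in the code below, use collections.OrderedDict() if you want to maintain insertion order.

Hope it helps!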

#!/usr/bin/env python

import re
#import collections

with open('nw.usage', 'r') as f:
  usage_dict = {}
  #Use collections.OrderedDict() if you want to maintain insertion order
  origin_val = ''
  for line in f:
    if re.search('Origin', line):
      # take the whole trailing number; indexing the line with [-1]
      # would keep only its last digit (e.g. '4' for Origin 24)
      origin_val = line.split()[-1]
    else:
      hr_demand = line.strip().split(';')
      for hr in hr_demand:
        if not hr:
          continue
        hour = hr.split(':')[0].strip()
        usage = hr.split(':')[1].strip()
        usage_dict[(origin_val, hour)] = usage

  print(usage_dict)

The output is -

{('1', '17'): '400.0', ('2', '2'): '0.0', ('2', '17'): '200.0', ('1', '20'): '300.0', ('1', '18'): '100.0', ('2', '20'): '100.0', ('1', '13'): '500.0', ('1', '6'): '300.0', ('2', '13'): '300.0', ('1', '24'): '100.0', ('2', '7'): '200.0', ('2', '24'): '0.0', ('1', '2'): '100.0', ('1', '16'): '500.0', ('2', '3'): '100.0', ('2', '18'): '0.0', ('1', '21'): '100.0', ('2', '23'): '0.0', ('1', '12'): '200.0', ('2', '14'): '100.0', ('2', '8'): '400.0', ('1', '5'): '200.0', ('2', '10'): '600.0', ('2', '4'): '200.0', ('2', '19'): '100.0', ('1', '22'): '400.0', ('1', '1'): '0.0', ('2', '22'): '100.0', ('1', '15'): '500.0', ('2', '15'): '100.0', ('2', '9'): '200.0', ('1', '11'): '500.0', ('1', '4'): '500.0', ('2', '11'): '200.0', ('1', '9'): '500.0', ('2', '5'): '100.0', ('1', '23'): '300.0', ('1', '14'): '300.0', ('2', '1'): '100.0', ('2', '16'): '400.0', ('1', '19'): '300.0', ('2', '21'): '0.0', ('1', '10'): '1300.0', ('1', '7'): '500.0', ('2', '12'): '100.0', ('1', '8'): '800.0', ('2', '6'): '400.0', ('1', '3'): '100.0'}
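
The keys and values in the output above are strings, and the entries come out in no particular order. The sketch below combines the collections.OrderedDict() suggestion with a conversion to int keys and float values, which is the form the question asked for. The inline SAMPLE string is hypothetical and the code assumes Python 3. In Python 3.7 and later a plain dict also preserves insertion order, so OrderedDict mainly matters on older versions.

```python
import io
from collections import OrderedDict

# Hypothetical excerpt in the same layout as the question's file.
SAMPLE = """\
Origin  1
    1 :      0.0;     2 :    100.0;

Origin  24
    1 :    100.0;     2 :      0.0;
"""

usage_dict = OrderedDict()          # remembers the order keys were inserted
origin_val = None
for line in io.StringIO(SAMPLE):
    if 'Origin' in line:
        origin_val = int(line.split()[-1])      # full number, e.g. 24
    else:
        for hr in line.strip().split(';'):
            if not hr:                          # blank line or trailing ';'
                continue
            hour, usage = hr.split(':')
            # int() and float() ignore the surrounding whitespace
            usage_dict[(origin_val, int(hour))] = float(usage)

print(list(usage_dict.items()))
```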