Question

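I have a text file containing multiple lines, ordered as name, location, website, followed by 'END' to mark the end of one person's profile, and then another name, location, website, and so on. I need to add each name as a dictionary key, with the rest (location, website) as its value.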

So, if I have a file:

name1
location1
website1
END
name2
location2
website2
END
name3
location3
website3
END

The result would be:

dict = {'name1': ['location1','website1'],
        'name2': ['location2', 'website2'], 
        'name3': ['location3', 'website3']}

Edit: the value will be a list, sorry.

I don't know how to approach this. Can someone point me in the right direction?

Answer 1

First, there seems to be a misunderstanding behind this question about the structure of a dictionary or, more generally, of associative containers.

The structure of a dict, in Python-like syntax, is

{
   key : whatever_value1,
   another_key: whatever_value2,
   # ...
}

Second, if you strip the trailing digits from

name1
location1
website1

you naturally get a struct-like ADT for a single END-delimited entry of the file, i.e.

class Whatever(object):
    def __init__(self, name, location, website):
        self.name = name
        self.location = location
        self.website = website

(Your mileage may vary as to the class name.)

So what you can use is a Python dict that maps a key, probably the name attribute of your record, to (a reference to) an instance of that type.

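To process the input file, you read it line by line until you encounter END, then commit an instance of (for example) class Whatever to the dictionary, with its name as the key.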
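The steps above could be sketched roughly as follows. This is only an illustration, not the answerer's own code: the helper name parse_profiles and the in-memory sample are assumptions, and the sketch assumes every entry has exactly a name, a location and a website before END.

```python
class Whatever(object):
    def __init__(self, name, location, website):
        self.name = name
        self.location = location
        self.website = website


def parse_profiles(lines):
    """Map each record's name to a Whatever instance (hypothetical helper)."""
    profiles = {}
    fields = []
    for line in lines:
        line = line.strip()
        if line == "END":
            # Assumption: exactly three fields were collected for this entry.
            name, location, website = fields
            profiles[name] = Whatever(name, location, website)
            fields = []
        elif line:
            fields.append(line)
    return profiles


# In-memory sample standing in for the real file.
sample = """name1
location1
website1
END
name2
location2
website2
END
"""
profiles = parse_profiles(sample.splitlines())
print(profiles["name2"].location)  # prints location2
```

Since parse_profiles accepts any iterable of lines, an open file object can be passed to it directly instead of the sample.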

Answer 2

Using "END" to separate each section, itertools.groupby will split the file on END; we just need to create the key/value pairs as we iterate over the groupby object.

from itertools import groupby
from collections import OrderedDict

with open("test.txt") as f:
    d = OrderedDict((next(v), list(v))
             for k, v in groupby(map(str.rstrip, f), key=lambda x: x[:3] != "END") if k)

Output:

OrderedDict([('name1', ['location1', 'website1']),
             ('name2', ['location2', 'website2']),
             ('name3', ['location3', 'website3'])])

Or use a regular for loop, changing the key each time we hit END and storing the lines of each section in a tmp list:

from collections import OrderedDict

with open("test.txt") as f:
    # itertools.imap for python2
    data = map(str.rstrip, f)
    d, tmp, k = OrderedDict(), [], next(data)
    for line in data:
        if line == "END":
            d[k] = tmp
            k, tmp = next(data, ""), []
        else:
            tmp.append(line)

The output will be the same:

OrderedDict([('name1', ['location1', 'website1']),
             ('name2', ['location2', 'website2']),
             ('name3', ['location3', 'website3'])])

Both code examples work for sections of any length, not just three lines.

Answer 3

Already answered, but you can shorten things by using Python's own dict and list comprehensions:

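with open(file, 'r') as f:  # file is the path to the input file
    # Split the whole text on END, then split each section into its lines
    triplets = [data.strip().split('\n') for data in f.read().strip().split('END') if data.strip()]
    d = {name: [location, site] for name, location, site in triplets}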
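Note that the comprehension above unpacks exactly three lines per section. If sections can have other lengths, a starred target is one way to keep the first line as the key and everything else as the value. The sketch below assumes Python 3; parse_sections and the sample text are hypothetical names used only for illustration.

```python
def parse_sections(text):
    """Split text on END; the first line of each section becomes the key."""
    # Caveat: like the original, this splits on the substring "END" wherever
    # it occurs, so it assumes "END" never appears inside a data line.
    sections = [chunk.strip().split("\n")
                for chunk in text.split("END") if chunk.strip()]
    # name takes the first line, rest collects the remaining lines as a list.
    return {name: rest for name, *rest in sections}


sample = "name1\nlocation1\nwebsite1\nEND\nname2\nlocation2\nEND\n"
d = parse_sections(sample)
print(d)  # {'name1': ['location1', 'website1'], 'name2': ['location2']}
```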

Answer 4

You can slice four lines at a time from the file without loading it all into memory. One way to do that is with islice from itertools.

from itertools import islice

data = dict()
with open('file.path') as infile:
    while True:
        # Read the next four lines: name, location, website, END
        batch = tuple(x.strip() for x in islice(infile, 4))
        if not batch:
            break
        name, location, website, end = batch
        data[name] = (location, website)

Verification:

> from pprint import pprint
> pprint(data)

{'name1': ('location1', 'website1'),
 'name2': ('location2', 'website2'),
 'name3': ('location3', 'website3')}
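The question asks for list values, and the loop above takes it on trust that the fourth line of each batch is END. A hedged variant that stores lists and checks the delimiter might look like the sketch below; read_profiles is a hypothetical name, and it accepts any iterable of lines, such as an open file.

```python
from itertools import islice


def read_profiles(lines):
    """Read records four lines at a time: name, location, website, END."""
    lines = iter(lines)  # islice must consume one shared iterator
    data = {}
    while True:
        batch = [x.strip() for x in islice(lines, 4)]
        if not batch:
            break
        # Reject short batches and batches whose last line is not END.
        if len(batch) != 4 or batch[3] != "END":
            raise ValueError("malformed record: %r" % (batch,))
        name, location, website, _ = batch
        data[name] = [location, website]
    return data


sample = ["name1\n", "location1\n", "website1\n", "END\n"]
data = read_profiles(sample)
print(data)  # {'name1': ['location1', 'website1']}
```

One consequence of the strict check is that a stray blank line at the end of the file raises ValueError instead of being ignored; whether that is desirable depends on how much you trust the input.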

Answer 5

If you are guaranteed to always get the data in this format, you can do the following:

d = {}  # not named dict, which would shadow the built-in type
name = None
location = None
website = None
count = 0
with open(file, 'r') as f:  # where file is the file name
    for each in f:
        each = each.strip()  # drop the trailing newline so the 'END' comparison works
        count += 1
        if count == 1:
            name = each
        elif count == 2:
            location = each
        elif count == 3:
            website = each
        elif count == 4 and each == 'END':
            count = 0  # reset the counter for the next profile
            d[name] = (location, website)  # stored as a tuple, since a key maps to one value, not to value1, value2
        else:
            print("Well, something went amiss %i  %s" % (count, each))

Parsing a file structure according to a specific pattern
