Question

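I have a text file containing information about restaurants, and I need to insert this information into several dictionaries. The attributes are name, rating, price range, and cuisine type.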

Here is the content of the txt file:

Georgie Porgie 
87% 
$$$ 
Canadian,Pub Food

Queen St. Cafe 
82% 
$ 
Malaysian,Thai

So far, I have read the file and loaded its contents into a list.

content = []
with open(file) as f:
    content = f.readlines()
    content = [x.strip() for x in content]

I need to populate three dictionaries: names_rating, price_names, and cuisine_names. How should I go about it?

Answer 1

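In general, to build a dictionary (called list_of_dicts here) from a list of lists list_of_lists, mapping the item at index i of each inner list to the item at index j, you can use a dict comprehension like this:

list_of_dicts = {lst[i]: lst[j] for lst in list_of_lists}

You should be able to apply this with whichever indices i and j you need to solve your problem.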
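As a rough illustration, the comprehension above could be applied to the restaurant data as follows. This is only a sketch: it assumes the file has already been grouped into one inner list per restaurant in the order name, rating, price, cuisines, and that the restaurant name is the key of every dictionary (the question does not say which way the mapping should go).

```python
# Hypothetical input: one inner list per restaurant,
# in the order [name, rating, price, cuisines].
list_of_lists = [
    ["Georgie Porgie", "87%", "$$$", "Canadian,Pub Food"],
    ["Queen St. Cafe", "82%", "$", "Malaysian,Thai"],
]

# Index 0 (the name) is the key; indices 1, 2, 3 supply the values.
names_rating = {lst[0]: lst[1] for lst in list_of_lists}
price_names = {lst[0]: lst[2] for lst in list_of_lists}
# Assumption: the cuisines are wanted as a list, so split on commas.
cuisine_names = {lst[0]: lst[3].split(",") for lst in list_of_lists}

print(names_rating)
```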

Answer 2

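Looking at the file sample you provided, the entries are separated by blank lines.

So, your task is to:

Open the file
Read each line
Split the entries on the blank lines
Save the entries in the dictionaries

This can be done as follows: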

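names_rating = {}
price_names = {}
cuisine_names = {}
with open(file) as f:
    lines = []
    for line in f:
        # Use the line from the loop; calling f.readline() here would skip lines
        content = line.strip()
        # Blank lines only separate entries, so they are not stored
        if content != '':
            lines.append(content)
        # Four non-blank lines make up one complete restaurant entry
        if len(lines) == 4:
            name = lines[0]
            rating = lines[1]
            price = lines[2]
            cuisine = lines[3].split(',')
            names_rating[name] = rating
            price_names[name] = price
            cuisine_names[name] = cuisine
            lines = []

Here, the file is read line by line and each non-blank line is appended to the list lines. Once the list holds four items, all attributes of one restaurant have been read. They are then processed to store the data in the dictionaries, and the list is emptied so the process can start again for the next restaurant.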
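A possible variation on the same idea, shown only as a sketch: if the file is small enough to read in one go, the text can be split on blank lines first and each block unpacked into its four fields. The function name parse_restaurants and the inline sample string are hypothetical, used here so the example runs without a file.

```python
import re


def parse_restaurants(text):
    """Split text into blank-line-separated blocks of four fields each."""
    names_rating, price_names, cuisine_names = {}, {}, {}
    # A blank line (possibly containing stray whitespace) ends an entry.
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = [line.strip() for line in block.splitlines()]
        if len(fields) != 4:
            continue  # assumption: silently skip malformed entries
        name, rating, price, cuisine = fields
        names_rating[name] = rating
        price_names[name] = price
        cuisine_names[name] = cuisine.split(",")
    return names_rating, price_names, cuisine_names


# Hypothetical sample mirroring the file from the question.
sample = (
    "Georgie Porgie \n87% \n$$$ \nCanadian,Pub Food\n"
    "\n"
    "Queen St. Cafe \n82% \n$ \nMalaysian,Thai\n"
)
names_rating, price_names, cuisine_names = parse_restaurants(sample)
print(names_rating)
```

With a real file, the text could be obtained with f.read() inside the same with open(...) block used above.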

Answer 3

Based on the latest format specification of the text file:

Georgie Porgie 
87% 
$$$ 
Canadian,Pub Food

Queen St. Cafe 
82% 
$ 
Malaysian,Thai

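If you can assume that:

Each restaurant entry is always defined by four lines, each containing one of the fields you are after (read: dictionary entries)
The fields always appear in the same order
Each entry is always separated from the next one by a blank line

then you can use the modulo operation and do the following: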

from pprint import pformat

content = {}
filepath = 'restaurants_new.txt'
with open(filepath, 'r') as f:
    fields = ['name', 'rating', 'price', 'cuisine']
    name = ''
    for i, line in enumerate(f):
        # Each record spans 5 lines: 4 fields plus 1 blank separator
        modulo = i % 5
        raw = line.strip()
        if modulo == 0:
            # first line of a record: the restaurant name
            name = raw
            content[name] = {}
        elif modulo < 4:
            content[name][fields[modulo]] = raw
        elif modulo == 4:
            # we gathered all the required info; reset
            name = ''

print(pformat(content))
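If the three separate dictionaries from the question are still needed, they could be derived from a nested content dictionary like the one built above. The snippet below is a minimal sketch: the content literal is a hand-written stand-in for what the loop would produce for the sample file, and keying every dictionary by restaurant name is an assumption.

```python
# Hypothetical stand-in for the nested dictionary produced by the loop.
content = {
    "Georgie Porgie": {"rating": "87%", "price": "$$$", "cuisine": "Canadian,Pub Food"},
    "Queen St. Cafe": {"rating": "82%", "price": "$", "cuisine": "Malaysian,Thai"},
}

# One flat dictionary per attribute, all keyed by restaurant name.
names_rating = {name: info["rating"] for name, info in content.items()}
price_names = {name: info["price"] for name, info in content.items()}
cuisine_names = {name: info["cuisine"].split(",") for name, info in content.items()}

print(cuisine_names)
```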

Edit: the solution below was proposed for the format you originally posted, which looked like this:

Georgie Porgie 87% $$$ Canadian,Pub Food
Queen St. Cafe 82% $ Malaysian,Thai

I am leaving the original answer here in case it is useful to others.

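As JohanL mentioned in his comment, the key to solving the problem is the line format: depending on whether you treat commas, spaces, or a combination of both as separators, and given that a restaurant name can contain an unknown number of words, working out how to split a line can get tricky.

This is a slightly different approach from the one suggested by @gaurav, using regular expressions (the re module):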

import re
from pprint import pformat

content = {}
filepath = 'restaurants.txt'
# Groups: name, rating (1-3 digits and %), price (one or more $), cuisine
dictmatch = r'([\s\S]+) ([0-9]{1,3}%) (\$+) ([\s\S]+)'
with open(filepath, 'r') as f:
    for line in f:
        raw = line.strip()
        match = re.match(dictmatch, raw)
        if not match:
            print('no match found; line skipped: "%s"' % (raw, ))
            continue
        name = match.group(1)
        if name in content:
            print('duplicate entry found; line skipped: "%s"' % (raw, ))
            continue
        content[name] = {
            "rating": match.group(2),
            "price": match.group(3),
            "cuisine": match.group(4)
        }

print(pformat(content))

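Assuming you have no control over the source txt, the advantage of this approach is that you can tailor the regex pattern to match whatever "less than ideal" format it comes in.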
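To make the tailoring point concrete, here is a small sketch that runs the pattern from this answer on one of the sample lines, and then a hypothetical looser variant (loose_pattern, not part of the original answer) that tolerates runs of spaces or tabs between the fields. The messy input line is invented for illustration.

```python
import re

# Pattern from the answer: fields separated by exactly one space.
dictmatch = r'([\s\S]+) ([0-9]{1,3}%) (\$+) ([\s\S]+)'
strict = re.match(dictmatch, 'Georgie Porgie 87% $$$ Canadian,Pub Food')
print(strict.groups())

# Hypothetical variant: any amount of whitespace between the fields.
loose_pattern = r'(.+?)\s+([0-9]{1,3}%)\s+(\$+)\s+(.+)'
messy_line = 'Queen St. Cafe   82%\t$  Malaysian,Thai'  # invented messy input
loose = re.match(loose_pattern, messy_line)
print(loose.groups())
```

The strict pattern does not match the messy line, because it expects a single space before the rating; that is the kind of case where adjusting the pattern helps.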

Read a file and insert its contents into dictionaries
