Question

I have several text files. Each file is a list of animals and how many of each there are in a house. Like this:

houseA.txt

cats 3  
dogs 1  
birds 4

houseB.txt

cats 5  
dogs 3  
birds 1

I have about 20 houses, and each house has about 16,000 species (so each file has about 16,000 lines). All houses have the same species; only the counts differ.

My current script loops through each file line by line and captures the house, the animal name, and its count.

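I want to create a dictionary of houses, where each house is itself a dictionary of animals and their counts. So, from the example above, the result would look like this:

dictOfDicts = {'houseA': {'cats': 3, 'dogs': 1, 'birds': 4}, 'houseB': {'cats': 5, 'dogs': 3, 'birds': 1}}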

In case you are wondering, this will later be converted into a table:

      house:   A   B
animal         
  cats         3   5
  dogs         1   3
 birds         4   1
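
For illustration, here is a minimal sketch of how a dict of dicts could be laid out as a table like the one above, using only the standard library. The function name `format_table` and the fixed column width of 8 are assumptions made for this example, not part of the original script.

```python
def format_table(dict_of_dicts):
    """Return a text table with one row per animal and one column per house."""
    houses = list(dict_of_dicts)
    # Collect animal names in first-seen order, without duplicates.
    animals = list(dict.fromkeys(
        animal for counts in dict_of_dicts.values() for animal in counts
    ))
    width = max((len(a) for a in animals), default=0)
    # Header row: blank label column followed by the house names.
    lines = [" " * width + "".join(f"{h:>8}" for h in houses)]
    for animal in animals:
        # A missing animal is shown as 0 (an assumption; all houses share species here).
        cells = "".join(f"{dict_of_dicts[h].get(animal, 0):>8}" for h in houses)
        lines.append(f"{animal:>{width}}" + cells)
    return "\n".join(lines)


example = {
    "houseA": {"cats": 3, "dogs": 1, "birds": 4},
    "houseB": {"cats": 5, "dogs": 3, "birds": 1},
}
print(format_table(example))
```

If pandas is available, passing the same dict of dicts to `pandas.DataFrame` gives a comparable layout, with the outer keys as columns and the inner keys as the row index.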

Here is my script:

#!/usr/bin/python3
import sys


houseL = []
dictList = []
with open(sys.argv[1], 'r') as files:
    for f in files:
        f = f.rstrip()
        with open(f, 'r') as aniCounts:
            house = str(aniCounts).split(sep='/')[2]  # this and the next line captures the house name from the file name.
            house = house.split('.')[0]
            houseL.append(house)

            for line in aniCounts:
                ani = line.split()[0]
                count = line.split()[1]
                #print(ani, ' ', count)

Edit: Thanks to a helpful commenter, I changed the question to ask for a dict of dicts.

Answer 1

I would try something like this:

house_names = ['houseA', 'houseB', ...]
houses_dict = {}

for house in house_names:
    houses_dict[house] = {}

    with open(house + '.txt') as f:
        for line in f:
            species, num = line.rsplit(maxsplit=1)  # split off rightmost word
            houses_dict[house][species] = int(num)

The result would be (for example):

houses_dict = {
    'houseA': {
        'cats': 3,
        'dogs': 1,
        'birds': 4,
    },
    'houseB': {
        'cats': 5,
        'dogs': 3,
        'birds': 1,
    },
    ...
}
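
A self-contained variant of the same idea, as a sketch: it takes a list of file paths and derives each house name from the file name with `pathlib` instead of a predefined list of names. The function name `load_houses` is hypothetical, and the code assumes every non-empty line ends with an integer count.

```python
from pathlib import Path


def load_houses(paths):
    """Build {house_name: {species: count}} from whitespace-separated files."""
    houses = {}
    for path in map(Path, paths):
        counts = {}
        with path.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                # Split off the rightmost word, so species names may contain spaces.
                species, num = line.rsplit(maxsplit=1)
                counts[species] = int(num)
        # Path.stem drops the directory and extension: 'dir/houseA.txt' -> 'houseA'.
        houses[path.stem] = counts
    return houses
```

It could be called as `load_houses(['houseA.txt', 'houseB.txt'])`, or with the paths read from the list file that the question's script opens.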

Answer 2

Here is another version:

from path import Path

dir_path = '/TEMP'

files_ls = [x for x in Path(dir_path).files() if 'house' in str(x)]

def read_file(path):
    lines = dict([row.strip().split(' ') for row in path.open(encoding='utf-8')])
    return lines

all_data = dict([(str(x.name),read_file(x)) for x in files_ls])

print(all_data)

Output:

{'house1.txt': {u'birds': u'4', u'cats': u'3', u'dogs': u'1'}}
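
Note that in the output above the counts are still strings and the keys keep the `.txt` extension. Below is a sketch of the same approach using only the standard library's `pathlib`, with the counts converted to `int`. The function names and the `house*.txt` pattern are assumptions, and the code expects single-word animal names.

```python
from pathlib import Path


def read_counts(path):
    """Parse one file into {animal: count}, converting counts to int."""
    with path.open(encoding="utf-8") as f:
        # Each non-blank line is expected to hold exactly two fields.
        rows = (line.split() for line in f if line.strip())
        return {animal: int(count) for animal, count in rows}


def read_all(dir_path, pattern="house*.txt"):
    """Map each matching file's stem (name without extension) to its counts."""
    return {p.stem: read_counts(p) for p in sorted(Path(dir_path).glob(pattern))}
```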

Answer 3

If you don't want to do the splitting yourself, use csv.DictReader and make sure that animals whose names contain spaces are quoted in the file:

from csv import DictReader

d = {}
files = ["h1.csv","h2.csv"]

for f in files:
  with open(f,"r",encoding="utf8",newline="") as  houseData:
    d[f] = {} # dict per house
    for row in DictReader(houseData, fieldnames=["animal","count"], delimiter=' ' ):
      d[f][row["animal"]] = int(row["count"])  # access by given fieldnames

print(d)

Output:

{'h1.csv': {'cats': 3, 'dogs': 1, 'birds': 4}, 
 'h2.csv': {'cats': 5, 'dogs': 3, 'birds': 1, 'insects': 2402, 'Blue Flutterwings': 2}}

File h1.csv

cats 3
dogs 1
birds 4

File h2.csv

cats 5
dogs 3
birds 1
insects 2402
"Blue Flutterwings" 2

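Caveat: if you have Green Cantilopes or Blue Flutterwings in your house, you must quote them in the file. This is where this solution starts to shine, because csv handles quoted strings automatically, even with ' ' as the delimiter.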
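
To see the quoting behaviour in isolation, here is a small sketch that parses an in-memory sample instead of a file. The `sample` text is made up for the example.

```python
import csv
import io

# Two lines in the same format as the h2.csv file; the second name is quoted.
sample = 'cats 5\n"Blue Flutterwings" 2\n'

reader = csv.DictReader(
    io.StringIO(sample),
    fieldnames=["animal", "count"],
    delimiter=" ",
)
# The quoted name is kept as one field even though it contains the delimiter.
counts = {row["animal"]: int(row["count"]) for row in reader}
print(counts)  # {'cats': 5, 'Blue Flutterwings': 2}
```

Without the quotes, the second line would split into three fields, and `int()` would fail on the word that lands in the count column.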

Creating multiple dynamic dictionaries within a dictionary