Question

Suppose a text file contains two columns, as shown below:

A "
A "
A l
A "
C r
C "
C l
D a
D "
D "
D "
D d
R "
R "
R "
R " 
S "
S "
S o
D g
D "
D "
D "
D j
A "
A "
A z

I want to retrieve the following information:

list1= {A:l}, {C:r,l}, {D:a,d}, {S:o}
final_list= {A:l}, {C:r,l}, {D:a,d}, R{}, {S:o}

I understand that I have to read the text file using line.strip().split()

After that I don't know how to proceed.

Answer 1

import collections
list1 = collections.defaultdict(set)
final_list = collections.defaultdict(set)
for line in filetext:  # assuming you've already opened the file and read its lines
    key, value = line.strip().split()
    final_list[key].add(value)
    if value != '"':
        list1[key].add(value)

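This is slightly different, because final_list ends up with the " placeholder as an element; that doesn't match what you described, so let's change it: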

import collections
list1 = collections.defaultdict(set)
final_list = {}
for line in filetext:  # assuming you've already opened the file and read its lines
    key, value = line.strip().split()
    if key not in final_list:
        final_list[key] = set()
    if value != '"':
        list1[key].add(value)
final_list.update(list1)

This should give you what you want, with empty sets present where a key has no values, e.g. R.
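As a quick check, here is a minimal self-contained sketch of the second version run on a shortened sample. The `group_values` helper and the `sample_lines` list are illustrative names that are not part of the question; the list stands in for the lines of the opened file, and blank lines are skipped as an added assumption.

```python
def group_values(lines):
    """Return (list1, final_list) for an iterable of 'key value' lines.

    Hypothetical helper: list1 keeps only keys that have real values,
    final_list keeps every key, with an empty set where none were found.
    """
    list1 = {}
    final_list = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:  # assumption: ignore blank or malformed lines
            continue
        key, value = parts
        final_list.setdefault(key, set())  # register the key in any case
        if value != '"':  # the '"' placeholder is not a real value
            list1.setdefault(key, set()).add(value)
            final_list[key].add(value)
    return list1, final_list


# Shortened stand-in for the file contents shown in the question.
sample_lines = ['A "', 'A l', 'C r', 'C "', 'C l', 'R "', 'R "', 'S o']

list1, final_list = group_values(sample_lines)
print(list1)
print(final_list)
```

Because sets are unordered, the printed order of the values inside each set may vary between runs; only the membership matters here.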

Answer 2

If the order of the groups in final_list is not important:

from collections import defaultdict

with open('/home/bwh1te/projects/stackanswers/wordcount/data.txt') as f:
    occurencies = defaultdict(list)
    for line in f:
        key, value = line.strip().split()
        # Accessing occurencies[key] in this condition
        # automatically creates the key in the dict
        if value not in occurencies[key] and value.isalpha():
            occurencies[key].append(value)

# defaultdict(<class 'list'>, {'C': ['r', 'l'], 'D': ['a', 'd'], 'S': ['o'], 'A': ['l'], 'R': []})
# Use it like a simple dictionary

# In case it must be a list, not a dict:
final_list = [{key: value} for key, value in occurencies.items()]
# [{'C': ['r', 'l']}, {'D': ['a', 'd']}, {'S': ['o']}, {'A': ['l']}, {'R': []}]

If the order of the groups in final_list is important:

from collections import OrderedDict

with open(file_path) as f:
    occurencies = OrderedDict()
    for line in f:
        key, value = line.strip().split()
        # Create each key anyway
        if key not in occurencies:
            occurencies[key] = []
        if value.isalpha():
            if value not in occurencies[key]:
                occurencies[key].append(value)

# OrderedDict([('A', ['l']), ('C', ['r', 'l']), ('D', ['a', 'd']), ('R', []), ('S', ['o'])])

# In case if it must be a list, not a dict
final_list = [{key: value} for key, value in occurencies.items()]
# [{'A': ['l']}, {'C': ['r', 'l']}, {'D': ['a', 'd']}, {'R': []}, {'S': ['o']}]

list1 = [{key: value} for key, value in occurencies.items() if value]
# [{'A': ['l']}, {'C': ['r', 'l']}, {'D': ['a', 'd']}, {'S': ['o']}]

Or you can implement a hybrid of OrderedDict and defaultdict: "Can I do an ordered, default dict in Python?" :)
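One way such a hybrid might look is sketched below. The `OrderedDefaultDict` class name and the shortened `sample_lines` list are assumptions made for illustration, not something taken from the linked question. It relies on the `__missing__` hook that dict subclasses can define.

```python
from collections import OrderedDict


class OrderedDefaultDict(OrderedDict):
    """OrderedDict that creates missing keys with a factory (hypothetical helper)."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        value = self[key] = self.default_factory()
        return value


# Shortened stand-in for the file contents shown in the question.
sample_lines = ['A "', 'A l', 'C r', 'C "', 'C l', 'R "', 'S o', 'A z']

occurrences = OrderedDefaultDict(list)
for line in sample_lines:
    key, value = line.split()
    bucket = occurrences[key]  # first access creates the key, keeping its position
    if value.isalpha() and value not in bucket:
        bucket.append(value)

print(list(occurrences.items()))
```

On Python 3.7 and later, plain dicts (and therefore defaultdict) also preserve insertion order, so a regular defaultdict(list) may already be enough unless OrderedDict-specific behaviour is needed.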

Python: creating sets by removing duplicates in text processing?
