Python: Creating sets by removing duplicates when processing text?

Date: 2016-02-11 13:45:24

Tags: python python-3.x text set duplicates

Suppose a text file contains two columns, like this:

A "
A "
A l
A "
C r
C "
C l
D a
D "
D "
D "
D d
R "
R "
R "
R " 
S "
S "
S o
D g
D "
D "
D "
D j
A "
A "
A z

I want to extract the following information:

list1= {A:l}, {C:r,l}, {D:a,d}, {S:o}
final_list= {A:l}, {C:r,l}, {D:a,d}, R{}, {S:o}

I understand that I have to read the text file and call line.strip().split() on each line,

but I don't know how to proceed after that.
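As a starting point, line.strip().split() splits a line on whitespace, so a line such as `A "` becomes the two-item list `['A', '"']`, which can be unpacked into a key and a value. The helper below is a minimal sketch; the name `parse_pairs` and the sample lines are hypothetical, chosen only to illustrate the parsing step. It also skips blank lines, which would otherwise make the unpacking fail.

```python
def parse_pairs(lines):
    """Yield (key, value) pairs from two-column lines, skipping blank lines."""
    for line in lines:
        parts = line.strip().split()  # strip() also removes trailing spaces like in 'R " '
        if not parts:  # blank line, e.g. an empty line at the end of the file
            continue
        key, value = parts  # raises ValueError if a line does not have exactly 2 columns
        yield key, value


# Hypothetical sample: a few lines in the question's format, plus a blank one.
sample = ['A "\n', 'A l\n', '\n', 'C r\n']
pairs = list(parse_pairs(sample))
print(pairs)  # [('A', '"'), ('A', 'l'), ('C', 'r')]
```

With a real file, the same generator can be fed the file object directly, e.g. `parse_pairs(f)` inside a `with open(...) as f:` block.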

2 answers:

Answer 0 (score: 1)

import collections
list1 = collections.defaultdict(set)
final_list = collections.defaultdict(set)
for line in filetext:  # filetext: an open file object (or a list of its lines)
    key, value = line.strip().split()
    final_list[key].add(value)
    if value != '"':
        list1[key].add(value)

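This is slightly different from what you asked for: final_list also contains the `"` placeholder as an element, which doesn't match what you described, so let's change it: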

import collections
list1 = collections.defaultdict(set)
final_list = {}
for line in filetext:  # filetext: an open file object (or a list of its lines)
    key, value = line.strip().split()
    if key not in final_list:
        final_list[key] = set()
    if value != '"':
        list1[key].add(value)
final_list.update(list1)

This should give you what you want, including empty sets for keys such as R.
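Here is a minimal, self-contained sketch of the second version above, run on the question's sample data. Assumptions added for illustration: the data is embedded with `io.StringIO` instead of being read from a real file, and blank lines are skipped. Note that this reads all 27 sample lines, including the later `D g`, `D j` and `A z` lines, so those values also end up in the sets; the question's expected output lists only values from the lines up to `S o`.

```python
import collections
import io

# The question's sample data, embedded so the sketch runs without a file.
filetext = io.StringIO('''\
A "
A "
A l
A "
C r
C "
C l
D a
D "
D "
D "
D d
R "
R "
R "
R "
S "
S "
S o
D g
D "
D "
D "
D j
A "
A "
A z
''')

list1 = collections.defaultdict(set)
final_list = {}
for line in filetext:
    if not line.strip():  # skip blank lines (an addition, not in the answer)
        continue
    key, value = line.strip().split()
    if key not in final_list:
        final_list[key] = set()  # every key gets an entry, even R
    if value != '"':
        list1[key].add(value)  # only real values; R is never added here
final_list.update(list1)  # keys keep their first-seen order A, C, D, R, S

for key, values in final_list.items():
    print(key, sorted(values))  # sets are unordered, so sort for display
# A ['l', 'z']
# C ['l', 'r']
# D ['a', 'd', 'g', 'j']
# R []
# S ['o']
```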

Answer 1 (score: 1)

If the order of the entries in final_list doesn't matter:

from collections import defaultdict

with open('/home/bwh1te/projects/stackanswers/wordcount/data.txt') as f:
    occurencies = defaultdict(list)
    for line in f:
        key, value = line.strip().split()
        # Evaluating occurencies[key] in this condition creates the key
        # (with an empty list) if it is missing, so R still appears
        if value not in occurencies[key] and value.isalpha():
            occurencies[key].append(value)

# defaultdict(<class 'list'>, {'C': ['r', 'l'], 'D': ['a', 'd'], 'S': ['o'], 'A': ['l'], 'R': []})
# (Key order is arbitrary before Python 3.7; since 3.7, dicts keep insertion order.)
# Use it like a simple dictionary

# If it must be a list rather than a dict:
final_list = [{key: value} for key, value in occurencies.items()]
# [{'C': ['r', 'l']}, {'D': ['a', 'd']}, {'S': ['o']}, {'A': ['l']}, {'R': []}]

If the order of the entries in final_list matters:

from collections import OrderedDict

file_path = 'data.txt'  # hypothetical path: point this at your data file
with open(file_path) as f:
    occurencies = OrderedDict()
    for line in f:
        key, value = line.strip().split()
        # Create every key, even one that ends up with no values (like R)
        if key not in occurencies:
            occurencies[key] = []
        if value.isalpha():
            if value not in occurencies[key]:
                occurencies[key].append(value)

# OrderedDict([('A', ['l']), ('C', ['r', 'l']), ('D', ['a', 'd']), ('R', []), ('S', ['o'])])

# If it must be a list rather than a dict:
final_list = [{key: value} for key, value in occurencies.items()]
# [{'A': ['l']}, {'C': ['r', 'l']}, {'D': ['a', 'd']}, {'R': []}, {'S': ['o']}]

list1 = [{key: value} for key, value in occurencies.items() if value]
# [{'A': ['l']}, {'C': ['r', 'l']}, {'D': ['a', 'd']}, {'S': ['o']}]
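Since Python 3.7 the built-in dict is guaranteed to preserve insertion order, so on current Python versions the OrderedDict is not strictly required. The sketch below is one possible variant under that assumption; the function name `group_values` and the short sample are hypothetical. It also swaps the list membership test for a dict used as an insertion-ordered set, which avoids scanning the list for every value.

```python
def group_values(lines):
    """Group two-column lines by key, keeping first-seen order for keys and values."""
    groups = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:  # skip blank or malformed lines
            continue
        key, value = parts
        # setdefault creates the key on first sight, so keys like R are kept
        values = groups.setdefault(key, {})
        if value.isalpha():  # drops the '"' placeholder, as in the answer above
            values[value] = None  # dict keys serve as an insertion-ordered set
    return {key: list(values) for key, values in groups.items()}


# Hypothetical sample in the question's format.
sample = ['A "', 'A l', 'C r', 'C l', 'C r', 'R "', 'S "', 'S o']
grouped = group_values(sample)
final_list = [{key: value} for key, value in grouped.items()]
list1 = [{key: value} for key, value in grouped.items() if value]
print(final_list)  # [{'A': ['l']}, {'C': ['r', 'l']}, {'R': []}, {'S': ['o']}]
print(list1)       # [{'A': ['l']}, {'C': ['r', 'l']}, {'S': ['o']}]
```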

Or you can implement a hybrid of OrderedDict and defaultdict; see "Can I do an ordered, default dict in Python?" :)
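One common way to build such a hybrid is to subclass OrderedDict and define `__missing__`, which `dict.__getitem__` calls for absent keys in dict subclasses. The class name `OrderedDefaultDict` below is hypothetical, and this is a minimal sketch rather than a complete replacement for defaultdict (it does not customize copying or pickling).

```python
from collections import OrderedDict


class OrderedDefaultDict(OrderedDict):
    """An OrderedDict that fills missing keys via default_factory, like defaultdict."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # Called by dict.__getitem__ when key is absent
        if self.default_factory is None:
            raise KeyError(key)
        value = self[key] = self.default_factory()
        return value


occurrences = OrderedDefaultDict(list)
for line in ['A "', 'A l', 'C r', 'R "', 'C l']:  # hypothetical sample lines
    key, value = line.split()
    bucket = occurrences[key]  # creates an empty list for a new key, in first-seen order
    if value.isalpha() and value not in bucket:
        bucket.append(value)
print(list(occurrences.items()))  # [('A', ['l']), ('C', ['r', 'l']), ('R', [])]
```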