我想建立一个缩写词典。
我有一个包含大量遗产的文本文件。文本文件如下所示(导入后)
servlet
摘录:
with open('abreviations.txt') as ab:
ab_words = ab.read().splitlines()
现在我想建立一个字典,我将每个不平行的行作为字典键,每个偶数行作为字典值。
因此我应该能够写到最后:
'ACE',
'Access Control Entry',
'ACK',
'Acknowledgement',
'ACORN',
'A Completely Obsessive Really Nutty person',
并得到结果:
ab_dict['ACE']
另外,我怎样才能使它不区分大小写?
'Access Control Entry'
应该产生相同的结果
ab_dict['ace']
事实上,如果输出也是小写的话,那将是完美的:
'Access Control Entry'
以下是文本文件的链接:https://www.dropbox.com/s/91afgnupk686p9y/abreviations.txt?dl=0
答案 0 :(得分:4)
使用自定义ABDict
类和 Python 的生成器功能的完整解决方案:
class ABDict(dict):
''' Class representing a dictionary of abbreviations'''
def __getitem__(self, key):
v = dict.__getitem__(self, key.upper())
return v.lower() if key.islower() else v
with open('abbreviations.txt') as ab:
ab_dict = ABDict()
while True:
try:
k = next(ab).strip() # `key` line
v = next(ab).strip() # `value` line
ab_dict[k] = v
except StopIteration:
break
现在,测试(使用案例相关访问权限):
print(ab_dict['ACE'])
print(ab_dict['ace'])
print('*' * 10)
print(ab_dict['WYTB'])
print(ab_dict['wytb'])
输出(连续):
Access Control Entry
access control entry
**********
Wish You The Best
wish you the best
答案 1 :(得分:1)
以下是基于this solution的pairwise
函数的另一种解决方案:
from requests.structures import CaseInsensitiveDict
def pairwise(iterable):
"s -> (s0, s1), (s2, s3), (s4, s5), ..."
a = iter(iterable)
return zip(a, a)
with open('abreviations.txt') as reader:
abr_dict = CaseInsensitiveDict()
for abr, full in pairwise(reader):
abr_dict[abr.strip()] = full.strip()
答案 2 :(得分:1)
这是一个答案,也允许用词典中的单词替换句子:
import re
from requests.structures import CaseInsensitiveDict
def read_file_dict(filename):
"""
Reads file data into CaseInsensitiveDict
"""
# lists for keys and values
keys = []
values = []
# case sensitive dict
data = CaseInsensitiveDict()
# count used for deciding which line we're on
count = 1
with open(filename) as file:
temp = file.read().splitlines()
for line in temp:
# if the line count is even, a value is being read
if count % 2 == 0:
values.append(line)
# otherwise, a key is being read
else:
keys.append(line)
count += 1
# Add to dictionary
# perhaps some error checking here would be good
for key, value in zip(keys, values):
data[key] = value
return data
def replace_word(ab_dict, sentence):
"""
Replaces sentence with words found in dictionary
"""
# not necessarily words, but you get the idea
words = re.findall(r"[\w']+|[.,!?; ]", sentence)
new_words = []
for word in words:
# if word is in dictionary, replace it and add it to resulting list
if word in ab_dict:
new_words.append(ab_dict[word])
# otherwise add it as normally
else:
new_words.append(word)
# return sentence with replaced words
return "".join(x for x in new_words)
def main():
ab_dict = read_file_dict("abreviations.txt")
print(ab_dict)
print(ab_dict['ACE'])
print(ab_dict['Ace'])
print(ab_dict['ace'])
print(replace_word(ab_dict, "The ACE is not easy to understand"))
if __name__ == '__main__':
main()
哪个输出:
{'ACE': 'Access Control Entry', 'ACK': 'Acknowledgement', 'ACORN': 'A Completely Obsessive Really Nutty person'}
Access Control Entry
Access Control Entry
Access Control Entry
The Access Control Entry is not easy to understand