Question

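What is the simplest way to make a dictionary from a .txt file? Every word in the text file is separated by a space. Each word in the file should be a key in the dictionary, and its value should be a list of all the words that follow it at some point in the file, including repeats.

So if the text file is: I like cats and dogs. Dogs like cats. I like dogs more.

The dictionary would be: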

d = {'I': ['like', 'like'], 'like': ['cats', 'cats', 'dogs'], 'cats': ['and', '. ']...

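...and so on, until every word is a key.

Edit: Sorry, I didn't show the code I have so far because I'm an extreme beginner and barely know what I'm doing. Also, it looks terrible. But here is some of it: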

def textDictionary(fileName):
    p = open(fileName)
    f = p.read()
    w = f.split()
    newDictionary = {}
    for i in range(len(w)):
        newDictionary[w[i]] = w[i+1]
    return newDictionary

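Now of course this isn't supposed to do everything I want, but shouldn't it at least return:

{'I': 'like', 'like': 'cats', 'cats': 'and'...}

...and so on?

But it gives me something completely different.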

Answer 1

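This looks like a job for defaultdict to me. First you need to decide how to split the words. For simplicity I just split on whitespace, but this might be a job for a regular expression, since you have punctuation: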

from collections import defaultdict
d = defaultdict(list)

with open('textfile') as fin:
    data = fin.read()
    words = data.split()

for i, w in enumerate(words):  # enumerate supplies the index i alongside each word
    try:
        d[w].append(words[i+1])
    except IndexError:
        pass  # last word has no words which follow it...
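
As a rough sketch of the regular-expression route mentioned above: the pattern below treats each run of word characters as one token and each punctuation mark as its own token. The pattern and the function name `followers` are assumptions made for illustration, not part of the original answer, and other tokenizing rules may suit your text better.

```python
import re
from collections import defaultdict

def followers(text):
    # Hypothetical helper: \w+ matches a run of word characters,
    # [^\w\s] matches a single punctuation character.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    d = defaultdict(list)
    # Stop one short of the end, since the last token has no follower.
    for i, w in enumerate(tokens[:-1]):
        d[w].append(tokens[i + 1])
    return dict(d)

result = followers("I like cats and dogs. Dogs like cats. I like dogs more.")
print(result["like"])  # ['cats', 'cats', 'dogs']
```

Note that with this tokenizer '.' also becomes a key; skip it inside the loop if you don't want that.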

Answer 2

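The best way is to iterate over the words in two parallel sequences, offset by one. To do that, zip the original list with the same list sliced with [1:].

Each iteration then gives you a key and a value for your dict. Or rather, in this case, a defaultdict. A defaultdict created with list automatically initializes each missing key with an empty list, so you can append as needed without setting an initial value first.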

from collections import defaultdict

def textDictionary(fileName):
    with open(fileName) as p:  # with to open and automatically close
        f = p.read()
        w = f.split()

    newDictionary = defaultdict(list)
    # defaultdict initialized with list makes each element a list automatically,
    # this is great for `append`ing

    for key, value in zip(w, w[1:]):
        newDictionary[key].append(value)  # easy append!

    return dict(newDictionary)  # dict() changes defaultdict to normal

File:

I like cats and dogs like cats

Returns:

{'I': ['like'], 'and': ['dogs'], 'cats': ['and'], 'like': ['cats', 'cats'], 'dogs': ['like']}

Notice that in this case like is followed by cats twice. If you only want it once, initialize the defaultdict with set instead of list, and use .add instead of .append.
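
A minimal sketch of that set-based variant, assuming the words have already been split into a list (the function name `unique_followers` is made up for this example):

```python
from collections import defaultdict

def unique_followers(words):
    # defaultdict(set) gives every missing key an empty set,
    # so repeated followers collapse into a single entry.
    d = defaultdict(set)
    for key, value in zip(words, words[1:]):
        d[key].add(value)
    return dict(d)

result = unique_followers("I like cats and dogs like cats".split())
print(result["like"])  # {'cats'}
```

Keep in mind that sets are unordered, so this variant no longer records the order in which the followers appeared.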

Documentation on zip
Documentation on defaultdict

Answer 3

After reading the line from the file, you can do this:

from collections import defaultdict

line = 'I like cats and dogs. Dogs like cats. I like dogs more.'
# Separate each full stop so 'dogs.' or 'cats.' do not become keys of the dictionary.
line = line.replace('.', ' .')
op = defaultdict(list)
words = line.split()
for i, word in enumerate(words):
    if word != '.':  # Make sure '.' itself is not a key in the dictionary
        try:
            op[word].append(words[i+1])
        except IndexError:
            pass  # the last word has nothing after it

The only thing you need to handle explicitly is the full stop. The comments explain how the code does that. The code above results in:

{'and': ['dogs'], 'like': ['cats', 'cats', 'dogs'], 'I': ['like', 'like'], 'dogs': ['.', 'more'], 'cats': ['and', '.'], 'Dogs': ['like'], 'more': ['.']}
