Counting frequencies: how do I take two words at a time?

Time: 2015-11-13 02:06:02

Tags: python

["this","example"]:1  , ["is","silly"]:1  ....
and so on. I can handle the single-word case, but how do I access two elements at a time and make them the key?

import sys

with open(sys.argv[1], 'r') as f:
    word_list = f.read().lower()

    unwanted = ['(', ')', '\\', '"', '\'','.',';',':','!']

    for c in unwanted:
        word_list = word_list.replace(c," ")

    words = word_list.split()

    fdic = {}

    for word in words:

        # build the dictionary; how can the key be a pair of items instead?
        fdic[word] = fdic.get(word,0) + 1
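As a side note on the tokenization step, the replace loop above can also be done in a single pass with str.maketrans and str.translate. A minimal sketch, assuming the same unwanted characters each map to a space; tokenize is a hypothetical helper name, not part of the question:

```python
# Hypothetical helper: lowercase, map each unwanted character to a space, split on whitespace.
UNWANTED = '()\\"\'.;:!'
TABLE = str.maketrans(UNWANTED, " " * len(UNWANTED))

def tokenize(text):
    return text.lower().translate(TABLE).split()

print(tokenize('This example is silly. That example is also silly.'))
# → ['this', 'example', 'is', 'silly', 'that', 'example', 'is', 'also', 'silly']
```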

5 answers:

Answer 0 (score: 3)

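You can use a list comprehension to build a list of bigrams by iterating over the original word list (words in the question's code):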

bigrams = [words[i] + " " + words[i+1] for i in range(len(words)-1)]
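The comprehension yields space-joined strings such as 'this example', which can then be counted with the same dict.get pattern the question already uses. A minimal sketch, assuming words is the token list produced by the question's code (hard-coded here); zip(words, words[1:]) is used as an equivalent way to walk adjacent pairs:

```python
words = ["this", "example", "is", "silly", "that", "example", "is", "also", "silly"]

# Pair each word with its successor; zip stops at the shorter sequence,
# so the last word is not paired with anything.
bigrams = [a + " " + b for a, b in zip(words, words[1:])]

fdic = {}
for bigram in bigrams:
    fdic[bigram] = fdic.get(bigram, 0) + 1

print(fdic["example is"])  # → 2
```

String keys read nicely, but they become ambiguous if a token can itself contain a space; tuple keys avoid that.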

Answer 1 (score: 0)

You can try the following code to get the groups, then use the tuples for counting.

words = ["This","example","is","silly",".","That","example","is","also","silly","."]

for i in range(0, len(words), 2):
    if i + 1 < len(words):
        group = (words[i], words[i+1])
    else:
        group = (words[i],)  # an odd leftover word becomes a 1-tuple
    print(group)
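Stepping by 2 produces non-overlapping pairs (('This', 'example'), ('is', 'silly'), ...), which matches the pairing shown in the question, whereas the other answers count every adjacent, overlapping pair. A minimal sketch that counts the non-overlapping pairs with tuple keys; silently dropping an odd trailing word is an assumption here, unlike the answer above, which keeps it as a 1-tuple:

```python
words = ["this", "example", "is", "silly", "that", "example", "is", "also", "silly"]

# words[0::2] holds the 1st, 3rd, 5th ... words and words[1::2] the 2nd, 4th ...;
# zipping them gives non-overlapping pairs and drops an unpaired last word.
fdic = {}
for pair in zip(words[0::2], words[1::2]):
    fdic[pair] = fdic.get(pair, 0) + 1

print(fdic)
```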

Answer 2 (score: 0)

You can use the two words as a dictionary key by converting them into a tuple, which is hashable:

words = ["This","example","is","silly","That","example","is","also","silly"]

fdic = {}

for i in range(len(words)-1):
    word = tuple(words[i:i+2])
    fdic[word] = fdic.get(word,0) + 1
print(fdic)
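The slice words[i:i+2] generalizes directly to keys of any length n. A minimal sketch; count_ngrams and its n parameter are hypothetical names introduced for illustration:

```python
def count_ngrams(words, n=2):
    """Count every run of n consecutive words, keyed by tuple."""
    counts = {}
    for i in range(len(words) - n + 1):
        key = tuple(words[i:i + n])  # tuples are hashable, lists are not
        counts[key] = counts.get(key, 0) + 1
    return counts

words = ["This", "example", "is", "silly", "That", "example", "is", "also", "silly"]
print(count_ngrams(words, 3))
```

Using the list slice itself as the key would fail with TypeError: unhashable type: 'list', which is why the tuple conversion is needed.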

Answer 3 (score: 0)

I suggest storing the word pairs as tuples.

from collections import defaultdict

content = 'This example is silly. That example is also silly.'.lower()
unwanted = ['(', ')', '\\', '"', '\'','.',';',':','!']

for c in unwanted:
    content = content.replace(c," ")

words = content.split()

fdic = defaultdict(int)

for idx, word in enumerate(words[1:]):
    pair = (words[idx], words[idx + 1])
    fdic[pair] += 1

Result:

{('is', 'also'): 1,
('example', 'is'): 2,
('also', 'silly'): 1,
('silly', 'that'): 1,
('this', 'example'): 1,
('is', 'silly'): 1,
('that', 'example'): 1}

You don't have to use defaultdict, but it saves you from initializing each new key to zero yourself.
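To list the pairs from most to least frequent, the defaultdict's items can be sorted by count. A minimal sketch on the same sentence; iterating over zip(words, words[1:]) is swapped in here as an equivalent of the enumerate loop above:

```python
from collections import defaultdict

words = "this example is silly that example is also silly".split()

fdic = defaultdict(int)
for pair in zip(words, words[1:]):
    fdic[pair] += 1

# Highest count first; sorted() is stable even with reverse=True,
# so pairs with equal counts keep their insertion order.
ranked = sorted(fdic.items(), key=lambda kv: kv[1], reverse=True)
for (first, second), count in ranked:
    print(first, second, count)
```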

Answer 4 (score: 0)

Not sure why nobody has suggested Counter; after all, it exists to count things.

>>> from collections import Counter
>>> words = ["This","example","is","silly","That","example","is","also","silly"]
>>> print(Counter(tuple(words[i: i + 2]) for i in range(len(words) - 1)))
Counter({('example', 'is'): 2, ('This', 'example'): 1, ('is', 'silly'): 1, ('is', 'also'): 1, ('That', 'example'): 1, ('also', 'silly'): 1, ('silly', 'That'): 1})

You may also want to filter the words further, for example by lowercasing them.
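Putting the pieces together: normalize the text first, then feed the adjacent pairs straight into Counter and ask for the most common ones. A minimal sketch; the regular-expression tokenization (keep runs of the letters a-z after lowercasing) is a swapped-in assumption, not the question's exact punctuation list:

```python
import re
from collections import Counter

text = "This example is silly. That example is also silly."

# Assumed tokenization: lowercase, then keep runs of letters a-z only
# (this also drops digits, apostrophes and all punctuation).
words = re.findall(r"[a-z]+", text.lower())

# zip(words, words[1:]) yields each adjacent pair as a tuple.
bigram_counts = Counter(zip(words, words[1:]))
print(bigram_counts.most_common(1))  # → [(('example', 'is'), 2)]
```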