["this","example"]:1 , ["is","silly"]:1 ....
Something like that.
I can handle the single-word case, but how do I access two elements at a time and use them together as the key?
import sys

with open(sys.argv[1], 'r') as f:  # text mode, so the str replacements below also work on Python 3
    word_list = f.read().lower()
unwanted = ['(', ')', '\\', '"', '\'', '.', ';', ':', '!']
for c in unwanted:
    word_list = word_list.replace(c, " ")
words = word_list.split()
fdic = {}
for word in words:
    # build the dictionary; how can the key be a pair of items?
    fdic[word] = fdic.get(word, 0) + 1
Answer 0 (score: 3)
You can use a list comprehension that iterates over the original list of words to build a list of bigrams:
bigrams = [words[i] + " " + words[i+1] for i in range(len(words)-1)]
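To get from that list to the frequency dictionary the question asks for, the same counting loop can run over the bigram strings. This is a minimal sketch, assuming `words` is the already-split list from the question; the sample words here are only for illustration.

```python
# Sample input; in the question this would come from word_list.split().
words = ["this", "example", "is", "silly", "that", "example", "is", "also", "silly"]

# Join each word with its successor to get space-separated bigram strings.
bigrams = [words[i] + " " + words[i + 1] for i in range(len(words) - 1)]

# Count the bigrams with the same dict.get pattern used for single words.
fdic = {}
for bigram in bigrams:
    fdic[bigram] = fdic.get(bigram, 0) + 1

print(fdic["example is"])  # 2
```

Joined strings are the simplest key, but they cannot be split back reliably if a word itself contains a space, which is one reason to consider tuple keys instead.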
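Answer 1 (score: 0)
You can try the following code to build the groups, then use the tuples for counting.
words = ["This","example","is","silly",".","That","example","is","also","silly","."]
# step by 2, so each word lands in exactly one group
for i in range(0, len(words), 2):
    group = None
    if i+1 < len(words):
        group = (words[i], words[i+1])
    else:
        # odd number of words: the last group holds a single word
        group = (words[i], )
    print(group)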
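Note that the loop above advances by two, so it yields non-overlapping groups rather than every adjacent pair. The sketch below contrasts the two on a short, made-up word list; the names `chunks` and `bigrams` are only illustrative.

```python
words = ["This", "example", "is", "silly", "That"]

# Non-overlapping groups, as produced by a step of 2.
chunks = [tuple(words[i:i + 2]) for i in range(0, len(words), 2)]

# Overlapping bigrams: pair every word with its successor.
bigrams = list(zip(words, words[1:]))

print(chunks)
print(bigrams)
```

If the goal is to count every adjacent pair, as in the question, the overlapping form is probably the one to use.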
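Answer 2 (score: 0)
You can use two words as a dictionary key by converting them to a tuple, which is a hashable type:
words = ["This","example","is","silly","That","example","is","also","silly"]
fdic = {}
for i in range(len(words)-1):
    # a two-element slice converted to a tuple so it can be used as a key
    word = tuple(words[i:i+2])
    fdic[word] = fdic.get(word, 0) + 1
print(fdic)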
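The reason a tuple is needed is that dictionary keys must be hashable, and lists are not. A small sketch of the difference, using an illustrative `pair` variable:

```python
fdic = {}
pair = ["this", "example"]

try:
    fdic[pair] = 1  # lists are mutable and therefore unhashable
except TypeError as err:
    print("list key rejected:", err)

fdic[tuple(pair)] = 1  # the tuple form is hashable and works as a key
print(fdic)
```

So a key written as `["this","example"]` in the question would have to become `("this", "example")`.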
Answer 3 (score: 0)
I suggest storing the word pairs as tuples.
from collections import defaultdict
content = 'This example is silly. That example is also silly.'.lower()
unwanted = ['(', ')', '\\', '"', '\'', '.', ';', ':', '!']
for c in unwanted:
    content = content.replace(c, " ")
words = content.split()
fdic = defaultdict(int)
# enumerating words[1:] makes idx the position of the word before `word`
for idx, word in enumerate(words[1:]):
    pair = (words[idx], words[idx + 1])
    fdic[pair] += 1
Result:
{('is', 'also'): 1,
 ('example', 'is'): 2,
 ('also', 'silly'): 1,
 ('silly', 'that'): 1,
 ('this', 'example'): 1,
 ('is', 'silly'): 1,
 ('that', 'example'): 1}
You don't need to use defaultdict, but it simplifies initializing each new key to zero.
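For comparison, here is a rough sketch of the same count with a plain dict, where `dict.get` supplies the default explicitly. It also pairs the words with `zip` instead of indexing, and for brevity strips only the period; both are substitutions for illustration, not part of the answer above.

```python
content = "This example is silly. That example is also silly.".lower()
words = content.replace(".", " ").split()

fdic = {}
for pair in zip(words, words[1:]):
    # dict.get supplies the 0 that defaultdict(int) would create implicitly.
    fdic[pair] = fdic.get(pair, 0) + 1

print(fdic[("example", "is")])  # 2
```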
Answer 4 (score: 0)
Not sure why nobody has suggested using Counter; after all, it is made for counting things.
>>> from collections import Counter
>>> words = ["This","example","is","silly","That","example","is","also","silly"]
>>> print(Counter(tuple(words[i: i + 2]) for i in range(len(words) - 1)))
Counter({('example', 'is'): 2, ('This', 'example'): 1, ('is', 'silly'): 1, ('is', 'also'): 1, ('That', 'example'): 1, ('also', 'silly'): 1, ('silly', 'That'): 1})
You may also want to apply extra filtering to the words, such as lowercasing them.
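One way that filtering might look, as a sketch: lowercase the text and blank out the punctuation characters listed in the question before counting. The sample `text` and the use of `zip` to form the pairs are assumptions for illustration.

```python
from collections import Counter

text = "This example is silly. That example is also silly."

# Lowercase and blank out the same punctuation characters the question lists.
unwanted = ['(', ')', '\\', '"', "'", '.', ';', ':', '!']
cleaned = text.lower()
for c in unwanted:
    cleaned = cleaned.replace(c, " ")
words = cleaned.split()

# Counter accepts any iterable of hashable items, here (word, next_word) tuples.
counts = Counter(zip(words, words[1:]))
print(counts.most_common(1))  # [(('example', 'is'), 2)]
```

`most_common` returns the pairs sorted by count, which is often what is wanted from a frequency table.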