Nearly complete bigram counter in Python

Time: 2014-11-14 09:51:58

Tags: python loops dictionary printing

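I have a nearly fully functional bigram counter, but I'm stuck on two things:

a) printing the keys and values properly, since my keys are tuples

b) looping the code to accept new lines of input

So far I have: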

bigrams = {}

line = input ('Line: ').split()
while len(line) > 1:                    
  bigram_key = tuple(line[0:2])       
  if bigram_key not in bigrams:      
    bigrams[bigram_key] = 1         
  else:                               
    bigrams[bigram_key] += 1         
  line = line[1:]                     

for entry in bigrams.keys():
  print (entry,":",bigrams[entry])

This works for a single line of input, although it prints extra gubbins (technical term) that I don't want:

Line: The Big The Big Red Fox
('Big', 'The') : 1
('Big', 'Red') : 1
('Red', 'Fox') : 1
('The', 'Big') : 2

What I'm after is:

Line: The Big The Big Red Fox
Big The: 1
Big Red: 1
Red Fox: 1
The Big: 2

And then I need it to handle multiple lines of input!

2 Answers:

Answer 0: (score: 1)

For your first problem:

>>> for i in bigrams:
...     print(' '.join(i), ':', bigrams[i])
... 
Big The : 1
Red Fox : 1
Big Red : 1
The Big : 2
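
Note that the output above still has a space before the colon, because print separates its arguments with a space. Here is a minimal sketch of one way to get exactly the `Big The: 1` format from the question, assuming Python 3. The helper name `format_bigrams` is hypothetical and not from the original post:

```python
def format_bigrams(bigrams):
    """Return one 'word1 word2: count' string per bigram (hypothetical helper)."""
    lines = []
    for pair, count in bigrams.items():
        # ' '.join turns the tuple key into plain words; the f-string avoids
        # the extra space that print() would put before the colon.
        lines.append(f"{' '.join(pair)}: {count}")
    return lines


bigrams = {('The', 'Big'): 2, ('Big', 'The'): 1}
for text in format_bigrams(bigrams):
    print(text)
```

Passing `sep=''` to print would be another option; building the string first just keeps the formatting in one place.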

For your second problem:

>>> bigrams={}
>>> while True:
...     print("Enter some text or enter `break` keyword to stop:")
...     line = input()
...     if line.lower() == 'break': break
...     line = line.split()
...     for i,j in zip(line[:-1],line[1:]):   # Keep taking two consecutive words (bigrams) until end of line
...         bigrams.setdefault((i,j),0)
...         bigrams[(i,j)]+=1
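
The same counting can also be sketched with `collections.Counter` from the standard library, which removes the need for `setdefault`. This assumes Python 3; the function name `count_bigrams` and the choice to take a list of lines (rather than reading from `input()`) are illustrative assumptions, not part of the original answer:

```python
from collections import Counter


def count_bigrams(lines):
    """Count adjacent word pairs across several lines of text."""
    counts = Counter()
    for line in lines:
        words = line.split()
        # zip(words, words[1:]) pairs each word with its successor;
        # bigrams never span two different lines.
        counts.update(zip(words, words[1:]))
    return counts


counts = count_bigrams(["The Big The Big Red Fox", "The Big Dog"])
for pair, count in counts.items():
    print(f"{' '.join(pair)}: {count}")
```

Taking the lines as an argument keeps the counting separate from the input loop, so it can be fed from `input()`, a file, or a test list.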

Answer 1: (score: 0)

This is the final code I ended up with. Thanks for your support!

bigrams = {}
while True:
  line = input ('Line: ').lower().split()
  while len(line) > 1:                    
    bigram_key = tuple(line[0:2])       
    if bigram_key not in bigrams:      
      bigrams[bigram_key] = 1         
    else:                               
      bigrams[bigram_key] += 1         
    line = line[1:] 
  # Only an empty input line leaves nothing behind, so it ends the loop
  if line == []:
    break
for entry in bigrams:
  if bigrams[entry] > 1:
    print(' '.join(entry)+':',bigrams[entry])