Question

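I'm stuck and need a bit of guidance. I'm trying to teach myself Python using Grok Learning. Below are the problem, the sample output, and where I've got to with my code. I'd appreciate any hints that help me solve this.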

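In linguistics, a bigram is a pair of adjacent words in a sentence. The sentence "The big red ball." has three bigrams: The big, big red, and red ball.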

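Write a program to read in multiple lines of input from the user, where each line is a space-separated sentence of words. Your program should then count how many times each bigram occurs across all input sentences. The bigrams should be treated in a case-insensitive manner by converting the input lines to lowercase. Once the user stops entering input, your program should print out each bigram that appears more than once, along with its corresponding frequency. For example: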
Line: The big red ball
Line: The big red ball is near the big red box
Line: I am near the box
Line: 
near the: 2
red ball: 2
the big: 3
big red: 3

My code hasn't got very far and I'm really stuck. But this is where I am:

words = set()
line = input("Line: ")
while line != '':
  words.add(line)
  line = input("Line: ")

Am I even on the right track? I'd prefer not to import any modules and just use built-in features.

Thanks, Jeff

Answer 1

Let's start with a function that takes a sentence (with punctuation) and returns a list of all the lowercase bigrams it finds.

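So we first need to remove all non-alphanumeric characters from the sentence, convert all letters to lowercase, and then split on whitespace into a list of words: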

import re

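def bigrams(sentence):
    # Replace every non-word character with a space (raw string avoids an invalid-escape warning).
    text = re.sub(r'\W', ' ', sentence.lower())
    words = text.split()
    # zip is lazy in Python 3, so wrap it in list() to actually return a list.
    return list(zip(words, words[1:]))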

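We use the standard-library re module for the regex-based replacement of non-alphanumeric characters with spaces, and the built-in zip function to pair consecutive words. (We pair the word list with the same list shifted by one element.)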
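To see the pairing step on its own, here is a minimal sketch. The variable names and sample words are chosen purely for illustration and are not part of the function above.

```python
# Illustrative only: how zip pairs each word with its successor.
words = ["the", "big", "red", "ball"]

# words[1:] is the same list shifted left by one element.
# zip stops at the shorter input, so the last word gets no partner.
pairs = list(zip(words, words[1:]))
print(pairs)

# A line with a single word therefore yields no bigrams at all.
single = ["hello"]
no_pairs = list(zip(single, single[1:]))
print(no_pairs)
```

Because zip truncates to the shorter sequence, no special handling is needed for empty or one-word lines.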

Now we can test it:

>>> bigrams("The big red ball")
[('the', 'big'), ('big', 'red'), ('red', 'ball')]
>>> bigrams("THE big, red, ball.")
[('the', 'big'), ('big', 'red'), ('red', 'ball')]
>>> bigrams(" THE  big,red,ball!!?")
[('the', 'big'), ('big', 'red'), ('red', 'ball')]

Next, to count the bigrams across the sentences, you can use collections.Counter.

For example, like this:

from collections import Counter

counts = Counter()
for line in ["The big red ball", "The big red ball is near the big red box", "I am near the box"]:
    counts.update(bigrams(line))

We get:

>>> counts
Counter({('the', 'big'): 3, ('big', 'red'): 3, ('red', 'ball'): 2, ('near', 'the'): 2, ('red', 'box'): 1, ('i', 'am'): 1, ('the', 'box'): 1, ('ball', 'is'): 1, ('am', 'near'): 1, ('is', 'near'): 1})

Now we just need to print the ones that occur more than once:

for bigr, cnt in counts.items():
    if cnt > 1:
        print("{0[0]} {0[1]}: {1}".format(bigr, cnt))

Putting it all together, with a user input loop instead of a fixed list:

import re
from collections import Counter

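def bigrams(sentence):
    # Replace every non-word character with a space (raw string avoids an invalid-escape warning).
    text = re.sub(r'\W', ' ', sentence.lower())
    words = text.split()
    # Pair each word with the one that follows it.
    return list(zip(words, words[1:]))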

counts = Counter()
while True:
    line = input("Line: ")
    if not line:
        break
    counts.update(bigrams(line))

for bigr, cnt in counts.items():
    if cnt > 1:
        print("{0[0]} {0[1]}: {1}".format(bigr, cnt))

Output:

Line: The big red ball
Line: The big red ball is near the big red box
Line: I am near the box
Line: 
near the: 2
red ball: 2
big red: 3
the big: 3

Answer 2

usr_input = "Here is a sentence without multiple bigrams. Without multiple bigrams, we cannot test a sentence."

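def get_bigrams(word_string):
    # Lowercase each word and strip commas and periods from its ends.
    words = [word.lower().strip(',.') for word in word_string.split(" ")]
    # Join each word with the one that follows it.
    pairs = ["{} {}".format(w, words[i+1]) for i, w in enumerate(words) if i < len(words) - 1]
    bigrams = {}

    # Tally how many times each bigram occurs.
    for bg in pairs:
        if bg not in bigrams:
            bigrams[bg] = 0
        bigrams[bg] += 1
    return bigrams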

print(get_bigrams(usr_input))
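
The code above prints every bigram together with its count. To show only the bigrams that occur more than once, as the question asks, the dictionary could be filtered afterwards. A minimal sketch, assuming a dictionary shaped like the one get_bigrams returns; the helper name `repeated` and the sample data are made up for illustration:

```python
def repeated(bigram_counts):
    """Return only the entries whose count is greater than one."""
    return {bg: n for bg, n in bigram_counts.items() if n > 1}

# Hypothetical sample data in the same shape as the get_bigrams result.
sample = {"the big": 3, "big red": 3, "red box": 1, "near the": 2}

for bg, n in repeated(sample).items():
    print("{}: {}".format(bg, n))
```

Keeping the filter separate from the counting means the counting function can still be reused when all frequencies are wanted.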

Answer 3

Using only what was covered in the earlier modules of the Grok Learning Python course that the OP mentioned, this code does what is required:

counts = {} # this creates a dictionary for the bigrams and the tally for each one
n = 2
a = input('Line: ').lower().split() # the input is converted into lowercase, then split into a list
while a:
  for x in range(n, len(a)+1):
    b = tuple(a[x-2:x]) # the input gets sliced into pairs of two words (bigrams)
    counts[b] = counts.get(b,0) + 1 # adding the bigrams as keys to the dictionary, with their count value set to 1 initially, then increased by 1 thereafter
  a = input('Line: ').lower().split()  
for c in counts:
  if counts[c] > 1: # tests if the bigram occurs more than once
    print(' '.join(c) + ':', counts[c]) # prints the bigram (making sure to convert the key from a tuple into a string), with the count next to it

Note: you may need to scroll right to see the code comments in full.

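This is very simple and doesn't need any imports. I realize I'm very late, but hopefully anyone else taking the same course or running into a similar problem will find this answer helpful too.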
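
For anyone who wants to try the counting logic without typing lines interactively, here is a sketch of the same approach wrapped in a function that takes a list of lines instead of calling input(). The function name `count_bigrams` and the `sample` list are assumptions made for illustration and are not part of the course material.

```python
def count_bigrams(lines):
    """Count bigrams across all lines using only built-in features."""
    counts = {}
    for line in lines:
        words = line.lower().split()
        # words[x-2:x] is each consecutive two-word window in the line.
        for x in range(2, len(words) + 1):
            pair = tuple(words[x - 2:x])
            counts[pair] = counts.get(pair, 0) + 1
    return counts

# Sample lines taken from the question's example input.
sample = [
    "The big red ball",
    "The big red ball is near the big red box",
    "I am near the box",
]

counts = count_bigrams(sample)
for pair in counts:
    if counts[pair] > 1:  # only bigrams seen more than once
        print(' '.join(pair) + ':', counts[pair])
```

The order in which the repeated bigrams are printed depends on dictionary ordering, so it may differ from the sample output in the question.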

Counting bigrams from user input in Python 3?
