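I'm trying to build a counter that scans a text and returns the frequency of each letter given the preceding pair of letters. For example, part of the output would be: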
'th': Counter({'e': 119, 'a': 145, ...})
I want it to iterate over all possible pairs of lowercase characters.
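To make the target structure concrete, here is a compact sketch that zips the lowercased text against itself shifted by one and two positions. This is a different technique from the generator-based code in this thread, chosen only for brevity. The function name `pair_markov` and the sample sentence are made up for illustration, and it assumes the same "lowercase letters plus space" alphabet used in the code further down:

```python
from collections import Counter, defaultdict

# Assumed alphabet: lowercase letters plus the space character.
VALID = set('abcdefghijklmnopqrstuvwxyz ')

def pair_markov(text):
    """Hypothetical helper: map each two-character prefix to a Counter
    of the characters that follow it."""
    t = text.lower()
    markov = defaultdict(Counter)
    # Walk every (a, b, c) triple of consecutive characters.
    for a, b, c in zip(t, t[1:], t[2:]):
        # Skip any triple that contains punctuation or other characters.
        if a in VALID and b in VALID and c in VALID:
            markov[a + b][c] += 1
    return markov

markov = pair_markov("The then, the thin theme.")
print(markov['th'])  # Counter({'e': 4, 'i': 1})
```

Slicing copies the string twice, which is fine for modest texts; for very large inputs a single-pass generator avoids those copies.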
So far I have been using the following code, which only takes the single preceding letter into account:
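from collections import Counter, defaultdict

def pairwise(iterable):
    # Yield each (previous, current) pair of adjacent items.
    it = iter(iterable)
    try:
        last = next(it)
    except StopIteration:  # empty input: nothing to yield
        return
    for curr in it:
        yield last, curr
        last = curr

# Lowercase letters plus the space character.
valid = set('abcdefghijklmnopqrstuvwxyz ')

def valid_pair(pair):
    last, curr = pair
    return last in valid and curr in valid

def make_markov(text):
    # Map each character to a Counter of the characters that follow it.
    markov = defaultdict(Counter)
    lowercased = (c.lower() for c in text)
    for p, q in filter(valid_pair, pairwise(lowercased)):
        markov[p][q] += 1
    return markov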
Answer:
Untested:
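def pairwise(iterable):
    # Yield (previous two characters, current character) for each position.
    it = iter(iterable)
    try:
        last = next(it) + next(it)
    except StopIteration:  # fewer than two characters: nothing to yield
        return
    for curr in it:
        yield last, curr
        last = last[1] + curr  # slide the two-character window forward

def valid_pair(pair):
    last, curr = pair
    return last[0] in valid and last[1] in valid and curr in valid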
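Because `pairwise` now yields a two-character string as its first element, the question's `make_markov` can stay as it is: its key `p` simply becomes the letter pair. Below is a minimal end-to-end sketch of that combination, assuming Python 3 (`filter` in place of `ifilter`, tuple unpacked inside `valid_pair`); the sample sentence is made up for illustration:

```python
from collections import Counter, defaultdict

# Lowercase letters plus the space character, as in the question.
valid = set('abcdefghijklmnopqrstuvwxyz ')

def pairwise(iterable):
    # Yield (previous two characters, current character) for each position.
    it = iter(iterable)
    try:
        last = next(it) + next(it)
    except StopIteration:  # fewer than two characters: nothing to yield
        return
    for curr in it:
        yield last, curr
        last = last[1] + curr  # slide the two-character window forward

def valid_pair(pair):
    last, curr = pair
    return last[0] in valid and last[1] in valid and curr in valid

def make_markov(text):
    # Unchanged from the question; p is now a two-character string.
    markov = defaultdict(Counter)
    lowercased = (c.lower() for c in text)
    for p, q in filter(valid_pair, pairwise(lowercased)):
        markov[p][q] += 1
    return markov

markov = make_markov("The then, the thin theme.")
print(markov['th'])  # Counter({'e': 4, 'i': 1})
```

Keys are only created for pairs that actually occur, so `'th' in markov` is a cheap way to check whether a pair was seen. Indexing the `defaultdict` with an unseen pair returns an empty `Counter` and also inserts that key.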