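Sample sentences from each of the corpus1, corpus2 and corpus3 corpora and display the average length (measured as the number of characters in a sentence).
So I have three corpora, and sample_raw_sents is a function I defined that returns random sentences: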
tcr = corpus1()
rcr = corpus2()
mcr = corpus3()
sample_size = 50
for sentence in tcr.sample_raw_sents(sample_size):
    print(len(sentence))
for sentence in rcr.sample_raw_sents(sample_size):
    print(len(sentence))
for sentence in mcr.sample_raw_sents(sample_size):
    print(len(sentence))
With this code I can print all the lengths, but how do I sum() them?
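The corpus classes themselves are not shown in the question. To try the answers' snippets, a hypothetical stand-in like the one below can be used; it assumes sample_raw_sents(n) returns n distinct raw sentence strings chosen at random, and the class name ListCorpus and its sentences are invented for illustration. It also shows that sum() can consume the lengths directly from a generator expression instead of printing them.

```python
import random


class ListCorpus:
    """Hypothetical stand-in for corpus1/corpus2/corpus3 (not the real classes)."""

    def __init__(self, sentences):
        self.sentences = list(sentences)

    def sample_raw_sents(self, sample_size):
        # Assumption: return `sample_size` distinct sentences chosen at random.
        return random.sample(self.sentences, sample_size)


corpus = ListCorpus(["The cat sat.", "Dogs bark.", "Hi."])

# sum() accepts any iterable, so the lengths can be added up directly.
total_chars = sum(len(s) for s in corpus.sample_raw_sents(3))
print(total_chars)
```

Sampling all three sentences makes the total deterministic here, regardless of the random order.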
Answer 0 (score: 1)
Use zip, which lets you take one sentence from each corpus at a time.
tcr = corpus1()
rcr = corpus2()
mcr = corpus3()
sample_size = 50
# zip pairs up the i-th sampled sentence of each corpus
zipped = zip(tcr.sample_raw_sents(sample_size),
             rcr.sample_raw_sents(sample_size),
             mcr.sample_raw_sents(sample_size))
for s1, s2, s3 in zipped:
    # total and mean length of this one triple of sentences
    summed = len(s1) + len(s2) + len(s3)
    average = summed / 3
    print(summed, average)
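Answer 0 prints a separate sum and average for each triple of sentences rather than one figure for the whole sample. A minimal sketch of carrying the totals across the zip loop to get a single overall average follows; ListCorpus is a hypothetical stand-in for the corpus classes (assuming sample_raw_sents returns random sentence strings), and the tiny corpora are made up.

```python
import random


class ListCorpus:
    """Hypothetical corpus stand-in; not the question's real classes."""

    def __init__(self, sentences):
        self.sentences = list(sentences)

    def sample_raw_sents(self, sample_size):
        # Assumption: random sample of distinct sentences.
        return random.sample(self.sentences, sample_size)


tcr = ListCorpus(["aa", "bbbb"])
rcr = ListCorpus(["c", "ddd"])
mcr = ListCorpus(["eeeee", "ffffff"])
sample_size = 2

total = 0
count = 0
# zip yields one sentence from each corpus per iteration and stops
# at the shortest input.
for s1, s2, s3 in zip(tcr.sample_raw_sents(sample_size),
                      rcr.sample_raw_sents(sample_size),
                      mcr.sample_raw_sents(sample_size)):
    total += len(s1) + len(s2) + len(s3)
    count += 3  # three sentences consumed per iteration

average = total / count
print(total, average)
```

Counting the sentences actually seen keeps the average correct even if zip stops early because one sample is shorter.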
Answer 1 (score: 0)
You can store the length of every sentence in a list and then sum them.
tcr = corpus1()
rcr = corpus2()
mcr = corpus3()
sample_size = 50
lengths = []
for sentence in tcr.sample_raw_sents(sample_size):
    lengths.append(len(sentence))
for sentence in rcr.sample_raw_sents(sample_size):
    lengths.append(len(sentence))
for sentence in mcr.sample_raw_sents(sample_size):
    lengths.append(len(sentence))
print(sum(lengths) / len(lengths))
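The three nearly identical loops in answer 1 could be collapsed by flattening the samples with itertools.chain.from_iterable and averaging with statistics.mean. This is a sketch under the same assumption that sample_raw_sents returns a list of random sentence strings; ListCorpus and the sample data are hypothetical.

```python
import random
from itertools import chain
from statistics import mean


class ListCorpus:
    """Hypothetical corpus stand-in; not the question's real classes."""

    def __init__(self, sentences):
        self.sentences = list(sentences)

    def sample_raw_sents(self, sample_size):
        # Assumption: random sample of distinct sentences.
        return random.sample(self.sentences, sample_size)


corpora = [
    ListCorpus(["ab", "abcd"]),
    ListCorpus(["abc", "a"]),
    ListCorpus(["abcdef", "abcde"]),
]
sample_size = 2

# Flatten the three samples into one stream of sentences.
sentences = chain.from_iterable(c.sample_raw_sents(sample_size) for c in corpora)
lengths = [len(s) for s in sentences]
print(mean(lengths))
```

Like answer 1, this divides by len(lengths), so it stays correct however many sentences each sample actually returns.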
Answer 2 (score: -1)
tcr = corpus1()
rcr = corpus2()
mcr = corpus3()
sample_size = 50
s = 0
for sentence in tcr.sample_raw_sents(sample_size):
    s = s + len(sentence)
for sentence in rcr.sample_raw_sents(sample_size):
    s = s + len(sentence)
for sentence in mcr.sample_raw_sents(sample_size):
    s = s + len(sentence)
# 3 corpora x sample_size sentences (150 when sample_size is 50);
# assumes each call returns exactly sample_size sentences
average = s / (3 * sample_size)
print('average: {}'.format(average))
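Answer 2 assumes the three samples together contain exactly 3 * sample_size (150) sentences. A sketch that instead counts the sentences as it sums them, and reports a per-corpus average alongside the overall one, could look like this; the ListCorpus class, the corpus names, and the data are hypothetical, and sample_raw_sents is assumed to return random sentence strings.

```python
import random


class ListCorpus:
    """Hypothetical corpus stand-in; not the question's real classes."""

    def __init__(self, sentences):
        self.sentences = list(sentences)

    def sample_raw_sents(self, sample_size):
        # Assumption: random sample of distinct sentences.
        return random.sample(self.sentences, sample_size)


corpora = {
    "corpus1": ListCorpus(["ab", "abcd"]),
    "corpus2": ListCorpus(["abc", "a"]),
    "corpus3": ListCorpus(["abcdef", "abcde"]),
}
sample_size = 2

averages = {}
grand_total = 0
grand_count = 0
for name, corpus in corpora.items():
    total = 0
    count = 0
    for sentence in corpus.sample_raw_sents(sample_size):
        total += len(sentence)
        count += 1  # count real sentences instead of assuming a fixed number
    averages[name] = total / count
    print(f"{name}: {averages[name]}")
    grand_total += total
    grand_count += count

overall = grand_total / grand_count
print(f"overall: {overall}")
```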