Can anyone help me with this? I want to count the different types of words in a text file.
import sys
import re
import string
pattern = re.compile("^[a-z][a-z0-9]*$")
with open('alice.txt','r') as f:
    for line in f:
        for word in line.split():
            lword = word.lower()
            if pattern.match(lword):
                if len(lword) >= 10:
                    print "Extralong:",'%s%s%d' % (lword, "\t", 1)
                elif len(lword) in [7, 8, 9] :
                    print "Long:",'%s%s%d' % (lword, "\t", 1)
                elif len(lword) in [5, 6] :
                    print "Medium:",'%s%s%d' % (lword, "\t", 1)
                elif len(lword) in [1] and lword in "aeiou":
                    print "Vowel",'%s%s%d' % (lword, "\t", 1)
                else :
                    print "Small:"'%s%s%d' % (lword, "\t", 1)
Output:
Small:the 1
Long: project 1
Long: gutenberg 1
Medium: ebook 1
Small:of 1
Medium: alice 1
Small:in 1
Small:by 1
Medium: lewis 1
Long: carroll 1
Small:this 1
Medium: ebook 1
Small:is 1
Small:for 1
Small:the 1
Small:use 1
I want to get the total for each type, e.g. Small: 5, Long: 3, Medium: 3 ...
Answer 0 (score: 0)
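I would count them all and then combine, but another alternative is to use bisect with the top value of each group as the keys, to see where each length falls: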
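from collections import defaultdict
from bisect import bisect_left

keys = [1, 4, 6, 9]  # top length of each group: 1, 2-4, 5-6, 7-9
d = defaultdict(int)
with open("in.txt") as f:
    # lengths of the individual words, not the number of words per line
    for ln in (len(word) for line in f for word in line.split()):
        ind = bisect_left(keys, ln)
        # if ln is between 1 and 9, ind will be between 0 and 3;
        # lengths > 9 give ind == 4 and are not counted
        if ind < len(keys):
            d[keys[ind]] += 1
print(d)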
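A minimal sketch, assuming you want the question's category names instead of the numeric keys. The helpers `category` and `count_categories`, the `BOUNDS`/`LABELS` lists and the sample sentence are hypothetical; only the regex and the single-vowel rule come from the question. Everything other than length-1 vowels is placed with `bisect_left`.

```python
import re
from bisect import bisect_left
from collections import Counter

# Hypothetical group table: inclusive top length of each group and its label.
BOUNDS = [4, 6, 9]                               # 1-4, 5-6, 7-9, then 10 and up
LABELS = ["Small", "Medium", "Long", "Extralong"]
PATTERN = re.compile(r"^[a-z][a-z0-9]*$")        # same regex as the question


def category(word):
    """Return the length category of an already lowercased word."""
    if len(word) == 1 and word in "aeiou":
        return "Vowel"
    # bisect_left keeps a length equal to a bound in the lower group,
    # e.g. len 4 -> index 0 ("Small"), len 10 -> index 3 ("Extralong")
    return LABELS[bisect_left(BOUNDS, len(word))]


def count_categories(text):
    """Count categories for every word of text that matches PATTERN."""
    words = (w.lower() for w in text.split())
    return Counter(category(w) for w in words if PATTERN.match(w))


sample = "The Project Gutenberg EBook of Alice in Wonderland, by Lewis Carroll"
print(dict(count_categories(sample)))
# {'Small': 4, 'Long': 3, 'Medium': 3}   ("wonderland," fails the regex)
```

Keeping the bounds and labels in two parallel lists means changing a group boundary only touches `BOUNDS`, not a chain of `elif` tests.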
Each time we bisect, we find where the length falls in the sorted list:
In [13]: keys = [1, 4, 6, 9]
In [14]: bisect_left(keys, 1)
Out[14]: 0
# range 2-4
In [15]: bisect_left(keys, 3)
Out[15]: 1
# range 2-4
In [16]: bisect_left(keys, 4)
Out[16]: 1
# range 5-6
In [17]: bisect_left(keys, 5)
Out[17]: 2
# range 7-9
In [18]: bisect_left(keys, 7)
Out[18]: 3
# range 7-9
In [19]: bisect_left(keys, 9)
Out[19]: 3
# > 9
In [20]: bisect_left(keys, 10)
Out[20]: 4
The logic is somewhat similar to the grade example function in the bisect docs:
from bisect import bisect

def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
    i = bisect(breakpoints, score)
    return grades[i]
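The docs example uses `bisect` (an alias of `bisect_right`), while the answer uses `bisect_left`. This sketch runs the docs' `grade` on the scores from the docs example, then compares both functions at one of the word-length keys to show why the choice matters at the boundaries:

```python
from bisect import bisect, bisect_left


def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
    # bisect() is bisect_right: a score equal to a breakpoint moves up a grade
    i = bisect(breakpoints, score)
    return grades[i]


print([grade(score) for score in [33, 99, 77, 70, 89, 90, 100]])
# ['F', 'A', 'C', 'C', 'B', 'A', 'A']

# With the word-length keys, a length equal to a key must stay in the lower
# group (4 belongs to "2-4"), so the answer uses bisect_left instead
keys = [1, 4, 6, 9]
print(bisect(keys, 4), bisect_left(keys, 4))
# 2 1
```

In short: `bisect_right` suits breakpoints that are the *start* of the next group (a score of 70 is already a C), and `bisect_left` suits keys that are the *end* of a group.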
Answer 1 (score: 0)
In both Python 2 and 3, Counter from the collections module can count each item for you:
import re
from collections import Counter

words = []
pattern = re.compile("^[a-z][a-z0-9]*$")
with open('alice.txt', 'r') as f:
    for line in f:
        for word in line.split():
            lword = word.lower()
            if pattern.match(lword):
                if len(lword) >= 10:
                    words.append("Extralong")
                elif len(lword) in [7, 8, 9]:
                    words.append("Long")
                elif len(lword) in [5, 6]:
                    words.append("Medium")
                elif len(lword) in [2, 3, 4]:
                    words.append("Small")
                elif len(lword) == 1 and lword in "aeiou":
                    words.append("Vowel")
                else:  # a one-letter word that is not a vowel, e.g. "x"
                    words.append("Nothing")
print(dict(Counter(words)))
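If the file is large, one possible variation (not part of the answer) is to update a single Counter line by line instead of first building the full `words` list. The helpers `category` and `count_lines` and the sample lines are hypothetical; `category` mirrors the if/elif chain above.

```python
import re
from collections import Counter

PATTERN = re.compile(r"^[a-z][a-z0-9]*$")


def category(word):
    """Hypothetical helper mirroring the if/elif chain in the answer."""
    n = len(word)
    if n >= 10:
        return "Extralong"
    if n >= 7:
        return "Long"
    if n >= 5:
        return "Medium"
    if n >= 2:
        return "Small"
    return "Vowel" if word in "aeiou" else "Nothing"


def count_lines(lines):
    """Count word categories over any iterable of lines, e.g. an open file."""
    counts = Counter()
    for line in lines:
        # update() adds one count per element of the iterable, so only the
        # running totals are kept in memory, not a list of every word
        counts.update(category(w) for w in line.lower().split() if PATTERN.match(w))
    return counts


counts = count_lines(["The Project Gutenberg EBook of Alice", "by Lewis Carroll", "a b x"])
print(counts.most_common())
```

Because `count_lines` accepts any iterable of strings, an open file such as `open('alice.txt')` can be passed to it directly.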
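Consider the following:
The regex never matches an empty string, so "Nothing" is only produced by a one-letter word that is not a vowel.
word.lower() is still required: the regex only matches lowercase letters and digits, so without it capitalised words such as "The" or "Alice" would be skipped.
A simplified version could be:
from re import match
from collections import Counter

with open('alice.txt', 'r') as f:
    words = [(len(word) >= 10 and 'Extralong') or (len(word) >= 7 and 'Long') or
             (len(word) >= 5 and 'Medium') or (len(word) >= 2 and 'Small') or
             (word in 'aeiou' and 'Vowel') or 'Nothing'
             for word in f.read().lower().split() if match(r'^[a-z][a-z0-9]*$', word)]
print(dict(Counter(words)))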
The output is:
{'Small': 9, 'Medium': 4, 'Long': 3}