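Disclaimer: I am very new to Python. I have an assignment that requires me to count and print the frequency of each word in a file (along with the word itself), after removing punctuation and lowercasing every word in the file. Right now I use the following sequence of calls to process each line of the file: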
import string
words = "Dave, Laura, Maddy, Dave, Laura, Maddy, Dave, Laura, Dave"
translation = str.maketrans("","", string.punctuation)
new = words.translate(translation)
lower = new.lower()
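However, this seems crude to me, and I feel I could accomplish the task with fewer function calls and less code. Does anyone have suggestions on how to do that?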
Answer 0 (score: 1)
words = "Dave, Laura, Maddy, Dave, Laura, Maddy, Dave, Laura, Dave"
# strip() drops the space left after each comma, so the join yields single spaces
words_lower = ' '.join([word.strip().lower() for word in words.split(',')])
print(words_lower)
dave laura maddy dave laura maddy dave laura dave
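If the goal is mainly less code per line of the file, the translate and lower steps from the question can also be chained into a single expression and wrapped in a helper. This is only a minimal sketch: the names `STRIP_PUNCT` and `normalize` are hypothetical, and it assumes that only ASCII punctuation (`string.punctuation`) needs to be removed.

```python
import string

# Translation table that deletes every ASCII punctuation character.
# Built once so it is not recreated for each line of the file.
STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def normalize(line):
    """Remove ASCII punctuation from a line and lowercase the result."""
    return line.translate(STRIP_PUNCT).lower()

words = "Dave, Laura, Maddy, Dave, Laura, Maddy, Dave, Laura, Dave"
print(normalize(words))  # dave laura maddy dave laura maddy dave laura dave
```

Building the table once outside the function keeps the per-line work down to two method calls.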
Answer 1 (score: 0)
If you want to count the frequency of each word, you can try this:
>>> from collections import Counter
>>> words = "Dave, Laura, Maddy, Dave, Laura, Maddy, Dave, Laura, Dave"
>>> Counter([word.lower() for word in words.split(', ')])
Counter({'dave': 4, 'laura': 3, 'maddy': 2})
See the documentation for collections.Counter.
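Since the assignment works on a file rather than a single string, the same idea might be applied line by line. The sketch below is an assumption about how that could look, not a definitive solution: `count_words` is a hypothetical helper, and an `io.StringIO` object stands in for the real file so the example runs on its own. With an actual file you would pass the object returned by `open(...)` instead.

```python
import io
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def count_words(lines):
    """Count lowercase, punctuation-free words across an iterable of lines."""
    counts = Counter()
    for line in lines:
        # split() with no argument splits on any run of whitespace,
        # so newlines and repeated spaces do not produce empty words.
        counts.update(line.translate(STRIP_PUNCT).lower().split())
    return counts

# Stand-in for a file opened with open(); its contents are made up for the demo.
sample = io.StringIO("Dave, Laura, Maddy!\nDave? Laura.\n")
counts = count_words(sample)

# Print each word with its frequency, most frequent first.
for word, freq in counts.most_common():
    print(word, freq)
```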
A shorter alternative to the first answer:
>>> words = "Dave, Laura, Maddy, Dave, Laura, Maddy, Dave, Laura, Dave"
>>> words.replace(',', '').lower()
'dave laura maddy dave laura maddy dave laura dave'
If you want to get rid of punctuation other than just ',':
>>> import re
>>> words = "Dave! Laura: Maddy; Dave, Laura? Maddy, Dave, Laura, Dave."
>>> re.sub(r'[!:;,?.]', '', words).lower()
'dave laura maddy dave laura maddy dave laura dave'
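Listing the characters by hand is easy to get wrong once more punctuation shows up. One possible variation, sketched here under the assumption that ASCII punctuation is all you need to remove, builds the character class from `string.punctuation` and escapes it with `re.escape`. The names `PUNCT_RE` and `strip_punctuation` are hypothetical.

```python
import re
import string

# Character class matching any single ASCII punctuation character.
# re.escape() protects characters such as ']', '\\', '^' and '-' that
# would otherwise have a special meaning inside the brackets.
PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "]")

def strip_punctuation(text):
    """Delete ASCII punctuation from text and lowercase what remains."""
    return PUNCT_RE.sub("", text).lower()

words = "Dave! Laura: Maddy; Dave, Laura? Maddy, Dave, Laura, Dave."
print(strip_punctuation(words))
```

Compiling the pattern once is a small convenience when the function is called for every line of a file.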