Editing a string, Python

Time: 2015-11-11 02:14:17

Tags: python string

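Disclaimer: I'm very new to Python. I have an assignment that asks me to count and print the frequency of each word in a file (along with the word itself) after removing punctuation and lowercasing every word in the file. Right now I use the following combination of calls to process each line of the file: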

import string

words = "Dave, Laura, Maddy, Dave, Laura, Maddy, Dave, Laura, Dave"
translation = str.maketrans("","", string.punctuation)
new = words.translate(translation)
lower = new.lower()

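However, this seems crude to me, and I feel I could accomplish the task with fewer function calls and less code. Does anyone have suggestions for how I could do this?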

2 answers:

Answer 0 (score: 1)

words = "Dave, Laura, Maddy, Dave, Laura, Maddy, Dave, Laura, Dave"
words_lower = ' '.join([word.lower() for word in words.split(',')])
print(words_lower)


dave  laura  maddy  dave  laura  maddy  dave  laura  dave
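The doubled spaces in the output above come from the space that follows each comma: split(',') leaves it at the start of every piece. A minimal sketch of one way to avoid that, assuming single spaces between words are what is wanted, is to strip each piece before joining:

```python
words = "Dave, Laura, Maddy, Dave, Laura, Maddy, Dave, Laura, Dave"

# strip() removes the space left behind after each comma,
# so the joined result has exactly one space between words.
words_lower = ' '.join(word.strip().lower() for word in words.split(','))
print(words_lower)
```

This should print the nine names in lowercase separated by single spaces. Splitting on ', ' instead would also work for this input, but only if every comma is followed by exactly one space.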

Answer 1 (score: 0)

If you want to count the frequency of each word, you can try this:

>>> from collections import Counter
>>> words = "Dave, Laura, Maddy, Dave, Laura, Maddy, Dave, Laura, Dave"
>>> Counter([word.lower() for word in words.split(', ')])
Counter({'dave': 4, 'laura': 3, 'maddy': 2})

Documentation for Counter (collections.Counter)
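Since the question is about counting words across the lines of a file, here is a rough sketch that combines the asker's translate/lower calls with Counter. The helper name count_words and the sample lines are made up for illustration; they are not from the question or either answer.

```python
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def count_words(lines):
    """Count lowercase, punctuation-free words across an iterable of lines.

    count_words is a hypothetical helper name used only for this sketch.
    """
    counts = Counter()
    for line in lines:
        # Remove punctuation, lowercase, then split on any whitespace.
        counts.update(line.translate(_STRIP_PUNCT).lower().split())
    return counts


# Illustrative sample standing in for the lines of a file.
sample = ["Dave, Laura, Maddy,", "Dave! Laura? Dave."]
for word, freq in count_words(sample).most_common():
    print(word, freq)
```

An open file object is itself an iterable of lines, so it could be passed to count_words in place of the sample list. Note that string.punctuation only covers ASCII punctuation, so characters such as curly quotes would survive this step.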

A shorter alternative to the first answer:

>>> words = "Dave, Laura, Maddy, Dave, Laura, Maddy, Dave, Laura, Dave"
>>> words.replace(',', ' ').lower()
'dave  laura  maddy  dave  laura  maddy  dave  laura  dave'

If you want to get rid of punctuation (more than just ','):

>>> import re
>>> words = "Dave! Laura: Maddy; Dave, Laura? Maddy, Dave, Laura, Dave."
>>> re.sub(r'[!:;,?.]', '', words).lower()
'dave laura maddy dave laura maddy dave laura dave'
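The character class above lists a handful of punctuation marks by hand. As a hedged variation, the class could be built from string.punctuation instead, with re.escape keeping characters such as ']' and '\' literal. The names punct_pattern and cleaned are chosen here for illustration only.

```python
import re
import string

words = "Dave! Laura: Maddy; Dave, Laura? Maddy, Dave, Laura, Dave."

# re.escape makes every punctuation character safe to place inside
# the [...] character class, including ']', '^', '-' and '\'.
punct_pattern = re.compile('[' + re.escape(string.punctuation) + ']')

cleaned = punct_pattern.sub('', words).lower()
print(cleaned)
```

Compiling the pattern once is mainly useful when the same substitution runs on every line of a file. For plain character deletion, the str.translate approach from the question does the same job without a regular expression.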