How to use a for loop in Python to print the frequency of each unique word in a string

Time: 2018-10-10 06:09:54

Tags: python string python-3.x loops

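The paragraph is meant to contain spaces and random punctuation, which I removed by calling .replace in a for loop. I then turned the paragraph into a list with .split() to get ['the','title','etc']. Next I wrote two functions: one counts the occurrences of a word, and, because I don't want a repeated word to be counted again each time it shows up, the other builds a list of unique words. However, I still need to create a for loop that prints each word and how often it appears, with output like this: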

The word The appears 2 times in the paragraph.
The word titled appears 1 times in the paragraph.
The word track appears 1 times in the paragraph.

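I am also having trouble understanding what a for loop essentially does. I read that we should use for loops only for counting and while loops for anything else, yet a while loop can also be used for counting.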
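As a rough illustration of the for/while question (the sample list and variable names are made up for this sketch), both loops below count the same occurrences. The for loop simply receives each item in turn, while the while loop has to manage its own index and stopping condition:

```python
# Hypothetical sample data, not taken from the paragraph in the question.
words = ["the", "cat", "saw", "the", "dog"]

# for loop: Python hands us each item of the list in turn.
for_count = 0
for word in words:
    if word == "the":
        for_count += 1

# while loop: we advance the index ourselves and decide when to stop.
while_count = 0
index = 0
while index < len(words):
    if words[index] == "the":
        while_count += 1
    index += 1

print(for_count, while_count)  # both loops find the same number of matches
```

Either loop can count; the for loop is usually the shorter choice when you just want to visit every element of a sequence.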

paragraph = """  The titled track “Heart Attack” does not interpret the 
feelings of being in love in a serious way, 
but with Chuu’s own adorable emoticon like ways. The music video has 
references to historical and fictional 
figures such as the artist Rene Magritte!!....  """


for r in ((",", ""), ("!", ""), (".", ""), ("  ", "")):
    paragraph = paragraph.replace(*r)

paragraph_list = paragraph.split()


def count_words(word, word_list):

    word_count = 0
    for i in range(len(word_list)):
        if word_list[i] == word:
            word_count += 1
    return word_count

def unique(word):
    result = []
    for f in word:
        if f not in result:
            result.append(f)
    return result
unique_list = unique(paragraph_list)
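A minimal sketch of the loop being asked for, assuming the two helper functions are kept. They are repeated here in slightly condensed form so the snippet runs on its own, and the short word list is a made-up stand-in for paragraph_list:

```python
def count_words(word, word_list):
    # Count how many entries of word_list are equal to word.
    word_count = 0
    for candidate in word_list:
        if candidate == word:
            word_count += 1
    return word_count

def unique(words):
    # Keep the first occurrence of each word, preserving order.
    result = []
    for w in words:
        if w not in result:
            result.append(w)
    return result

# Stand-in for paragraph_list (hypothetical data for this sketch).
paragraph_list = "The titled track The".split()

lines = []
for word in unique(paragraph_list):
    line = "The word {} appears {} times in the paragraph.".format(
        word, count_words(word, paragraph_list))
    lines.append(line)
    print(line)
```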

3 answers:

Answer 0 (score: 3)

It is best to use re together with get and a default value:

paragraph = """  The titled track “Heart Attack” does not interpret the
feelings of being in love in a serious way,
but with Chuu’s own adorable emoticon like ways. The music video has
references to historical and fictional
figures such as the artist Rene Magritte!!....  c c c c c c c ccc"""

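import re

# Count each lowercase word; get() returns 0 for a word not seen yet.
word_count = {}
for w in re.split(r' |,|“|”|!|\?|\.|\n', paragraph.lower()):
    word_count[w] = word_count.get(w, 0) + 1
# Adjacent delimiters produce empty strings; drop that entry if present.
word_count.pop('', None)

for k, v in word_count.items():
    print("The word {} appears {} time(s) in the paragraph".format(k, v))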

Output:

The word the appears 4 time(s) in the paragraph
The word titled appears 1 time(s) in the paragraph
The word track appears 1 time(s) in the paragraph
...

How to handle Chuu’s is debatable. I decided not to split it, but that can be added later if needed.

Update

The following line splits paragraph.lower() using a regular expression. The advantage is that you can specify several delimiters at once:

re.split(r' |,|“|”|!|\?|\.|\n', paragraph.lower())
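To make the effect of the alternation visible, here is a small sketch on a made-up sentence (the sample string and variable names are assumptions for illustration only). It also shows where the empty strings come from:

```python
import re

# Hypothetical sample sentence for this sketch.
sample = "Hello, world! Is it you?"

# Each alternative separated by | is one delimiter: space, comma, !, ?
parts = re.split(r" |,|!|\?", sample)
print(parts)

# Adjacent delimiters (", " or "! ") and a trailing delimiter leave
# empty strings behind, so they have to be removed before counting.
words = [p for p in parts if p]
print(words)
```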

Regarding this line:

word_count[w] = word_count.get(w, 0) + 1

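word_count is a dictionary. The advantage of get is that you can supply a default value for the case where w is not yet in the dictionary. The line essentially updates the count for the word w.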
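As a sketch of what get saves you, the two loops below build the same dictionary; the second one spells out the membership test that get(w, 0) makes unnecessary. The word list is a made-up example:

```python
# Hypothetical word list for this sketch.
words = ["the", "track", "the"]

# Version with get: a missing key counts as 0.
with_get = {}
for w in words:
    with_get[w] = with_get.get(w, 0) + 1

# Equivalent version with an explicit membership test.
explicit = {}
for w in words:
    if w in explicit:
        explicit[w] += 1
    else:
        explicit[w] = 1

print(with_get)
print(with_get == explicit)
```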

Answer 1 (score: 0)

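Beware: the example text is simple, but punctuation rules can be complex, or may not be followed correctly. What if the text contains 2 adjacent spaces (yes, it is incorrect, but it is frequent)? What if the writer is more used to French and puts a space before and after a colon or semicolon?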

I think the 's construct needs special handling. What about: """John has a bicycle. Mary says that her one is nicer that John's.""" IMHO the word John appears twice here, whereas your algorithm will see 1 John and 1 Johns.

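In addition, since Unicode text is now common on web pages, you should be prepared to find code points that are equivalent to spaces and punctuation marks: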

“ U+201C LEFT DOUBLE QUOTATION MARK
” U+201D RIGHT DOUBLE QUOTATION MARK
’ U+2019 RIGHT SINGLE QUOTATION MARK
‘ U+2018 LEFT SINGLE QUOTATION MARK
  U+00A0 NO-BREAK SPACE
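If you meet an unfamiliar character in scraped text, the standard unicodedata module can report its code point and name. A small sketch over the five characters listed above:

```python
import unicodedata

# Collect "U+XXXX NAME" for each character from the list above.
names = []
for ch in "“”’‘\xa0":
    entry = "U+{:04X} {}".format(ord(ch), unicodedata.name(ch))
    names.append(entry)
    print(entry)
```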

Furthermore, according to this older question, the best way to remove punctuation is translate. The linked question uses Python 2 syntax, but in Python 3 you can do the following:

import re
import string

paragraph = paragraph.strip()                   # remove initial and terminal white spaces
paragraph = paragraph.translate(str.maketrans('“”’‘\xa0', '""\'\' '))  # fix high code punctuations
paragraph = re.sub(r"(\w)'s\b", r"\1", paragraph)  # remove 's but keep the word itself
paragraph = paragraph.translate(str.maketrans('', '', string.punctuation))  # remove punctuations
words = paragraph.split()
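As a quick check of the idea on the John example, here is a self-contained sketch. The variable names are chosen for illustration, and the sample deliberately uses a typographic apostrophe to exercise the normalisation step:

```python
import re
import string

text = "John has a bicycle. Mary says that her one is nicer that John’s."

# Map typographic quotes and the no-break space to plain ASCII.
text = text.translate(str.maketrans("“”’‘\xa0", "\"\"'' "))
# Drop a possessive 's but keep the word it is attached to.
text = re.sub(r"(\w)'s\b", r"\1", text)
# Delete the remaining ASCII punctuation.
text = text.translate(str.maketrans("", "", string.punctuation))

words = text.split()
print(words.count("John"))
```

With the possessive stripped first, both mentions end up as the same word John.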

Answer 2 (score: -1)

Try the following:

paragraph = """  The titled track “Heart Attack” does not interpret the 
feelings of being in love in a serious way, 
but with Chuu’s own adorable emoticon like ways. The music video has 
references to historical and fictional 
figures such as the artist Rene Magritte!!....  c c c c c c c ccc"""

characterToRemove = (",", "!", ".", "?", '“', '”')
# Remove every unwanted punctuation character from the text.
for ch in characterToRemove:
    paragraph = paragraph.replace(ch, "")

paragraph = paragraph.split()
uniqueWords = set(paragraph)
# Start every unique word at zero, then count each occurrence.
dictionartWords = {}
for word in uniqueWords:
    dictionartWords[word] = 0

for word in paragraph:
    dictionartWords[word] += 1

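You then get a dictionary with the unique words as keys and numeric values giving the number of occurrences of each unique word in the paragraph: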

print(dictionartWords)

{'The': 2, 'like': 1, 'serious': 1, 'titled': 1, 'Rene': 1, 'a': 1, 'artist': 1, 'video': 1, 'c': 7, 'with': 1, 'track': 1, 'to': 1, 'fictional': 1, 'feelings': 1, 'ccc': 1, 'but': 1, 'not': 1, 'has': 1, 'interpret': 1, 'way': 1, 'as': 1, 'of': 1, 'emoticon': 1, 'Heart': 1, 'in': 2, 'adorable': 1, 'love': 1, 'references': 1, 'being': 1, 'Magritte': 1, 'Chuu’s': 1, 'historical': 1, 'such': 1, 'and': 1, 'does': 1, 'music': 1, 'the': 2, 'figures': 1, 'Attack': 1, 'own': 1, 'ways': 1}
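For comparison, the same kind of tally can be sketched with collections.Counter from the standard library, which is a different technique from the manual dictionary above. The short text is a made-up example:

```python
from collections import Counter

# Hypothetical short text for this sketch.
text = "The music video has c c c ccc"

# Counter builds the word -> count mapping in one step.
counts = Counter(text.split())
for word, count in counts.items():
    print("The word {} appears {} time(s) in the paragraph".format(word, count))
```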