Question

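I am fairly new to working with files in Python. I want to find the words in a file that have 8 letters, print them, and keep a running total of how many there are. Can you search through a file as if it were one very large string, or is there a specific way this has to be done?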

Answer 1

You could use Python's Counter for this:

from collections import Counter
import re

with open('input.txt') as f_input:
    text = f_input.read().lower()
    words = re.findall(r'\b(\w+)\b', text)
    word_counts = Counter(w for w in words if len(w) == 8)

    for word, count in word_counts.items():
        print(word, count)

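It works as follows:

It reads in a file called input.txt as one very long string.
It then converts it all to lowercase, so that the same word in different cases is counted as one word.
It uses a regular expression to split all of the text into a list of words.
It uses a generator expression to store every word that is exactly 8 characters long in a Counter.
It displays all of the matching entries along with their counts.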
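The question also asks whether the file has to be treated as one very large string. It does not: a file object can be iterated one line at a time. The following is only a sketch of that variant, not part of the original answer; the function name `count_words_of_length` and the sample lines are made up for illustration. It also shows how to get the overall total from the Counter.

```python
from collections import Counter
import re

def count_words_of_length(lines, length=8):
    """Count words of the given length, case-insensitively, one line at a time.

    `lines` can be any iterable of strings, for example an open file object.
    (Hypothetical helper name, used here for illustration only.)
    """
    counts = Counter()
    for line in lines:
        words = re.findall(r'\b\w+\b', line.lower())
        counts.update(w for w in words if len(w) == length)
    return counts

# Sample data standing in for the lines of a file.
sample = ["Elephant and ELEPHANT walk\n", "past a mountain, slowly.\n"]
word_counts = count_words_of_length(sample)

for word, count in word_counts.items():
    print(word, count)

# Total number of 8-letter words, counting repeats.
total = sum(word_counts.values())
print(total)
```

With a real file, `with open('input.txt') as f_input: word_counts = count_words_of_length(f_input)` should behave the same way, assuming no word is split across a line break.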

Answer 2

There was a useful post from 2 years ago called "How to split a text file to its words in python?"

How to split a text file to its words in python?

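It describes splitting each line on whitespace. If your text contains punctuation such as commas and full stops, you will need something a bit more sophisticated. There is help here: "Split Strings with Multiple Delimiters?"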

You can use the function len() to get the length of each word.
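As a rough illustration of the two ideas above (splitting on several delimiters, then checking each word with len()), here is a minimal sketch. The sample sentence and the set of punctuation marks in the pattern are assumptions chosen for the example; the linked posts may suggest other approaches.

```python
import re

text = "Keyboard, notebook; sunlight. A cat sat."

# Split on runs of whitespace or any of the listed punctuation marks.
# The character set here is illustrative; extend it for your own data.
tokens = re.split(r"[\s,.;:!?]+", text)

# re.split can leave an empty string at the end, so filter those out.
words = [t for t in tokens if t]

# Keep only the words that are exactly 8 characters long.
eight = [w for w in words if len(w) == 8]
print(eight)
print(len(eight))
```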

Answer 3

Try this code, where "eight_l_words" is a list of all the eight-letter words and "number_of_8lwords" is the number of eight-letter words:

 # defines text to be used
 with open("file_location", "r") as your_file:
     text = your_file.read()

 # divides the text into lines and defines some arrays
 lines = text.split("\n")
 words = []
 eight_l_words = []

 # iterating through "lines" adding each separate word to the "words" array
 for each in lines:
     words += each.split(" ")

 # checking to see if each word in the "words" array is 8 chars long, and if so
 # appending that word to the "eight_l_words" array
 for each in words:
     if len(each) == 8:
         eight_l_words.append(each)

 # finding the number of eight letter words
 number_of_8lwords = len(eight_l_words)

 # displaying results
 print(eight_l_words)
 print("There are "+str(number_of_8lwords)+" eight letter words")

Running the code using

 text = "boomhead shot\nshamwow slapchop"

yields the results:

 ['boomhead', 'slapchop']
 There are 2 eight letter words
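One thing to watch for with this approach: splitting on a single space keeps punctuation attached to the words, so "boomhead," would count as 9 characters, and repeated spaces produce empty strings. A possible adjustment, sketched here under the assumption that leading and trailing punctuation should be ignored, is to call split() with no argument and strip punctuation from each word. The sample text is a modified version of the one above.

```python
import string

text = "Boomhead, shot!\nshamwow  slapchop."

# split() with no argument splits on any run of whitespace (spaces, newlines,
# tabs), so there is no need to split into lines first.
# strip(string.punctuation) removes punctuation from both ends of each word.
words = [w.strip(string.punctuation) for w in text.split()]

eight_l_words = [w for w in words if len(w) == 8]
number_of_8lwords = len(eight_l_words)

print(eight_l_words)
print("There are " + str(number_of_8lwords) + " eight letter words")
```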

How to find the length of words in a file in Python
