Question

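I have a list of strings and a text file. Because the strings are proper nouns, some of them contain a single token and others contain more than one. The list looks like: ['ana e joao', 'fab g. ruggeri', 'resende', 'Ana e Joao'].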

With my code I can find the strings in the text file, but I don't know how to count how often each string that was found occurs.

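I tried storing the strings in a file separated by ',', then reading the file and splitting on ','. I also read the text file and tried to match all the strings against it at once, but I need to count how many times each string occurs in the text file. See the code below: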

from pathlib import Path

def proper_nouns():
    with open('/Users/proper_nouns.txt', 'r', encoding="utf-8") as p:
        pn = p.read()
        s = pn.split(',')


    while True:
        try:
            f = Path(input("\nEnter your file path: "))
            with open(f,'r', encoding="utf-8") as fi:
                wds = fi.read()
                break
        except FileNotFoundError:
            print("\nTry again")

    propn_found = [y for y in s if y in wds]



    print(propn_found)

proper_nouns()

I want to find all of these strings in a text file in one pass, count how often each one occurs in that file, and print them like this:

"Ana" 2 "Ana e Joao" 3 "resende" 4

and so on...

Answer 1

If you want to do it by hand, without any libraries, and iterate over the text yourself, this should work:

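def proper_nouns():

    wordsfilepath = '/Users/proper_nouns.txt'
    textfilepath = '/Users/text.txt'
    with open(wordsfilepath, 'r', encoding="utf-8") as p:
        # Strip the whitespace left around the commas and drop empty entries
        words = [w.strip() for w in p.read().split(',') if w.strip()]
    with open(textfilepath, 'r', encoding="utf-8") as textfile:
        searchtext = textfile.read()
        searchtext = searchtext.strip()
    wordDict = {}
    # Walk through the text and check, at each position,
    # whether one of the listed strings starts there
    for i in range(len(searchtext)):
        for word in words:
            if searchtext.startswith(word, i):
                try:
                    wordDict[word] = wordDict[word] + 1
                except KeyError:  # first time this string is seen
                    wordDict[word] = 1
    for word in wordDict:
        print(f"{word}, {wordDict[word]}")

proper_nouns()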

Answer 2

You can use a dict comprehension to build a dictionary with the data you need:

text = "some random text apple, some text ginger, some other blob data"
words = "some", "text", "blob"
result = {word: text.count(word) for word in words}

Output:

{'some': 3, 'text': 2, 'blob': 1}
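One thing to keep in mind is that `str.count` counts substrings rather than whole words, and it is case-sensitive. A small sketch with made-up sample text (the sentence and the names below are only illustrative, not taken from the question's files):

```python
# Hypothetical sample data: "ana" never appears as a word of its own here
text = "Ana e Joao ate a banana. Susana called Ana e Joao."
words = ["ana", "Ana e Joao"]

# Same dict comprehension as above, applied to proper nouns
result = {word: text.count(word) for word in words}
print(result)
```

Here "ana" is reported twice even though it only occurs inside "banana" and "Susana", while the capitalized "Ana" is not counted for it at all.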

Update.

To deal with the problem of matching whole words only, I suggest using a regular expression:

import re
...
result = {word: re.subn(r"\b{}\b".format(re.escape(word)), "", text)[1] for word in words}
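The regular-expression idea can be sketched end to end for multi-token names. This is only a sketch under some assumptions: `count_whole_matches` and the sample data are hypothetical, `re.escape` is used so that characters such as the "." in 'fab g. ruggeri' are treated literally, and lookarounds stand in for `\b` so that names starting or ending with punctuation still match.

```python
import re


def count_whole_matches(names, text):
    """Count each name only where it is not glued to other word characters."""
    counts = {}
    for name in names:
        # (?<!\w) and (?!\w) require that no word character touches the match
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        counts[name] = len(re.findall(pattern, text))
    return counts


# Hypothetical sample data standing in for the two files
names = ["ana", "Ana e Joao", "fab g. ruggeri"]
text = "Ana e Joao ate a banana with ana. fab g. ruggeri thanked Ana e Joao."

counts = count_whole_matches(names, text)
# Print in the format asked for in the question: "name" count "name" count ...
print(" ".join(f'"{name}" {n}' for name, n in counts.items()))
```

With this sample text, "ana" is counted once (the standalone word, not the one inside "banana"), "Ana e Joao" twice and "fab g. ruggeri" once. Matching stays case-sensitive, which seems to be what the question wants since the list contains both 'ana e joao' and 'Ana e Joao'.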

How to find multiple strings in a text and count the number of strings found?
