I have a list of strings and a text file. The list consists of strings containing a single token and other strings containing more than one token, because they are proper nouns. It looks like: ['ana e joao', 'fab g. ruggeri', 'resende', 'Ana e Joao'].
With my code I can find the strings in the text file, but I don't know how to count how often each found string occurs.
I tried storing the strings in a file separated by ',', then reading the file and splitting on ','. I also read the file and tried to match all the strings in the text file at once, but I need to count how many times each string appears in the text file. See the code below:
from pathlib import Path

def proper_nouns():
    # Read the comma-separated list of proper nouns
    with open('/Users/proper_nouns.txt', 'r', encoding="utf-8") as p:
        pn = p.read()
    s = pn.split(',')
    # Ask for the text file path until a valid one is given
    while True:
        try:
            f = Path(input("\nEnter your file path : "))
            with open(f, 'r', encoding="utf-8") as fi:
                wds = fi.read()
            break
        except FileNotFoundError:
            print("\nTry again")
    # Keep only the proper nouns that occur in the text
    propn_found = [y for y in s if y in wds]
    print(propn_found)
I want to find all these strings in a text file at once, count how often each string occurs in that text file, and then print them like this:
"Ana" 2 "Ana e Joao" 3 "resende" 4
and so on...
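The desired output can be sketched with a plain loop over the phrase list; the sample text, phrases, and counts below are illustrative, not the question's real data:

```python
# Illustrative text and phrase list (assumptions, not the question's actual files)
text = "Ana e Joao met resende, then Ana e Joao and resende left"
phrases = ["Ana e Joao", "resende"]

for phrase in phrases:
    # str.count counts non-overlapping substring occurrences
    print(phrase, text.count(phrase))
```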
Answer 0 (score: 0)
If you want to do it manually, without any library, by iterating over the text, this should work:
def proper_nouns():
    wordsfilepath = '/Users/proper_nouns.txt'
    textfilepath = '/Users/text.txt'
    with open(wordsfilepath, 'r', encoding="utf-8") as p:
        words = p.read()
    words = words.split(',')
    with open(textfilepath, 'r', encoding="utf-8") as textfile:
        searchtext = textfile.read()
    searchtext = searchtext.strip()
    wordDict = {}
    # Count each whitespace-separated token in the text
    # (note: this counts single tokens, so multi-word names are not matched)
    for word in searchtext.split():
        try:
            wordDict[word] = wordDict[word] + 1
        except KeyError:
            wordDict[word] = 1
    for word in wordDict:
        print(f"{word}, {wordDict[word]}")
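For the token counting above, the standard library's collections.Counter does the same bookkeeping in one call. A minimal sketch, with made-up text standing in for the file contents:

```python
from collections import Counter

# Made-up sample text standing in for the text file's contents
searchtext = "ana e joao resende ana resende resende"

# Counter builds a dict-like mapping of token -> occurrence count
word_counts = Counter(searchtext.split())
print(word_counts["resende"])  # 3
print(word_counts["ana"])      # 2
```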
Answer 1 (score: 0)
You can use a dict comprehension to build a dictionary with the data you need:
text = "some random text apple, some text ginger, some other blob data"
words = "some", "text", "blob"
result = {word: text.count(word) for word in words}
Output:
{'some': 3, 'text': 2, 'blob': 1}
Update.
To make sure only whole words are matched, I would suggest a regular expression:
import re
...
result = {word: re.subn(r"\b{}\b".format(word), "", text)[1] for word in words}
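Applied to the question's multi-word names, the same whole-word idea can be sketched with re.findall, adding re.escape as a precaution (an addition of mine, since entries like 'fab g. ruggeri' contain the regex metacharacter '.'); the sample text is made up for illustration:

```python
import re

# Sample text is made up for illustration
text = "Ana e Joao met resende. Later Ana e Joao greeted fab g. ruggeri."
names = ["Ana e Joao", "resende", "fab g. ruggeri"]

# \b anchors the match at word boundaries; re.escape neutralizes
# metacharacters such as '.' inside the names
counts = {name: len(re.findall(r"\b{}\b".format(re.escape(name)), text))
          for name in names}
print(counts)  # {'Ana e Joao': 2, 'resende': 1, 'fab g. ruggeri': 1}
```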