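The idea is simple: open a text file, read it, count how many times each letter occurs, and then work out what percentage of the text each letter accounts for.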
fileName = input("Enter file Name: ")
with open(fileName) as f:
    text = f.read()
print(text)

def count_char(text, char):
    # Count how many times char occurs in text
    count = 0
    for c in text:
        if c == char:
            count += 1
    return count

# Note: "ABC..." and "abc..." evaluates to the second string only,
# so the two alphabets are concatenated with + to cover both cases.
for char in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "abcdefghijklmnopqrstuvwxyz":
    percentage = 100 * count_char(text, char) / len(text)
    print("\n letter: {0} is taking {1}% of the text and that is {2}".format(char, round(percentage, 2), count_char(text, char)))
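I got this working, but I can't get the values into a DataFrame to make the result look more user-friendly.
I also want to add an if-else statement that prints an "Error 404 File Not Found!" message when the user enters a file that does not exist.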
Answer 0 (score: 1)
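Once you have read the file so that its contents are in a string named text, you can pass it to the pandas.value_counts function to get the count of each character. To keep only the letters, I use the filter function with str.isalpha as the predicate that decides whether each character is kept.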
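As a small illustration of the filtering step on its own, here is a minimal sketch. The string in sample is a made-up example and is not taken from the question.

```python
# Hypothetical sample string, used only to illustrate the filtering step
sample = "Hi, Bob! 42"

# filter() keeps the characters for which str.isalpha returns True,
# so punctuation, spaces and digits are dropped
letters = [*filter(str.isalpha, sample)]
print(letters)
```

The resulting list holds only the letters, in their original order, and that list is what gets counted.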
您的代码应如下所示:
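import os
import pandas as pd

fileName = input("Enter file Name: ")
if os.path.exists(fileName):
    with open(fileName) as f:
        text = f.read()
    # Keep only the letters, count each one, and divide by the length of
    # the whole text to get each letter's share of it.
    # (Recent pandas versions deprecate the top-level pd.value_counts;
    # pd.Series([...]).value_counts() is the equivalent call.)
    counts = pd.value_counts([*filter(str.isalpha, text)]) / len(text)
    print(counts)
else:
    print("Error 404 File Not Found!")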
Output:
a 0.064935
f 0.058442
q 0.051948
d 0.045455
r 0.045455
j 0.038961
e 0.032468
l 0.032468
k 0.032468
s 0.032468
t 0.032468
u 0.032468
i 0.032468
p 0.025974
w 0.025974
h 0.025974
g 0.019481
o 0.019481
y 0.012987
n 0.012987
T 0.006494
Q 0.006494
R 0.006494
dtype: float64
The output above was produced from this sample text:
text = """
a;sdlkfja;sldkfja
spogkia
;dlkfq
;welrfuq[3094t8urq34TRQaaj]
aksdfjpaoi43urpq9384t983456tuyweirghnwehrg
q34haed89fy9q9384uithnjlfasdf;q3p[er]q34t9rwiofdj"""
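Since the question asks for a DataFrame and for a message when the file is missing, the same approach can be taken one step further. The following is only a sketch under a few assumptions: the helper names letter_table and read_text and the column names count and percent are made up for illustration, and a try/except on FileNotFoundError is used in place of the os.path.exists check.

```python
import pandas as pd


def letter_table(text):
    """Return a DataFrame with one row per letter: its count and its share of the text."""
    # Keep only alphabetic characters, then count each distinct letter
    letters = [c for c in text if c.isalpha()]
    counts = pd.Series(letters).value_counts()
    table = pd.DataFrame({
        "count": counts,
        # Same denominator as in the answer: the length of the whole text
        "percent": (100 * counts / len(text)).round(2),
    })
    table.index.name = "letter"
    return table


def read_text(file_name):
    """Return the file's contents, or None after printing the message if it is missing."""
    try:
        with open(file_name) as f:
            return f.read()
    except FileNotFoundError:
        print("Error 404 File Not Found!")
        return None


# Small demonstration on a literal string instead of a file
print(letter_table("aab, B!"))
```

With a real file, the two helpers would be combined by calling read_text(input("Enter file Name: ")) and passing the result to letter_table only when it is not None.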