Putting the three values obtained in a for loop into a DataFrame / Python

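Time: 2018-08-16 00:04:09

Tags: python pandas dataframe

The idea is simple: open a text file, read it, count how many times each letter occurs, and then work out what percentage of the text each letter accounts for.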

fileName = input("Enter file Name: ")
with open(fileName) as f:
    text = f.read()
print(text)

def count_char(text, char):
    count=0
    for c in text:
        if c == char:
            count+=1
    return count

# Use + to join the alphabets; `and` between two non-empty strings returns only the second.
for char in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "abcdefghijklmnopqrstuvwxyz":
    count = count_char(text, char)
    percentage = 100 * count / len(text)
    print("\n letter: {0} is taking {1}% of the text and that is {2}".format(char, round(percentage,2), count))

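I did get this far, but I can't put these values into a DataFrame to make the output more user-friendly.

In addition, I want to add an if-else statement that prints a message ("Error 404 File Not Found!") when the user enters a file that does not exist.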

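1 answer:

Answer 0: (score: 1)

Once you have read in the file so that its contents are in a string named text, you can pass it to the pandas.value_counts function to get the count of each character. To keep only the letters, I use the filter function with str.isalpha as the predicate that decides whether each character is kept.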

`pandas.value_counts`
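As a minimal sketch of the filter-then-count step described above, the snippet below runs it on a short made-up string instead of a file. The `sample` string is an assumption for illustration only. It uses the `Series.value_counts` method rather than the top-level `pandas.value_counts` function, since recent pandas releases deprecate the top-level form.

```python
import pandas as pd

sample = "Hello, World! 123"  # made-up sample text, not from the question

# Keep only alphabetic characters; digits, spaces and punctuation are dropped.
letters = list(filter(str.isalpha, sample))

# Series.value_counts tallies how often each distinct letter occurs,
# sorted from most to least frequent.
letter_counts = pd.Series(letters).value_counts()
print(letter_counts)
```

Uppercase and lowercase letters are counted separately here, as in the answer's code.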

Your code should look like this:

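import os
import sys

import pandas as pd

fileName = input("Enter file Name: ")
if os.path.exists(fileName):
    with open(fileName) as f:
        text = f.read()
else:
    # Stop here: without the file there is no text to count.
    sys.exit("Error 404 File Not Found!")

# Count each letter, then divide by the total length to get its share of the text.
counts = pd.value_counts([*filter(str.isalpha, text)]) / len(text)
print(counts)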

a    0.064935
f    0.058442
q    0.051948
d    0.045455
r    0.045455
j    0.038961
e    0.032468
l    0.032468
k    0.032468
s    0.032468
t    0.032468
u    0.032468
i    0.032468
p    0.025974
w    0.025974
h    0.025974
g    0.019481
o    0.019481
y    0.012987
n    0.012987
T    0.006494
Q    0.006494
R    0.006494
dtype: float64
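The output above is a Series of fractions. The question asked for the letter, its count, and its percentage together in a DataFrame, which could be sketched roughly as follows. The sample `text` and the column names `letter`, `count`, and `percentage` are assumptions chosen for illustration. As in the question, the percentage is taken over the full length of the text, including non-letter characters.

```python
import pandas as pd

text = "aab, AB c!"  # made-up sample; in the question this comes from the file

# Count only alphabetic characters.
letters = [c for c in text if c.isalpha()]
counts = pd.Series(letters).value_counts()

# One row per letter, with the three values from the question's loop as columns.
table = pd.DataFrame({
    "letter": counts.index,
    "count": counts.to_numpy(),
    "percentage": (100 * counts.to_numpy() / len(text)).round(2),
})
print(table.to_string(index=False))
```

Dividing by `len(letters)` instead of `len(text)` would give each letter's share among letters only; which denominator is wanted depends on the use case.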

Setup

text = """
a;sdlkfja;sldkfja
spogkia
;dlkfq
;welrfuq[3094t8urq34TRQaaj]
aksdfjpaoi43urpq9384t983456tuyweirghnwehrg
q34haed89fy9q9384uithnjlfasdf;q3p[er]q34t9rwiofdj"""
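
To exercise the missing-file message without typing a file name, one option is to wrap the read in a small helper and catch `FileNotFoundError` instead of checking `os.path.exists` first. This is only a sketch: `read_text` is a hypothetical helper name, and the temporary file and its contents are made up for the demonstration.

```python
import os
import tempfile


def read_text(path):
    """Return the file's contents, or None if the file does not exist."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        print("Error 404 File Not Found!")
        return None


# Write a small made-up sample to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w") as f:
        f.write("abcABC")

    found = read_text(path)                                     # file exists
    missing = read_text(os.path.join(tmp, "no_such_file.txt"))  # prints the message
```

Catching the exception avoids the small window between checking that the file exists and opening it, but the if-else form from the answer works as well for an interactive script.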