Question

I have a text file that contains:

Number Name subject1 subject2 subject3 subject4 subject5
1234567 Jan 5 7 0 6 4
3526435 Marie 5 5 7 0 0
2230431 Kees 6 10 0 8 6
7685433 André 4 7 8 7 5
0364678 Antoinette 0 2 8 8 8
1424354 Jerôme 7 9 0 5 0
4536576 Kamal 8 0 8 7 8
1256033 Diana 0 0 0 0 0
5504657 Petra 6 6 7 0 6
9676575 Malika 0 6 0 0 8
0253756 Samira 3 8 6 7 10

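I want to calculate the average grade for each student (a grade of 0 is not included), and I want to calculate the average grade for each subject (again, not counting the 0s).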

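In my own code I copied all the information and put it in a list by hand.

The problem I'm facing is that I need my Python program to read the text file itself and do the calculations with the numbers in it.

So far, this is what I have: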

i = 0
file = open("resultaten.txt", "r")

for x in file:
    if i == 0:
        print("Lines: ")

    else:
        x = x.split()
        print(i, x)
    i +=1

How would one go about using the text file to do calculations on specific values in a line?

Thanks.

Answer 1

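Operations like these are much easier with a library designed for tabular data such as yours. Pandas is a good example, although getting started can be a bit daunting, especially for someone without Python experience. Anyway, here is one way to do what (I think) you want using pandas. Excluding the zero values makes it slightly more complicated, hence the amount of code: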

# -*- coding: utf-8 -*-
# ^This line makes sure python is able to read some weird
# accented characters.

# Importing various libraries
import sys
import pandas as pd
import numpy as np

# Depending on your version of python, we need to import
# a different library for reading your input data as a
# string. This step is not required, you should probably
# use the pandas function called read_csv(), if you have
# your file stored locally.
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

input_data = StringIO("""Number Name subject1 subject2 subject3 subject4 subject5
1234567 Jan 5 7 0 6 4
3526435 Marie 5 5 7 0 0
2230431 Kees 6 10 0 8 6
7685433 André 4 7 8 7 5
0364678 Antoinette 0 2 8 8 8
1424354 Jerôme 7 9 0 5 0
4536576 Kamal 8 0 8 7 8
1256033 Diana 0 0 0 0 0
5504657 Petra 6 6 7 0 6
9676575 Malika 0 6 0 0 8
0253756 Samira 3 8 6 7 10
""")

# Read data, specify that columns are delimited by space,
# using the sep= argument.
df = pd.read_csv(input_data, sep=" ")

# Find all columns that contain subject scores, based on their name.
# We just pick every column whose name starts with the string "subject".
subject_columns = [c for c in df.columns if c.startswith("subject")]
print(subject_columns)

# Calculate the mean score for each subject: the sum of all scores for
# that subject divided by the number of scores that are greater than 0,
# so that zeros do not count as data points.
for subject in subject_columns:
    df["%s_mean" % subject] = float(df[subject].sum()) / float(len(df[subject].loc[df[subject] > 0]))

# Calculate the mean for each student, without 0s.
# The .replace(0, np.nan).count(axis=1) is just a trick to find the
# number of non-zero values in each row. In short, it replaces all
# values that are 0 with NaN, so that the count() function ignores
# those values when counting the data points that are present in the
# dataset. I.e. it disregards values that are 0, so that they're
# excluded from the mean calculation. A student whose grades are all 0
# ends up with 0 / 0, which pandas reports as NaN.
df["student_mean"] = df[subject_columns].sum(axis=1) / df[subject_columns].replace(0, np.nan).count(axis=1)

# This just configures pandas to print all columns in our dataset,
# and not truncate the print-out to fit to the screen.
pd.set_option("display.max_columns", 1000)

# Print out our final dataframe.
print(df)

The final dataset looks like this:

     Number        Name  subject1  subject2  subject3  subject4  subject5  subject1_mean  subject2_mean  subject3_mean  subject4_mean  subject5_mean  student_mean
0   1234567         Jan         5         7         0         6         4            5.5       6.666667       7.333333       6.857143          6.875      5.500000
1   3526435       Marie         5         5         7         0         0            5.5       6.666667       7.333333       6.857143          6.875      5.666667
2   2230431        Kees         6        10         0         8         6            5.5       6.666667       7.333333       6.857143          6.875      7.500000
3   7685433       André         4         7         8         7         5            5.5       6.666667       7.333333       6.857143          6.875      6.200000
4    364678  Antoinette         0         2         8         8         8            5.5       6.666667       7.333333       6.857143          6.875      6.500000
5   1424354      Jerôme         7         9         0         5         0            5.5       6.666667       7.333333       6.857143          6.875      7.000000
6   4536576       Kamal         8         0         8         7         8            5.5       6.666667       7.333333       6.857143          6.875      7.750000
7   1256033       Diana         0         0         0         0         0            5.5       6.666667       7.333333       6.857143          6.875           NaN
8   5504657       Petra         6         6         7         0         6            5.5       6.666667       7.333333       6.857143          6.875      6.250000
9   9676575      Malika         0         6         0         0         8            5.5       6.666667       7.333333       6.857143          6.875      7.000000
10   253756      Samira         3         8         6         7        10            5.5       6.666667       7.333333       6.857143          6.875      6.800000

Note that the pandas module must be installed for this to work. You also need the numpy module.
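As a shorter variant of the same idea, here is a minimal sketch that masks the zeros first and then lets pandas compute both sets of means. The names `grades`, `nonzero`, `student_means` and `subject_means` are made up for this illustration, and passing `dtype={"Number": str}` is an assumption on my part that you want to keep the leading zeros of the student numbers (the table above shows them being dropped).

```python
import io

import pandas as pd

# Stand-in for the file on disk; with a real file, pass its path to read_csv().
data = io.StringIO("""Number Name subject1 subject2 subject3 subject4 subject5
1234567 Jan 5 7 0 6 4
3526435 Marie 5 5 7 0 0
2230431 Kees 6 10 0 8 6
7685433 André 4 7 8 7 5
0364678 Antoinette 0 2 8 8 8
1424354 Jerôme 7 9 0 5 0
4536576 Kamal 8 0 8 7 8
1256033 Diana 0 0 0 0 0
5504657 Petra 6 6 7 0 6
9676575 Malika 0 6 0 0 8
0253756 Samira 3 8 6 7 10
""")

# Split on any run of whitespace and read the student number as text
# so that leading zeros survive (an assumption about what is wanted).
df = pd.read_csv(data, sep=r"\s+", dtype={"Number": str}).set_index("Name")

# Keep only the grade columns, then turn every 0 into NaN.
grades = df[[c for c in df.columns if c.startswith("subject")]]
nonzero = grades.where(grades > 0)

# mean() skips NaN by default, so zeros no longer count.
student_means = nonzero.mean(axis=1)  # one value per student
subject_means = nonzero.mean(axis=0)  # one value per subject

print(student_means)
print(subject_means)
```

A student with only zeros (Diana here) still comes out as NaN, the same as in the table above.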

Answer 2

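If we convert this to dictionaries, we get a lot of flexibility with the information we want to work with, and it only takes a little effort. We can use the first line to create our keys, pair those keys with every other line, and zip each pair together to get the (key, value) tuples for that line. From there we can use the dict constructor to create a list of dictionaries. Now we just gather all the subjects for each item in that list of dictionaries, map them to ints, and make an exception for the case where a student's scores are all 0s. Otherwise, we filter the 0s out of the list and then calculate the average. Next, to get the average for each subject, we extract all the values for that subject while skipping the values that are 0, map them to ints, and calculate the average. I added some justification to the printed text; it is not necessary. The process is the same for the remaining subjects, just swap out the subject.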

with open('text.txt') as f:
    content = [line.split() for line in f]

keys = content[0]

lst = list(zip([keys]*(len(content)-1), content[1:]))
x = [zip(i[0], i[1]) for i in lst]
z = [dict(i) for i in x]

print('Average Grades'.center(30))
for i in z:
    subs =[i['subject1'], i['subject2'], i['subject3'], i['subject4'], i['subject5']]
    subs = list(map(int, subs))
    if sum(subs) == 0:
        print('{:<10} average grade: {:>4}'.format(i['Name'], 0))
    else:
        subs = list(filter(lambda x: x >0, subs))
        avg = round(sum(subs)/len(subs), 2)
        print('{:<10} average grade: {:>4}'.format(i['Name'], avg))

sub1 = [i['subject1'] for i in z if i['subject1'] != '0']
sub1 = list(map(int, sub1))
sub1_avg = sum(sub1)/len(sub1)
print('\nAverage Grade for Subject 1: {}'.format(sub1_avg))
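Rather than repeating that last block once per subject, the same steps can be put in a loop over the subject keys. This is only a sketch: `records`, `mean_nonzero` and `subject_avgs` are names invented here, and an `io.StringIO` object stands in for the opened file so the example runs on its own.

```python
import io

# Stand-in for open('text.txt'); the rows are the ones from the question.
f = io.StringIO("""Number Name subject1 subject2 subject3 subject4 subject5
1234567 Jan 5 7 0 6 4
3526435 Marie 5 5 7 0 0
2230431 Kees 6 10 0 8 6
7685433 André 4 7 8 7 5
0364678 Antoinette 0 2 8 8 8
1424354 Jerôme 7 9 0 5 0
4536576 Kamal 8 0 8 7 8
1256033 Diana 0 0 0 0 0
5504657 Petra 6 6 7 0 6
9676575 Malika 0 6 0 0 8
0253756 Samira 3 8 6 7 10
""")

# Skip blank lines so a trailing newline does not produce an empty record.
content = [line.split() for line in f if line.strip()]
keys = content[0]
records = [dict(zip(keys, row)) for row in content[1:]]
subjects = [k for k in keys if k.startswith('subject')]


def mean_nonzero(values):
    """Return the average of the non-zero values, or None if there are none."""
    nonzero = [v for v in values if v != 0]
    return sum(nonzero) / len(nonzero) if nonzero else None


# One average per subject, computed the same way for every column.
subject_avgs = {s: mean_nonzero([int(r[s]) for r in records]) for s in subjects}

for s, avg in subject_avgs.items():
    print('Average grade for {}: {}'.format(s, 'n/a' if avg is None else round(avg, 2)))
```

The helper works on any list of ints, so it could also be reused for the per-student averages.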

Answer 3

You can index the result of your x.split() call; I would avoid overwriting x.

y = x.split()
Number = y[0]
Name = y[1]
...

or

Number, Name, subject1, subject2, subject3, subject4, subject5 = x.split()

Then you can calculate the mean. You could try something like...

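    Number, Name, subject1, subject2, subject3, subject4, subject5 = x.split()
    subjects = [float(subject1), float(subject2), float(subject3), float(subject4), float(subject5)]
    total = 0  # named total so the built-in sum() is not shadowed
    zero_count = 0
    for subject in subjects:
        total += subject
        if subject == 0:  # compare by value; "is 0" is never true for a float
            zero_count += 1
    graded = len(subjects) - zero_count
    # this will print the mean, or None when every grade is 0
    print(i, total / graded if graded else None)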

This block can replace the contents of your else statement; it prints the index and the mean, excluding grades of 0.
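For reference, here is a rough sketch of how the whole loop from the question might look with this approach, using star unpacking so the number of subjects does not have to be spelled out. The `student_means` dictionary is a name invented for this example, and an `io.StringIO` object with three of the rows stands in for the opened file.

```python
import io

# Stand-in for open("resultaten.txt", "r"), using three rows from the question.
file = io.StringIO("""Number Name subject1 subject2 subject3 subject4 subject5
1234567 Jan 5 7 0 6 4
3526435 Marie 5 5 7 0 0
1256033 Diana 0 0 0 0 0
""")

student_means = {}
for i, line in enumerate(file):
    if i == 0 or not line.strip():
        continue  # skip the header row and any blank lines

    # The first two fields are the number and the name; the rest are grades.
    number, name, *grades = line.split()
    nonzero = [float(g) for g in grades if float(g) != 0]

    # Guard against dividing by zero when every grade is 0.
    student_means[name] = sum(nonzero) / len(nonzero) if nonzero else None
    print(i, name, student_means[name])
```

Storing None for a student without any grades is just one possible choice; printing a message or skipping the student would work equally well.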

How do I load calculation input from a text file using Python?

3 answers: