I have a text file that contains:
Number Name subject1 subject2 subject3 subject4 subject5
1234567 Jan 5 7 0 6 4
3526435 Marie 5 5 7 0 0
2230431 Kees 6 10 0 8 6
7685433 André 4 7 8 7 5
0364678 Antoinette 0 2 8 8 8
1424354 Jerôme 7 9 0 5 0
4536576 Kamal 8 0 8 7 8
1256033 Diana 0 0 0 0 0
5504657 Petra 6 6 7 0 6
9676575 Malika 0 6 0 0 8
0253756 Samira 3 8 6 7 10
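I want to calculate the average grade of each student (a grade of 0 should not be included), and I want to calculate the average grade of each subject (again, not counting the 0s).
In my own code, I copied all the information and put it into a list.
The problem I am facing is that I need my Python program to read the text file and do the calculations with the numbers it contains.
So far, this is what I have: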
i = 0
file = open("resultaten.txt", "r")
for x in file:
    if i == 0:
        print("Lines: ")
    else:
        x = x.split()
        print(i, x)
    i += 1
How would one go about using a text file to calculate with specific characters in a line?
Thanks.
Answer 0 (score: 1)
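These kinds of operations are easier to implement with a library designed for handling tabular data such as yours. Pandas is a good example, although getting started can be a bit daunting, especially for someone without Python experience. Anyway, here is one way to achieve what (I think) you want using pandas. Excluding the zero values makes it a bit more complicated, hence the amount of code: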
# -*- coding: utf-8 -*-
# ^This line makes sure python is able to read some weird
# accented characters.
# Importing various libraries
import sys
import pandas as pd
import numpy as np
# Depending on your version of python, we need to import
# a different library for reading your input data as a
# string. This step is not required, you should probably
# use the pandas function called read_csv(), if you have
# your file stored locally.
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
input_data = StringIO("""Number Name subject1 subject2 subject3 subject4 subject5
1234567 Jan 5 7 0 6 4
3526435 Marie 5 5 7 0 0
2230431 Kees 6 10 0 8 6
7685433 André 4 7 8 7 5
0364678 Antoinette 0 2 8 8 8
1424354 Jerôme 7 9 0 5 0
4536576 Kamal 8 0 8 7 8
1256033 Diana 0 0 0 0 0
5504657 Petra 6 6 7 0 6
9676575 Malika 0 6 0 0 8
0253756 Samira 3 8 6 7 10
""")
# Read data, specify that columns are delimited by space,
# using the sep= argument.
df = pd.read_csv(input_data, sep=" ")
# Find all column names contain subject scores, based on their name
# We just pick all columns that starts with the string "subject".
subject_columns = [c for c in df.columns if c.startswith("subject")]
print(subject_columns)
# Calculate mean score for each subject by finding the sum of all scores
# for each subject, then divide it by the number of data points for each
# subject that does not equal (or is greater than) 0.
for subject in subject_columns:
    df["%s_mean" % subject] = float(df[subject].sum()) / float(len(df[subject].loc[df[subject] > 0]))
# Calculate mean for each student, without 0s
# The .replace(0, np.nan).count(axis=1) is just a trick to find the
# number of non-zero values in each row. In short, it replaces all
# values that are 0 with NaN, so that the count() function ignores
# those values when calculating the number of data points that are
# present in the dataset. I.e. it disregards values that are 0,
# so that they're excluded from the mean calculation.
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0.)
df["student_mean"] = df[subject_columns].sum(axis=1) / df[subject_columns].replace(0, np.nan).count(axis=1)
# This just configures pandas to print all columns in our dataset,
# and not truncate the print-out to fit to the screen.
pd.set_option("display.max_columns", 1000)
# Print out our final dataframe.
print(df)
The final dataset looks like this:
Number Name subject1 subject2 subject3 subject4 subject5 subject1_mean subject2_mean subject3_mean subject4_mean subject5_mean student_mean
0 1234567 Jan 5 7 0 6 4 5.5 6.666667 7.333333 6.857143 6.875 5.500000
1 3526435 Marie 5 5 7 0 0 5.5 6.666667 7.333333 6.857143 6.875 5.666667
2 2230431 Kees 6 10 0 8 6 5.5 6.666667 7.333333 6.857143 6.875 7.500000
3 7685433 André 4 7 8 7 5 5.5 6.666667 7.333333 6.857143 6.875 6.200000
4 364678 Antoinette 0 2 8 8 8 5.5 6.666667 7.333333 6.857143 6.875 6.500000
5 1424354 Jerôme 7 9 0 5 0 5.5 6.666667 7.333333 6.857143 6.875 7.000000
6 4536576 Kamal 8 0 8 7 8 5.5 6.666667 7.333333 6.857143 6.875 7.750000
7 1256033 Diana 0 0 0 0 0 5.5 6.666667 7.333333 6.857143 6.875 NaN
8 5504657 Petra 6 6 7 0 6 5.5 6.666667 7.333333 6.857143 6.875 6.250000
9 9676575 Malika 0 6 0 0 8 5.5 6.666667 7.333333 6.857143 6.875 7.000000
10 253756 Samira 3 8 6 7 10 5.5 6.666667 7.333333 6.857143 6.875 6.800000
Note that you need the pandas module installed for this to work. You also need the numpy module.
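The zero-to-NaN trick described in the comments above can also feed mean() directly, because pandas skips NaN when averaging. A minimal sketch under that assumption; the shortened sample rows, the regex separator, and reading Number as a string (to keep leading zeros) are illustrative choices, not part of the answer above:

```python
from io import StringIO

import numpy as np
import pandas as pd

# Shortened sample in the same layout as the question's file (illustrative).
text = """Number Name subject1 subject2
0364678 Antoinette 0 2
1234567 Jan 5 7
1256033 Diana 0 0
"""

# sep=r"\s+" splits on any run of whitespace; dtype keeps "0364678" intact.
df = pd.read_csv(StringIO(text), sep=r"\s+", dtype={"Number": str})

subject_columns = [c for c in df.columns if c.startswith("subject")]

# Turn every 0 into NaN so that mean() leaves it out of the calculation.
scores = df[subject_columns].replace(0, np.nan)

subject_means = scores.mean()        # one value per subject column
student_means = scores.mean(axis=1)  # one value per student row

print(subject_means)
print(student_means)
```

A student with only zeros (Diana here) ends up with NaN rather than triggering a division by zero.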
Answer 1 (score: 1)
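If we convert this to a dictionary, we get a lot of flexibility with the information we have to work with, and it takes only a little effort. We can use the first row to create the keys, pair those keys with each of the remaining rows, and then zip each pair to create a list of tuples. From there we can use the dict constructor to create a list of dictionaries. Now we just have to gather all the subjects from each item in that list of dictionaries, map them to ints, and make an exception for when a student scored all 0s. If that is not the case, we filter the 0s out of the full list and then calculate the average. Next, to get the average for each subject, we extract all the values tied to that subject, skipping the values that are 0, map them to ints, and then calculate the average. I threw in some justification for the printed text; it is not necessary. The process for the remaining subjects is the same, just swap out the subject.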
with open('text.txt') as f:
    content = [line.split() for line in f]

keys = content[0]
lst = list(zip([keys]*(len(content)-1), content[1:]))
x = [zip(i[0], i[1]) for i in lst]
z = [dict(i) for i in x]

print('Average Grades'.center(30))
for i in z:
    subs = [i['subject1'], i['subject2'], i['subject3'], i['subject4'], i['subject5']]
    subs = list(map(int, subs))
    if sum(subs) == 0:
        print('{:<10} average grade: {:>4}'.format(i['Name'], 0))
    else:
        subs = list(filter(lambda x: x > 0, subs))
        avg = round(sum(subs)/len(subs), 2)
        print('{:<10} average grade: {:>4}'.format(i['Name'], avg))

sub1 = [i['subject1'] for i in z if i['subject1'] != '0']
sub1 = list(map(int, sub1))
sub1_avg = sum(sub1)/len(sub1)
print('\nAverage Grade for Subject 1: {}'.format(sub1_avg))
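Rather than repeating the subject 1 step by hand for each remaining subject, the same list-of-dictionaries idea can be looped over every subject column. A minimal sketch, assuming the whitespace-separated layout from the question and names without spaces; the helper name subject_averages and the shortened sample rows are hypothetical and not part of the answer above:

```python
def subject_averages(lines):
    """Return {subject: mean of its non-zero grades} for whitespace-separated rows."""
    content = [line.split() for line in lines if line.strip()]
    keys = content[0]
    # One dict per student, keyed by the header names.
    rows = [dict(zip(keys, row)) for row in content[1:]]
    averages = {}
    for subject in keys[2:]:  # assumes the first two columns are Number and Name
        grades = [int(row[subject]) for row in rows if int(row[subject]) != 0]
        # None marks a subject in which nobody has a non-zero grade.
        averages[subject] = sum(grades) / len(grades) if grades else None
    return averages


# Shortened sample in the same layout as the question's file (illustrative).
sample = """Number Name subject1 subject2 subject3
1234567 Jan 5 7 0
3526435 Marie 5 5 7
1256033 Diana 0 0 0
""".splitlines()

print(subject_averages(sample))
```

Since the function accepts any iterable of lines, an open file object can be passed in place of the sample list.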
Answer 2 (score: 0)
You can index into the result of your x.split() call, and I would avoid overwriting x.
y = x.split()
Number = y[0]
Name = y[1]
...
or
Number, Name, subject1, subject2, subject3, subject4, subject5 = x.split()
Then you can calculate the mean. You could try something like...
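Number, Name, subject1, subject2, subject3, subject4, subject5 = x.split()
subjects = [float(subject1), float(subject2), float(subject3), float(subject4), float(subject5)]
total = 0  # named total so the built-in sum() is not shadowed
zero_count = 0
for subject in subjects:
    total += subject
    if subject == 0:  # compare by value; "is 0" is never true for a float
        zero_count += 1
# this will print the mean (0 if every grade is 0, to avoid dividing by zero)
nonzero_count = len(subjects) - zero_count
print(i, total / nonzero_count if nonzero_count else 0)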
This block can replace the contents of your else statement; it prints the index and the mean, excluding grades of 0.
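The same per-row idea can be folded into a loop over the lines, using star-unpacking so the number of subjects does not have to be hard-coded. A minimal sketch, assuming names contain no spaces; the helper names nonzero_mean and student_means and the shortened sample are hypothetical:

```python
def nonzero_mean(values):
    """Mean of the non-zero values, or None when every value is 0."""
    kept = [v for v in values if v != 0]
    return sum(kept) / len(kept) if kept else None


def student_means(lines):
    """Yield (name, mean) for each data row; the first line is the header."""
    for i, line in enumerate(lines):
        if i == 0 or not line.strip():
            continue  # skip the header and blank lines
        number, name, *grades = line.split()
        yield name, nonzero_mean([float(g) for g in grades])


# Shortened sample in the same layout as the question's file (illustrative).
sample = [
    "Number Name subject1 subject2 subject3 subject4 subject5",
    "1234567 Jan 5 7 0 6 4",
    "1256033 Diana 0 0 0 0 0",
]
for name, mean in student_means(sample):
    print(name, mean)
```

To run it on the real file, an open file object can be passed instead of the list, for example inside a with open("resultaten.txt", encoding="utf-8") block; the encoding is an assumption about how the file was saved.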