I have a dataset of student marks in CSV format, as shown below:
data = '''student,maths,science,english,nepali,computer
John,57,77,73,50,55
Mark,52,66,89,78,50
Ben,57,85,53,87,53
Toby,90,63,64,76,58
Anna,52,97,88,81,51'''
I want to convert it into a nested list, and I want to do this with a list comprehension.
Desired output:
[[57, 77, 73, 50, 55],
[52, 66, 89, 78, 50],
[57, 85, 53, 87, 53],
[90, 63, 64, 76, 58],
[52, 97, 88, 81, 51]]
I tried the following code (it works, but it does not use a list comprehension):
def read_data(file_name):
    '''function to read data from a file, process it and store
    it in a data matrix (2D list)
    returns the data matrix'''
    file = open(file_name,"r")
    data = file.readlines()
    file.close()
    mat = []
    for line in data:
        mat.append(line.replace("\n","").split(","))
    for i in range(1,len(mat)):
        for j in range(1,len(mat[i])):
            mat[i][j] = int(mat[i][j])
    return mat
Answer 0 (score: 3)
In general, whenever you have a for loop of the following form:
result = []
for object in iterable:
    result.append(function(object))
you can rewrite it as a list comprehension, like this:
result = [function(object) for object in iterable]
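To make the equivalence concrete, here is a minimal sketch. The `double` function and the `values` list are made up for illustration and do not come from the question.

```python
def double(x):
    """Hypothetical helper used only for this illustration."""
    return 2 * x

values = [1, 2, 3]

# Loop form: build the list one append at a time.
loop_result = []
for value in values:
    loop_result.append(double(value))

# Equivalent list comprehension.
comp_result = [double(value) for value in values]

print(loop_result == comp_result)
```

Both forms build the same list; the comprehension just folds the loop and the `append` into one expression.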
So here we can write the following:
mat = [line.replace('\n', '').split(',') for line in data]
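Note, however, that split is called once for each line, which means we end up with a nested list, and we want to convert every element of each inner list to an integer. For a nested list we need a nested list comprehension. Going back to the pattern above, the function to apply is clearly int: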
mat = [[int(element) for element in line.replace('\n', '').split(',')] for line in data]
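Unfortunately, this still does not work, because the first line of data is the header and the first element of every line is the student's name, so int raises a ValueError on them. We therefore need to split the data into headings, names, and marks: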
with open(file_name) as f:
    # splitlines() avoids a trailing empty string when the file ends with a newline
    data = f.read().splitlines()
# split each non-empty line into its comma-separated fields
processed_data = [line.split(',') for line in data if line]
headings = processed_data[0]
names = [line[0] for line in processed_data[1:]]
marks = [[int(element) for element in line[1:]] for line in processed_data[1:]]
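As a self-contained sketch of the same split, the version below works on the `data` string from the question instead of a file, so it can be run as is. Reading from a string rather than `file_name` is an assumption made only to keep the example runnable.

```python
data = '''student,maths,science,english,nepali,computer
John,57,77,73,50,55
Mark,52,66,89,78,50
Ben,57,85,53,87,53
Toby,90,63,64,76,58
Anna,52,97,88,81,51'''

# One list of fields per non-empty line.
processed_data = [line.split(',') for line in data.splitlines() if line]

headings = processed_data[0]                      # the header row
names = [row[0] for row in processed_data[1:]]    # first column of each data row
marks = [[int(element) for element in row[1:]]    # remaining columns as integers
         for row in processed_data[1:]]

print(headings)
print(names)
print(marks)
```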
Answer 1 (score: 3)
Here we go:
data = """student,maths,science,english,nepali,computer
John,57,77,73,50,55
Mark,52,66,89,78,50
Ben,57,85,53,87,53
Toby,90,63,64,76,58
Anna,52,97,88,81,51"""
output = [[int(item) for item in line]
          for row in data.split("\n")[1:]
          for line in [row.split(",")[1:]]]
print(output)
which yields
[
[57, 77, 73, 50, 55],
[52, 66, 89, 78, 50],
[57, 85, 53, 87, 53],
[90, 63, 64, 76, 58],
[52, 97, 88, 81, 51]
]
This uses list slicing ([1:]) to skip the header row and the name column, and the variable names are self-explanatory.
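The `for line in [row.split(",")[1:]]` clause iterates over a one-element list purely to bind the name `line`. A possible simplification, assuming the same `data` string, is to slice inline in the inner comprehension instead:

```python
data = """student,maths,science,english,nepali,computer
John,57,77,73,50,55
Mark,52,66,89,78,50
Ben,57,85,53,87,53
Toby,90,63,64,76,58
Anna,52,97,88,81,51"""

# Outer comprehension: every row after the header.
# Inner comprehension: every field after the name, converted to int.
output = [[int(item) for item in row.split(",")[1:]]
          for row in data.split("\n")[1:]]

print(output)
```

This should produce the same nested list with one fewer `for` clause.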
Answer 2 (score: 2)
You can use the csv module here.
For example:
import csv
def read_data(file_name):
    with open(file_name) as infile:
        reader = csv.reader(infile)
        next(reader) #Skip header
        result = [list(map(int,row[1:])) for row in reader] #list comprehension
    return result
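To try the csv approach without a file on disk, one option is to wrap the question's string in `io.StringIO`, since `csv.reader` accepts any iterable of lines. The `read_marks` name below is hypothetical and only used for this sketch.

```python
import csv
import io

data = '''student,maths,science,english,nepali,computer
John,57,77,73,50,55
Mark,52,66,89,78,50
Ben,57,85,53,87,53
Toby,90,63,64,76,58
Anna,52,97,88,81,51'''

def read_marks(infile):
    """Hypothetical helper: parse marks from any file-like object."""
    reader = csv.reader(infile)
    next(reader)  # skip the header row
    # Drop the name column and convert the rest to integers.
    return [list(map(int, row[1:])) for row in reader]

marks = read_marks(io.StringIO(data))
print(marks)
```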
Answer 3 (score: 1)
Try this instead of the second loop:
# skip the header row and the name column, which int() cannot convert
mat = [list(map(int, row[1:])) for row in mat[1:]]
Answer 4 (score: 1)
with open("your_file.txt") as f:
    c = f.readlines()
o = [x.replace("\n","").split(",")[1:] for x in c[1:]]
This is the line you are interested in:
[x.replace("\n","").split(",")[1:] for x in c[1:]]
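For each line x in the file (skipping the header with c[1:]), remove the "\n" and split the line on "," into a list. After splitting, the [1:] slice drops index [0], which is the name you want to remove.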
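Note that this comprehension leaves the marks as strings. If integers are needed, as in the desired output, one possible extension is to add an inner comprehension that calls `int`. In the sketch below, `c` is built from the question's string rather than from `your_file.txt`, which is an assumption made only so the example runs on its own.

```python
data = '''student,maths,science,english,nepali,computer
John,57,77,73,50,55
Mark,52,66,89,78,50
Ben,57,85,53,87,53
Toby,90,63,64,76,58
Anna,52,97,88,81,51'''

# Stand-in for f.readlines(): keep the line endings, as readlines() does.
c = data.splitlines(keepends=True)

# Same slicing as above, with each remaining field converted to int.
o = [[int(value) for value in x.replace("\n", "").split(",")[1:]]
     for x in c[1:]]

print(o)
```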