I am new to Python and am trying to read a CSV file:
1980,Mark,Male,Student,L,90,56,78,44,88
1982,Cindy,Female,Student,S,45,76,22,42,90
1984,Kevin,Male,Student,L,67,83,52,55,59
1986,Michael,Male,Student,M,94,63,73,60,43
1988,Anna,Female,Student,S,66,50,59,57,33
1990,Jessica,Female,Student,S,72,34,29,69,27
1992,John,Male,Student,L,80,67,90,89,68
1994,Tom,Male,Student,M,23,60,89,78,39
1996,Nick,Male,Student,S,56,98,84,44,50
1998,Oscar,Male,Student,M,64,61,74,59,63
2000,Andy,Male,Student,M,11,50,93,69,90
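I only want to save certain attributes of this data into a dictionary or a list of lists. For example, I only want to keep the year, the name, and the five (consecutive) numbers. I'm not sure how to exclude just the three middle columns.
Here is the code I have so far: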
def read_data(filename):
    f = open("myfile.csv", "rt")
    import csv
    data = {}
    for line in f:
        row = line.rstrip().split(',')
        data[row[0]] = [e for e in row[5:]]
    return data
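I only know how to keep a contiguous block of columns together, not how to pick out specific columns one by one.
Answer 0 (score: 3)
You can use pd.read_csv() and pass in a name for every column, then select the ones you want: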
import pandas as pd
df = pd.read_csv('csv1.csv', names=['Year','Name','Gender','ID1','ID2','Val1','Val2','Val3','Val4','Val5'])
desired = df[['Year','Name','Val1','Val2','Val3','Val4','Val5']]
This yields:
    Year     Name  Val1  Val2  Val3  Val4  Val5
0   1980     Mark    90    56    78    44    88
1   1982    Cindy    45    76    22    42    90
2   1984    Kevin    67    83    52    55    59
3   1986  Michael    94    63    73    60    43
4   1988     Anna    66    50    59    57    33
5   1990  Jessica    72    34    29    69    27
6   1992     John    80    67    90    89    68
7   1994      Tom    23    60    89    78    39
8   1996     Nick    56    98    84    44    50
9   1998    Oscar    64    61    74    59    63
10  2000     Andy    11    50    93    69    90
Another option is to use usecols and pass the column index positions up front, like this:
df = pd.read_csv('csv1.csv', header=None, usecols=[0,1,5,6,7,8,9])
Note that this returns a DataFrame whose column labels are the index positions:
       0        1   5   6   7   8   9
0   1980     Mark  90  56  78  44  88
1   1982    Cindy  45  76  22  42  90
2   1984    Kevin  67  83  52  55  59
3   1986  Michael  94  63  73  60  43
4   1988     Anna  66  50  59  57  33
5   1990  Jessica  72  34  29  69  27
6   1992     John  80  67  90  89  68
7   1994      Tom  23  60  89  78  39
8   1996     Nick  56  98  84  44  50
9   1998    Oscar  64  61  74  59  63
10  2000     Andy  11  50  93  69  90
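Answer 1 (score: 0)
You can try splitting each line and explicitly assigning the pieces to variables, then simply ignore the ones you don't need (I named them _ so it is obvious they won't be used).
If a line has fewer or more fields than expected, this raises an error (on the line of code containing split()).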
def read_data(filename):
    data = {}
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if len(line) > 0:
                year, name, _, _, _, n1, n2, n3, n4, n5 = line.split(',')
                data[year] = [n1, n2, n3, n4, n5]
    return data
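If you would rather skip malformed lines than have the unpacking fail, something along these lines might work. This is only a sketch under a few assumptions of mine: it uses the standard csv module instead of split(), it expects exactly 10 fields per row as in the sample data, it converts the scores to int, and the names EXPECTED_FIELDS and skipped are made up for the example. It also keeps the name next to the scores, since the question asks for it.

```python
import csv
import io

EXPECTED_FIELDS = 10  # assumption: every valid row has 10 fields, as in the sample


def read_data(lines):
    """Map year -> [name, n1..n5]; collect rows with the wrong field count."""
    data = {}
    skipped = []  # (row_number, row) pairs for malformed rows
    for row_no, row in enumerate(csv.reader(lines), start=1):
        if not row:  # csv.reader returns an empty list for a blank line
            continue
        if len(row) != EXPECTED_FIELDS:
            skipped.append((row_no, row))
            continue
        # Ignore the three middle columns; gather the five scores into a list
        year, name, _, _, _, *scores = row
        data[year] = [name] + [int(s) for s in scores]
    return data, skipped


# Small in-memory sample: one blank line and one truncated row
sample = io.StringIO(
    "1980,Mark,Male,Student,L,90,56,78,44,88\n"
    "\n"
    "1982,Cindy,Female,Student\n"
    "1984,Kevin,Male,Student,L,67,83,52,55,59\n"
)
data, skipped = read_data(sample)
print(data)
print(skipped)
```

For a real file you would pass an open file object instead of the StringIO, for example read_data(open(filename, newline="")) inside a with block.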
Answer 2 (score: 0)
You can do this with a simple list comprehension:
def read_data(filename):
    data = {}
    # Positions of the columns to keep: year, name, and the five scores
    col_nums = [0, 1, 5, 6, 7, 8, 9]
    with open(filename, "rt") as f:
        for line in f:
            row = line.rstrip().split(',')
            data[row[0]] = [row[i] for i in col_nums]
    return data
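If you want a list of lists rather than a dictionary, the same index-based selection can be written with operator.itemgetter. This is just a sketch of one way to do it: the names pick and read_rows are mine, it uses csv.reader rather than split(), and the values stay as strings.

```python
import csv
import io
from operator import itemgetter

# Column positions to keep: year, name, and the five scores
pick = itemgetter(0, 1, 5, 6, 7, 8, 9)


def read_rows(lines):
    """Return a list of lists holding only the selected columns."""
    # "if row" skips blank lines, which csv.reader yields as empty lists
    return [list(pick(row)) for row in csv.reader(lines) if row]


# Small in-memory sample taken from the question's data
sample = io.StringIO(
    "1980,Mark,Male,Student,L,90,56,78,44,88\n"
    "1982,Cindy,Female,Student,S,45,76,22,42,90\n"
)
rows = read_rows(sample)
print(rows)
```

Note that itemgetter raises an IndexError on a row that is too short, so this version assumes every non-blank line has all ten fields.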
You might also consider using Pandas to help you read and process the data:
import pandas as pd
# read_csv takes the column labels via names=, not columns=
df = pd.read_csv("myfile.csv", names=['year', 'name', 'gender', 'kind', 'size', 'num1', 'num2', 'num3', 'num4', 'num5'])
data = df[['year', 'name', 'num1', 'num2', 'num3', 'num4', 'num5']]